Introduction: What Is This Project?
Legal Audit is an experimental open-source web tool for fast, automated legal audits of any website. It checks whether the site sends user data before consent and whether the mandatory documents are in place: terms and conditions, cookie policy, and privacy policy. For each of these documents, it offers an AI-powered legal analysis using the Grok API.
The project’s main motivation was to quickly identify all trackers present on a site, especially the advertising and measurement systems that are active before any consent is given. Coming from an e-commerce background, I needed something that could instantly show which trackers a site uses. At the same time, my brother and I wanted to test how reliably current AI models interpret legal documents.
Similar tools certainly exist, but Legal Audit stands out because it is fully open-source and free.
The Idea: How Did It Start?
It started out of pure necessity: there was no simple tool that could immediately show which trackers were running on a given e-commerce site and whether the site complied with EU law. We also wanted to see how well AI models could interpret and critique legal documents in Czech and English.
Our main goals:
- Detect all trackers and advertising scripts running on any website in its default state, before anyone interacts with the cookie banner.
- Analyze legal documents (privacy policy, cookie policy, terms) using AI, specifically with Grok.
- Create a tool that is free and open-source.
How It Works
The workflow is straightforward and focused on speed:
1. Enter Website URL
The user simply enters the URL of the site to be audited.
2. Detection Phase
- The backend scans the website’s HTML for known tracking scripts and identifies any requests being sent out as the page loads.
- There is no need to interact with the cookie banner: the audit runs in the fresh (default) state, before any consent is given.
- The tool looks for three legal documents: privacy policy, cookie policy, and terms and conditions, using regex patterns in the HTML.
- Found document URLs are displayed; users can edit them before proceeding.
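The document search described above relies on regex patterns over the page HTML. The sketch below shows one way to do that; the Czech/English keyword patterns, the `findLegalDocs` name, and the "first matching link wins" rule are illustrative assumptions, not the project's actual regexes. Diacritics are stripped first so that link text such as "Obchodní podmínky" matches an ASCII pattern.

```typescript
// Illustrative sketch: patterns and names are assumptions, not the project's code.
type LegalDocType = "privacy" | "cookies" | "terms";

// Keyword patterns checked against each link's href and visible text (Czech and English).
const DOC_PATTERNS: Record<LegalDocType, RegExp> = {
  privacy: /privacy|ochrana[-\s]osobnich[-\s]udaju|gdpr/i,
  cookies: /cookie/i,
  terms: /terms|conditions|obchodni[-\s]podminky|\bvop\b/i,
};

// Removes diacritics so "Obchodní podmínky" becomes "Obchodni podminky".
function stripDiacritics(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

function findLegalDocs(html: string, baseUrl: string): Partial<Record<LegalDocType, string>> {
  const found: Partial<Record<LegalDocType, string>> = {};
  // Captures the href (group 1) and inner HTML (group 2) of each <a> element.
  const anchorRe = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  let m: RegExpExecArray | null;
  while ((m = anchorRe.exec(html)) !== null) {
    const href = m[1];
    const text = m[2].replace(/<[^>]+>/g, " "); // drop nested tags such as <span>
    const haystack = stripDiacritics(`${href} ${text}`);
    for (const type of Object.keys(DOC_PATTERNS) as LegalDocType[]) {
      // Keep the first matching link per document type and resolve relative URLs.
      if (found[type] === undefined && DOC_PATTERNS[type].test(haystack)) {
        try {
          found[type] = new URL(href, baseUrl).toString();
        } catch {
          // Ignore malformed hrefs.
        }
      }
    }
  }
  return found;
}
```

Because the result is only a suggestion, editable URLs in the UI (as described above) cover the cases a keyword heuristic misses.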
3. AI Analysis (Optional)
- Once the documents are confirmed, users can trigger the AI legal audit.
- All three documents are sent in parallel (asynchronously) to Grok 3 Mini via the API, together with language and jurisdiction settings (Czech = CZ/EU law, English = EU/GDPR).
- Grok generates a detailed analysis for each document: a percentage score, specific bullet points about issues, and references to the relevant law.
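The parallel fan-out in this step can be sketched with `Promise.all`. The prompt wording, the `buildPrompt`/`auditDocuments` names, and the injected `CallModel` function are hypothetical; the real API call is abstracted away so the sketch stays self-contained. Each call catches its own error, so one failed document does not discard the other two reports.

```typescript
type Lang = "cs" | "en";
type DocType = "privacy" | "cookies" | "terms";

// Jurisdiction mapping as described in the text.
const JURISDICTION: Record<Lang, string> = {
  cs: "Czech law and EU law (GDPR)",
  en: "EU law (GDPR)",
};

// Hypothetical signature for whatever function actually calls the Grok API.
type CallModel = (prompt: string) => Promise<string>;

interface DocInput { type: DocType; url: string; text: string; }
interface DocReport { type: DocType; ok: boolean; output: string; }

// Hypothetical prompt; the prompt itself is in English, the answer language follows `lang`.
function buildPrompt(doc: DocInput, lang: Lang): string {
  return [
    `You are auditing a ${doc.type} document from ${doc.url}.`,
    `Evaluate it under ${JURISDICTION[lang]}.`,
    "Return a percentage score, bullet points listing specific issues, and references to the relevant law.",
    lang === "cs" ? "Answer in Czech." : "Answer in English.",
    "",
    doc.text,
  ].join("\n");
}

async function auditDocuments(docs: DocInput[], lang: Lang, callModel: CallModel): Promise<DocReport[]> {
  // All calls start at once; Promise.all keeps results in input order.
  return Promise.all(
    docs.map(async (doc): Promise<DocReport> => {
      try {
        return { type: doc.type, ok: true, output: await callModel(buildPrompt(doc, lang)) };
      } catch (err) {
        // A failed document yields an error report instead of rejecting the whole audit.
        return { type: doc.type, ok: false, output: String(err) };
      }
    }),
  );
}
```

With three documents, the total wait is roughly that of the slowest single call rather than the sum of all three.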
4. Results
- Users see two main sections:
- List of detected trackers, showing whether each one sends data before consent. There are three statuses: red (data sent before consent), orange (uncertain/undetectable), and green (no data sent).
- AI Audit: Three Grok-generated reports (one per document), each with a score and detailed critique.
- All output is displayed instantly in the web UI. AI analysis can be copied with a single click.
- No registration, no login, no analytics, no cookies. Only audit results are stored, as a short-lived cache for user convenience.
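The red/orange/green statuses above can be expressed as a small classification over the requests captured before consent. This is a sketch under stated assumptions: the `TRACKERS` entries and their `collectHosts` are illustrative, not the project's tracker database, and "orange" is modelled here as "request monitoring failed" (for example, when the headless browser was blocked), which is only one reading of "uncertain/undetectable".

```typescript
type TrackerStatus = "red" | "orange" | "green";

interface TrackerDef {
  name: string;
  collectHosts: string[]; // hosts that receive tracking data (illustrative, not exhaustive)
}

// Illustrative entries only; a real tracker list would be far longer.
const TRACKERS: TrackerDef[] = [
  { name: "Google Analytics", collectHosts: ["google-analytics.com"] },
  { name: "Meta Pixel", collectHosts: ["facebook.com"] },
];

// True if hostname is the host itself or one of its subdomains.
function hostMatches(hostname: string, host: string): boolean {
  return hostname === host || hostname.endsWith(`.${host}`);
}

function hostnameOf(raw: string): string | null {
  try {
    return new URL(raw).hostname;
  } catch {
    return null; // unparsable entry
  }
}

// preConsentRequests: URLs captured before any banner interaction,
// or null when monitoring failed.
function classifyTracker(tracker: TrackerDef, preConsentRequests: string[] | null): TrackerStatus {
  if (preConsentRequests === null) return "orange";
  const sentData = preConsentRequests.some((raw) => {
    const hostname = hostnameOf(raw);
    return hostname !== null && tracker.collectHosts.some((host) => hostMatches(hostname, host));
  });
  return sentData ? "red" : "green";
}
```

Matching on exact host or subdomain (rather than a plain substring) avoids false positives such as `notfacebook.com`.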
5. Handling Old Results
- If an audit was performed on the same site within the last hour, the cached results are shown first.
- Users can always choose to re-run a fresh audit.
- Users have no history or log of past audits; they only see a notice that the current result comes from the cache.
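The one-hour cache rule can be captured in a single decision function. In the project the cached row would come from Supabase; here it is passed in directly, and the `cacheKey` normalization (ignoring the fragment and a trailing slash) is a hypothetical choice for illustration.

```typescript
const CACHE_TTL_MS = 60 * 60 * 1000; // one hour

interface CachedAudit {
  url: string;
  createdAt: Date;
  result: unknown;
}

type CacheDecision =
  | { kind: "cached"; audit: CachedAudit; ageMinutes: number }
  | { kind: "fresh" };

// Hypothetical cache key: ignores the fragment and a trailing slash.
function cacheKey(rawUrl: string): string {
  const u = new URL(rawUrl);
  return u.origin + u.pathname.replace(/\/+$/, "") + u.search;
}

// Decides whether to show the cached audit (with a "cached" notice) or run a new one.
function decide(cached: CachedAudit | null, forceFresh: boolean, now: Date = new Date()): CacheDecision {
  if (cached === null || forceFresh) return { kind: "fresh" };
  const ageMs = now.getTime() - cached.createdAt.getTime();
  if (ageMs < 0 || ageMs >= CACHE_TTL_MS) return { kind: "fresh" };
  return { kind: "cached", audit: cached, ageMinutes: Math.floor(ageMs / 60000) };
}
```

Returning the age lets the UI show a notice such as "cached 40 minutes ago" next to the re-run button.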
UX / UI
- Frontend: Built in Flutter (Dart), deployed on Netlify, mobile-friendly (works as a web app, installable as a PWA).
- Design: Minimalist, with clear colored blocks (red, orange, green) to indicate tracker statuses, and simple output cards for each AI audit.
- No registration or authentication required. Privacy-first: user input and results are not linked to any identity.
Technical Details & Stack
- Estimated costs: ~$0.06 per full audit (varies with API usage)
- Target group: web developers, privacy geeks, e-commerce teams, compliance auditors
- Open-source: MIT license
Main technologies:
- Flutter 3.x – frontend (PWA-ready, mobile/web)
- Netlify – hosting & deployment
- Supabase – Postgres DB for caching audit results (no user analytics)
- Vercel + Next.js – serverless API endpoints (strict CORS policy)
- BrowserCloud API – headless browser for tracker/request detection
- Grok 3 Mini – AI legal analysis via API (one call per document)
Architecture:
- Frontend in Flutter (installable as a PWA, fast UI for mobile & web)
- Backend is serverless—only exposes endpoints for audits & AI analysis
- Website scanning and request monitoring handled server-side; frontend only receives results
- Supabase stores only recent audit results (cache for 1 hour max), no cookies or analytics
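The "strict CORS policy" on the serverless endpoints can be sketched as an origin allowlist. The origin below is a hypothetical placeholder, since the real frontend domain is not given. In a Next.js App Router project, `preflight` could back an exported `OPTIONS` route handler, called as `preflight(request.headers.get("origin"))`.

```typescript
// Hypothetical allowlist: the real frontend origin is not given in the text.
const ALLOWED_ORIGINS = new Set<string>(["https://legal-audit.example"]);

// Returns CORS headers for an allowed origin, or null to refuse the request.
function corsHeaders(origin: string | null): Record<string, string> | null {
  if (origin === null || !ALLOWED_ORIGINS.has(origin)) return null;
  return {
    "Access-Control-Allow-Origin": origin, // echo the single allowed origin, never "*"
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    Vary: "Origin", // caches must not reuse this response for other origins
  };
}

// Preflight response: 204 with CORS headers for allowed origins, 403 otherwise.
function preflight(origin: string | null): Response {
  const headers = corsHeaders(origin);
  return headers === null
    ? new Response(null, { status: 403 })
    : new Response(null, { status: 204, headers });
}
```

CORS only constrains browsers, so it keeps other websites from calling the API from their pages; it does not stop direct server-to-server requests.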
Experimental parameters:
- Grok API parameters:
- temperature: 0.4
- top_p: 0.8
- max_tokens: 4500
- reasoning: { effort: "low", max_tokens: 300, exclude: false }
- Parallelized audit (up to 3 Grok API calls per audit)
- Language/jurisdiction auto-detected (EN = GDPR/EU, CZ = CZ/EU)
- No registration, no login, no export (copy to clipboard only)
- Streaming output planned for faster UX
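The parameters listed above map onto a chat-completions request body roughly as follows. This is a sketch: the endpoint URL is left as an argument because the text does not name it, `grok-3-mini` is an assumed model identifier, the nested `reasoning` object is copied from the list above (whether a given endpoint accepts it in exactly this shape is an assumption), and `callGrok` assumes an OpenAI-compatible response format.

```typescript
// Assumed model identifier for Grok 3 Mini.
const MODEL = "grok-3-mini";

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface GrokRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  top_p: number;
  max_tokens: number;
  reasoning: { effort: "low" | "medium" | "high"; max_tokens: number; exclude: boolean };
}

// Builds one request per legal document using the parameters listed above.
function buildGrokRequest(systemPrompt: string, documentText: string): GrokRequest {
  return {
    model: MODEL,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: documentText },
    ],
    temperature: 0.4,
    top_p: 0.8,
    max_tokens: 4500, // cap on the generated answer
    reasoning: { effort: "low", max_tokens: 300, exclude: false },
  };
}

// Sends the request; assumes an OpenAI-compatible chat-completions response shape.
async function callGrok(endpoint: string, apiKey: string, body: GrokRequest): Promise<string> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Grok API error: HTTP ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```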
The Result: How Does It Work in Practice?
- Usage:
- Enter the URL.
- See immediately which trackers are running and whether they send data before consent.
- View links to all detected legal documents.
- (Optional) Run the AI audit—get scores and detailed legal critiques for each document.
- Copy results to the clipboard. There is no export, download, or built-in sharing (yet).
- AI Output Example:
- Each legal document is scored, usually at 75% or less, because the model always finds something to critique.
- Bullet points list the issues, each referencing GDPR or CZ law.
- Output is concise but practical—enough for a site owner to identify and fix compliance gaps.
- Performance:
- AI audits are parallelized but still take some time (streaming responses planned, not yet implemented).
- Only the last audit for a given URL is stored and reused for about an hour, unless the user requests a fresh one.
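Streaming is planned rather than implemented, so the following is only a sketch of the client-side parsing it would need. It assumes the API streams OpenAI-style server-sent events (`data: {...}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`); the `SseDeltaParser` class is hypothetical. Chunks can end mid-line, so incomplete lines stay buffered until the next chunk arrives.

```typescript
// Incrementally parses an OpenAI-style SSE stream and collects text deltas.
class SseDeltaParser {
  private buffer = "";
  done = false;

  // Feed one raw chunk; returns the text deltas completed by this chunk.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const deltas: string[] = [];
    let newline: number;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (!line.startsWith("data:")) continue; // skip blank lines and ": comment" lines
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") {
        this.done = true;
        continue;
      }
      const event = JSON.parse(payload) as { choices?: { delta?: { content?: string } }[] };
      const text = event.choices?.[0]?.delta?.content;
      if (text) deltas.push(text);
    }
    return deltas;
  }
}
```

Each returned delta can be appended to the corresponding audit card as soon as it arrives, instead of waiting for the full report.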
What We Learned & What’s Next?
Findings:
- Detecting trackers is simple on paper, but in practice, emulating a browser and monitoring its requests on serverless platforms (like Vercel) is a major challenge.
- AI models (Grok) are actually decent at legal critique. Even with dense, lawyerly documents, they always find something to comment on and cite the exact paragraph.
- The audit is conservative: we have never seen a score above 80%.
- We first wrote the prompt in Czech. Switching to an English prompt saved about 200 tokens (roughly 40%), because Czech text consumes far more tokens (the xAI tokenizer at https://x.ai/api is worth checking).
- Some websites, for example Alza, block us, probably because our requests look too much like a bot.
- This project has noticeably more code than our previous one. The files were quite large, and when we worked with ChatGPT, it struggled because the code did not fit into its context window.
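The token figures in the findings above imply rough prompt sizes. Assuming the 40% refers to the share of the original Czech prompt, with $T_{\mathrm{cs}}$ and $T_{\mathrm{en}}$ the token counts of the Czech and English prompts:

```latex
\Delta T = T_{\mathrm{cs}} - T_{\mathrm{en}} \approx 200, \qquad
\frac{\Delta T}{T_{\mathrm{cs}}} \approx 0.4
\;\Rightarrow\;
T_{\mathrm{cs}} \approx \frac{200}{0.4} = 500, \qquad
T_{\mathrm{en}} \approx 500 - 200 = 300, \qquad
\frac{T_{\mathrm{cs}}}{T_{\mathrm{en}}} \approx 1.67
```

If the same prompt is sent with each of the up to three documents, a full audit saves roughly $3 \times 200 = 600$ input tokens.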
To improve/expand:
- Add streaming responses for faster UX.
- Add more languages, and possibly support for other jurisdictions.
- Potentially add statistics or aggregate reporting in the future.
In conclusion:
If you want a quick, no-login, open-source check of whether your website is leaking data pre-consent, and how solid your legal documents are, Legal Audit is for you. Try it, experiment, and let us know where it can improve!