SkipTheTerms
AI Terms of Service summarizer — because life is short
FastAPI · Python · Manifest V3 · Supabase · Groq API · Llama 3.3 70B
the problem
The dominant UX pattern for AI browser extensions is broken: notice problem → open panel → paste text → click button → wait → read result → close panel → return to page → realise you lost context. Eight steps. For a tool that exists to save time. Terms of Service pages are entirely predictable: the URL contains /terms or /privacy, and the page title says so. The trigger is automatic. The extension shouldn't wait for you to initiate anything.
the approach
A Chrome Extension (Manifest V3) that detects Terms of Service pages automatically, pre-triggers summarisation in the background, and surfaces the result as a badge — one click. The FastAPI backend receives the document text, hashes it, checks Supabase for an existing summary, returns it if found, calls Llama 3.3 70B via Groq if not, and stores the result. Cold calls take under 3 seconds. Cache hits: 50ms.
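Stripped of the FastAPI routing and Supabase client code, the backend flow reduces to hash → look up → generate → store. A minimal sketch of that flow, with an in-memory dict standing in for Supabase and a pluggable `call_llm` standing in for the Groq request (names here are illustrative, not the project's actual code):

```python
import hashlib

# In-memory stand-in for the Supabase summaries table (illustrative only).
_cache: dict[str, str] = {}

def _doc_hash(text: str) -> str:
    # Identical documents map to one key, so one summary serves every reader.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def summarise(text: str, call_llm) -> tuple[str, bool]:
    """Return (summary, was_cache_hit). `call_llm` stands in for the Groq call."""
    key = _doc_hash(text)
    if key in _cache:              # cache hit: the ~50ms path, no LLM call
        return _cache[key], True
    summary = call_llm(text)       # cache miss: the <3s cold-call path
    _cache[key] = summary          # store so every future reader skips the LLM
    return summary, False
```

The first reader of a given document pays the cold-call cost; everyone after them gets a lookup.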
key decisions
↳ Auto-detect + background summarisation
The extension doesn't wait for you to click 'summarise'. It detects the page context from the URL and title, triggers the backend call in the background, and shows a badge when ready. The interaction collapses to: arrive → see badge → click once → read. Four steps, not eight.
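The detection itself is a cheap heuristic over the two signals named above. A sketch of that check, shown in Python for consistency with the backend (the real extension runs equivalent logic in its content script, and the exact pattern list here is an assumption):

```python
import re
from urllib.parse import urlparse

# Illustrative patterns; the extension's actual lists may differ.
_URL_HINTS = re.compile(r"/(terms|tos|privacy|legal)(\b|[-_/.])", re.I)
_TITLE_HINTS = re.compile(r"terms of (service|use)|privacy policy", re.I)

def looks_like_tos(url: str, title: str) -> bool:
    # Either signal alone is enough to pre-trigger background summarisation.
    return bool(
        _URL_HINTS.search(urlparse(url).path) or _TITLE_HINTS.search(title)
    )
```

A false positive only costs one wasted backend call; a false negative costs the whole feature, so the patterns err on the side of matching.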
↳ Document hash caching via Supabase
Every Terms of Service document gets hashed on arrival. If Supabase has a summary for that hash, return it. Skip the LLM entirely. This hit a 70% cache rate within the first week — because the same GDPR policy gets read by hundreds of users. Cost reduction: 40%. Latency on cache hits: 50ms.
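One detail worth flagging: a byte-level hash is brittle, because the same policy served with different whitespace or encoding produces a different digest and misses the cache. A sketch of a normalising hash that avoids this (the normalisation rules here are an assumption for illustration, not necessarily what the backend does):

```python
import hashlib
import unicodedata

def doc_hash(text: str) -> str:
    # Normalise unicode, lowercase, and collapse whitespace before hashing,
    # so trivially reformatted copies of the same policy share one cache key.
    norm = unicodedata.normalize("NFKC", text)
    norm = " ".join(norm.lower().split())
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()
```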
↳ Llama 3.3 70B for accuracy
Summarising legal documents requires a capable model — smaller models miss nuance and produce summaries that technically describe the document but miss the dangerous clauses. Llama 3.3 70B via Groq hits the accuracy bar without the GPT-4o price tag.
by the numbers
Cold call latency: < 3 seconds
Cache hit latency: 50ms
LLM cost reduction: 40%
Cache hit rate (week 1): ~70%
what i actually learned
Backend latency isn't just an engineering metric — it's a UX decision. Every second a user waits is a second they're thinking about the wait, not the result. And the best LLM optimisation is not calling the LLM: a worse model with a cache is often a better product than a better model without one.
50ms cache hits · 40% cost reduction · <3s cold