
SkipTheTerms

AI Terms of Service summarizer — because life is short

FastAPI · Python · Manifest V3 · Supabase · Groq API · Llama 3.3 70B

the problem

The dominant UX pattern for AI browser extensions is broken: notice problem → open panel → paste text → click button → wait → read result → close panel → return to page → realise you've lost context. Eight steps. For a tool that exists to save time. Terms of Service pages are fully predictable: the URL contains /terms or /privacy, and the page title says so. The trigger can be automatic. The extension shouldn't wait for you to initiate anything.

the approach

A Chrome Extension (Manifest V3) that detects Terms of Service pages automatically, pre-triggers summarisation in the background, and surfaces the result as a badge — one click. The FastAPI backend receives the document text, hashes it, checks Supabase for an existing summary, returns it if found, calls Llama 3.3 70B via Groq if not, and stores the result. Cold calls take under 3 seconds. Cache hits: 50ms.
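
A minimal sketch of that request path, assuming a single POST endpoint. The endpoint, field, and helper names are placeholders, and the in-memory dict stands in for the Supabase table (the real cache and LLM call are sketched under key decisions below):

```python
# Hypothetical sketch of the backend flow, not the project's actual code.
import hashlib

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
_cache: dict[str, str] = {}  # stand-in for the Supabase summaries table


class SummarizeRequest(BaseModel):
    url: str
    text: str  # full Terms of Service text sent by the extension


class SummarizeResponse(BaseModel):
    summary: str
    cached: bool


def generate_summary(text: str) -> str:
    # Placeholder for the Groq / Llama 3.3 70B call sketched further down.
    return "placeholder summary"


@app.post("/summarize", response_model=SummarizeResponse)
async def summarize(req: SummarizeRequest) -> SummarizeResponse:
    # Hash the document so identical policies share one cache entry.
    doc_hash = hashlib.sha256(req.text.encode("utf-8")).hexdigest()

    # Cache hit: skip the LLM entirely and return in ~50ms.
    if doc_hash in _cache:
        return SummarizeResponse(summary=_cache[doc_hash], cached=True)

    # Cache miss: summarise (<3s cold), store the result, then return it.
    summary = generate_summary(req.text)
    _cache[doc_hash] = summary
    return SummarizeResponse(summary=summary, cached=False)
```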

key decisions

Auto-detect + background summarisation

The extension doesn't wait for you to click 'summarise'. It detects the page context from the URL and title, triggers the backend call in the background, and shows a badge when ready. The interaction collapses to: arrive → see badge → click once → read. Four steps, not eight.
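
The detection itself is a couple of pattern checks. In the shipped extension this logic lives in the Manifest V3 JavaScript, not Python; the sketch below only illustrates the heuristic, and the exact patterns are assumptions:

```python
# Illustrative detection heuristic; the real check runs in the extension's JS.
import re
from urllib.parse import urlparse

# Path and title patterns that typically identify ToS / privacy pages.
PATH_PATTERN = re.compile(r"/(terms|tos|terms-of-service|privacy|privacy-policy)(/|$)", re.I)
TITLE_PATTERN = re.compile(r"\b(terms of (service|use)|privacy policy)\b", re.I)


def looks_like_terms_page(url: str, title: str) -> bool:
    """Return True if the URL path or page title matches a ToS/privacy pattern."""
    path = urlparse(url).path
    return bool(PATH_PATTERN.search(path) or TITLE_PATTERN.search(title))


# e.g. looks_like_terms_page("https://example.com/legal/terms", "Terms of Service") -> True
```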

Document hash caching via Supabase

Every Terms of Service document gets hashed on arrival. If Supabase has a summary for that hash, return it. Skip the LLM entirely. This hit a 70% cache rate within the first week — because the same GDPR policy gets read by hundreds of users. Cost reduction: 40%. Latency on cache hits: 50ms.
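
A sketch of the cache layer, assuming a summaries table keyed by a SHA-256 hash of the document; the table and column names are guesses, while the client calls are standard supabase-py:

```python
# Sketch of the hash-keyed cache; table and column names are assumptions.
import hashlib
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])


def doc_hash(text: str) -> str:
    # Identical documents hash identically, so each policy is summarised once.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_cached_summary(h: str) -> str | None:
    rows = (
        supabase.table("summaries")
        .select("summary")
        .eq("doc_hash", h)
        .limit(1)
        .execute()
        .data
    )
    return rows[0]["summary"] if rows else None


def store_summary(h: str, url: str, summary: str) -> None:
    supabase.table("summaries").insert(
        {"doc_hash": h, "source_url": url, "summary": summary}
    ).execute()
```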

Llama 3.3 70B for accuracy

Summarising legal documents requires a capable model: smaller models gloss over nuance and produce summaries that technically describe the document but skip the dangerous clauses. Llama 3.3 70B via Groq hits the accuracy bar without the GPT-4o price tag.
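
The call itself is a standard chat completion against the Groq SDK. The model ID below is Groq's name for Llama 3.3 70B, and the prompt wording is an illustrative guess, not the project's actual prompt:

```python
# Sketch of the summarisation call via the Groq SDK; prompt text is illustrative.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

SYSTEM_PROMPT = (
    "Summarise this Terms of Service document in plain English. "
    "Call out clauses a user should worry about: data sharing, arbitration, "
    "content licensing, auto-renewal, and account termination."
)


def generate_summary(document_text: str) -> str:
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # Groq's hosted Llama 3.3 70B
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": document_text},
        ],
        temperature=0.2,  # keep the summary factual rather than creative
    )
    return completion.choices[0].message.content or ""
```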

by the numbers

Cold call latency: < 3 seconds
Cache hit latency: 50ms
LLM cost reduction: 40%
Cache hit rate (week 1): ~70%

what i actually learned

Backend latency isn't just an engineering metric — it's a UX decision. Every second a user waits is a second they're thinking about the wait, not the result. And the best LLM optimisation is not calling the LLM: a worse model with a cache is often a better product than a better model without one.

50ms cache hits · 40% cost reduction · <3s cold