You're probably spending 10× more on LLM calls than you need to
LLM API economics · February 2026
cost optimisation · caching · what nobody tells you
When I first launched SkipTheTerms, every request hit the Groq API fresh. Llama 3.3 70B, every time, for every user, for every document. Clean. Expensive. Stupid.
The problem became obvious fast: Terms of Service documents don't change. The same GDPR policy gets read by hundreds of users. I was paying API credits to summarise an identical 5,000-word document dozens of times a day. Different users. Same document. Same result.
The fix was two hours of work: hash the document content on arrival, check Supabase for an existing summary, return it if found, call the LLM if not, store the result. The cache hit rate reached 70% within the first week.
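The whole flow fits in one function. Here's a minimal sketch of what it looks like in TypeScript, assuming a Supabase table called summaries keyed by a SHA-256 hash of the document text; the table name, column names, and model id are illustrative, not SkipTheTerms' actual schema.

```ts
// Hash → check cache → call LLM on miss → store result.
// Assumes a "summaries" table with doc_hash (text, unique) and summary (text).
import { createHash } from "node:crypto";
import { createClient } from "@supabase/supabase-js";
import Groq from "groq-sdk";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

export async function summarise(documentText: string): Promise<string> {
  // 1. Hash the document content on arrival.
  const docHash = createHash("sha256").update(documentText).digest("hex");

  // 2. Check for an existing summary.
  const { data: cached } = await supabase
    .from("summaries")
    .select("summary")
    .eq("doc_hash", docHash)
    .maybeSingle();

  if (cached) return cached.summary; // cache hit: no API spend

  // 3. Cache miss: call the LLM once for this document.
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile", // illustrative model id
    messages: [
      {
        role: "user",
        content: `Summarise the following terms of service:\n\n${documentText}`,
      },
    ],
  });
  const summary = completion.choices[0].message.content ?? "";

  // 4. Store the result so the next identical document is free.
  await supabase.from("summaries").insert({ doc_hash: docHash, summary });

  return summary;
}
```

The content hash as the cache key is the important bit: two different users uploading the same document resolve to the same row, which is exactly where that hit rate comes from.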
Cost reduction: 40%. Latency on cache hits: 50ms vs ~3 seconds cold. User experience: the thing felt instant on the majority of requests.
The broader point is that most LLM applications have natural caching surfaces — repeated queries, identical documents, common prompts — and most developers skip them entirely because they're thinking about model quality rather than call frequency.
The uncomfortable truth: a worse model with a cache is often a better product than a better model without one. Your users probably cannot distinguish GPT-4o from Llama 3.3 70B on a summarisation task. They can absolutely tell the difference between 50ms and 3 seconds.
Optimise for the experience, not the benchmark.