sourc.dev
Claude 3.5 Sonnet input $3.00/1M ↓ -50%
GPT-4o input $2.50/1M
Gemini 1.5 Pro input $1.25/1M
Mistral Large input $2.00/1M ↓ -33%
DeepSeek V3 input $0.27/1M
synced 2026-04-05

Context caching

You might be paying full price for tokens the provider already has in memory

What is context caching

Context caching stores previously sent prompt tokens on the provider side so they do not need to be reprocessed on subsequent calls. When the same system prompt, few-shot examples, or document context appears in multiple requests, the cached tokens are read from memory at a reduced cost instead of being processed from scratch.
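On Anthropic's API, caching is opt-in: you mark the end of the stable prefix (system prompt, few-shot examples, documents) with a `cache_control` breakpoint, and only the per-request turn varies. A minimal sketch of the request shape, assuming the current Anthropic Messages API format (model name and prompt text are placeholders):

```python
# Sketch of an Anthropic Messages API request body with prompt caching.
# The long, stable prefix goes in `system` with a cache_control marker;
# only the user turn changes between calls. Prompt text is illustrative.

LONG_SYSTEM_PROMPT = "You are a support assistant. " + "Policy text... " * 200

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Everything up to this breakpoint is cached; later calls
                # with the identical prefix are billed at the read rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Where is my order?")
```

The key constraint is that the cached prefix must be byte-identical across calls: reordering examples or injecting a timestamp into the system prompt invalidates the cache.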

Anthropic charges 90% less for cached input tokens: cache reads are billed at 10% of the base input rate, with a one-time surcharge on the initial cache write. OpenAI applies an automatic 50% discount to cached input tokens. The savings compound when your application sends the same context prefix thousands of times per day.

Why it matters

If your application uses a long system prompt or large document as context, and you make repeated calls, context caching can reduce input costs by 50–90%. The tradeoff is a small additional cost for the initial cache write. For applications with stable, repeated context — the typical pattern — this is a net saving from the first hour.

Verified March 2026 · Source: Anthropic prompt caching docs, OpenAI docs

Related terms
Token · Input price · Context window