sourc.dev
Claude 3.5 Sonnet input $3.00/1M ↓ -50%
GPT-4o input $2.50/1M
Gemini 1.5 Pro input $1.25/1M
Mistral Large input $2.00/1M ↓ -33%
DeepSeek V3 input $0.27/1M
synced 2026-04-05

Context caching

You might be paying full price for tokens the provider already has in memory

What is context caching

Context caching stores previously sent prompt tokens on the provider side so they do not need to be reprocessed on subsequent calls. When the same system prompt, few-shot examples, or document context appears in multiple requests, the cached tokens are read from memory at a reduced cost instead of being processed from scratch.
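On Anthropic's API, caching is opt-in: you mark the end of the stable prefix (system prompt, few-shot examples, documents) with a `cache_control` breakpoint, and only the per-request turn varies. A minimal sketch of the request shape, assuming the current Anthropic Messages API format (model name and prompt text are placeholders):

```python
# Sketch of an Anthropic Messages API request body with prompt caching.
# The long, stable prefix goes in `system` with a cache_control marker;
# only the user turn changes between calls. Prompt text is illustrative.

LONG_SYSTEM_PROMPT = "You are a support assistant. " + "Policy text... " * 200

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Everything up to this breakpoint is cached; later calls
                # with the identical prefix are billed at the read rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Where is my order?")
```

The key constraint is that the cached prefix must be byte-identical across calls: reordering examples or injecting a timestamp into the system prompt invalidates the cache.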

Anthropic charges 90% less for cached input tokens: cache reads are billed at 10% of the base input rate, with a one-time surcharge on the initial cache write. OpenAI applies an automatic 50% discount to cached input tokens. The savings compound when your application sends the same context prefix thousands of times per day.

Why it matters

If your application uses a long system prompt or large document as context, and you make repeated calls, context caching can reduce input costs by 50–90%. The tradeoff is a small additional cost for the initial cache write. For applications with stable, repeated context — the typical pattern — this is a net saving from the first hour.

Verified March 2026 · Source: Anthropic prompt caching docs, OpenAI docs

Related terms
Token · Input price · Context window