sourc.dev
Claude 3.5 Sonnet input $3.00/1M ↓ -50%
GPT-4o input $2.50/1M
Gemini 1.5 Pro input $1.25/1M
Mistral Large input $2.00/1M ↓ -33%
DeepSeek V3 input $0.27/1M
synced 2026-04-05

Max output tokens

Your output might be getting silently cut off

What is max output tokens

Max output tokens is the maximum number of tokens a model can generate in a single response. It is a hard limit — the model stops generating at this boundary regardless of whether the response is complete.
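The hard-cap behavior can be illustrated with a toy sketch (an assumption here: one word stands in for one token; real tokenizers split text differently):

```python
def generate(full_answer, max_output_tokens):
    """Simulate a model that stops at the cap, whether or not the answer is complete."""
    tokens = full_answer.split()
    truncated = len(tokens) > max_output_tokens
    return " ".join(tokens[:max_output_tokens]), truncated

text, cut = generate("The capital of France is Paris", max_output_tokens=4)
print(text, cut)  # "The capital of France" True - stopped mid-sentence
```

Real APIs expose the same signal as a flag on the response rather than a boolean, which is how callers detect that the cap was the reason generation stopped.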

Most current models allow between 4,096 and 16,384 output tokens per call. Newer models such as Claude 3.5 Sonnet and GPT-4o support at least 8,192, while Claude 3 Opus is capped at 4,096.

Why it matters

If a model stops mid-sentence while generating a long document, it has most likely hit the max output token limit. The limit is separate from the context window: a model with a 200K context window may still generate only 8K tokens of output per call. For long-form generation, you need to chain multiple calls, feeding each partial response back as context for the next.
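The chaining pattern above can be sketched with a stub in place of a real client. `StubModel` is a toy (one word per token, a hard cap per call); real APIs report truncation through a response field, e.g. OpenAI's finish_reason == "length" or Anthropic's stop_reason == "max_tokens":

```python
class StubModel:
    """Toy model: one word = one token, with a hard output cap per call."""
    def __init__(self, document, max_output_tokens):
        self.words = document.split()
        self.cap = max_output_tokens

    def generate(self, start):
        # Return up to `cap` tokens and a finish signal mirroring real APIs.
        chunk = self.words[start:start + self.cap]
        finish = "length" if start + self.cap < len(self.words) else "stop"
        return " ".join(chunk), finish

def generate_long(model):
    """Chain calls until the model signals a natural stop."""
    parts, pos = [], 0
    while True:
        text, finish = model.generate(pos)
        parts.append(text)
        pos += len(text.split())
        if finish == "stop":  # response complete; no truncation
            break
    return " ".join(parts)

doc = "one two three four five six seven"
model = StubModel(doc, max_output_tokens=3)
print(generate_long(model))  # reassembles the full document across three calls
```

With a real API the loop is the same shape: check the finish signal, and if the response was cut off, issue another request that includes the text generated so far.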

Verified March 2026 · Source: Anthropic docs, OpenAI docs

Related terms
Context window · Token · Output price