#30 of 50
Max output tokens
Your output might be getting silently cut off
What is max output tokens?
Max output tokens is the maximum number of tokens a model can generate in a single response. It is a hard limit — the model stops generating at this boundary regardless of whether the response is complete.
Typical limits range from roughly 4,096 to 16,384 output tokens depending on the model. For example, Claude 3 Opus supports 4,096, Claude 3.5 Sonnet supports 8,192, and GPT-4o supports up to 16,384.
Why it matters
If you ask a model to generate a long document and it stops mid-sentence, the response has likely hit the max output token limit. This limit is separate from the context window — a model with a 200K-token context window may still only be able to generate 8K output tokens per call. For long-form generation, you need to chain multiple calls, feeding each truncated response back in and asking the model to continue.
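The chaining pattern above can be sketched as a loop that keeps calling the model until it stops for a reason other than hitting the token limit. This is a minimal sketch, not a real API client: `generate` stands in for a hypothetical model call that returns the new text plus a stop reason (many APIs report something like `"max_tokens"` when output is cut off at the limit, but the exact field and value vary by provider).

```python
def generate_long(prompt, generate, max_calls=10):
    """Chain model calls until generation completes naturally.

    `generate` is a hypothetical callable taking (prompt, text_so_far)
    and returning (new_text, stop_reason), where stop_reason is
    "max_tokens" when output was truncated at the limit.
    """
    parts = []
    for _ in range(max_calls):
        text, stop_reason = generate(prompt, "".join(parts))
        parts.append(text)
        if stop_reason != "max_tokens":  # model finished on its own
            break
    return "".join(parts)


# Stub model for illustration: emits 5 "tokens" per call from a
# 12-token document, simulating a max output limit of 5 tokens.
doc = [f"tok{i} " for i in range(12)]

def stub(prompt, so_far):
    done = len(so_far.split())           # tokens already produced
    chunk = doc[done:done + 5]           # next slice, capped at 5
    reason = "max_tokens" if done + 5 < len(doc) else "end_turn"
    return "".join(chunk), reason

result = generate_long("write a doc", stub)
```

The stub illustrates the key idea: pass the text generated so far back into each call so the model can continue where it left off, and stop looping once the stop reason indicates a natural end.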
Verified March 2026 · Source: Anthropic docs, OpenAI docs
Related terms