#30 of 50
Max output tokens
Your output might be getting silently cut off
What is max output tokens?
Max output tokens is the maximum number of tokens a model can generate in a single response. It is a hard limit — the model stops generating at this boundary regardless of whether the response is complete.
Typical limits range from roughly 4,096 to 16,384 output tokens depending on the model. For example, Claude 3 Opus supports 4,096, Claude 3.5 Sonnet supports 8,192, and GPT-4o supports up to 16,384.
Why it matters
If you ask a model to generate a long document and it stops mid-sentence, the response has likely hit the max output token limit. This limit is separate from the context window — a model with a 200K-token context window may still only be able to generate 8K output tokens per call. For long-form generation, you need to chain multiple calls, feeding each truncated response back in and asking the model to continue.
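The chaining pattern above can be sketched as a loop that keeps calling the model until it stops for a reason other than hitting the token limit. This is a minimal sketch, not a real API client: `generate` stands in for a hypothetical model call that returns the new text plus a stop reason (many APIs report something like `"max_tokens"` when output is cut off at the limit, but the exact field and value vary by provider).

```python
def generate_long(prompt, generate, max_calls=10):
    """Chain model calls until generation completes naturally.

    `generate` is a hypothetical callable taking (prompt, text_so_far)
    and returning (new_text, stop_reason), where stop_reason is
    "max_tokens" when output was truncated at the limit.
    """
    parts = []
    for _ in range(max_calls):
        text, stop_reason = generate(prompt, "".join(parts))
        parts.append(text)
        if stop_reason != "max_tokens":  # model finished on its own
            break
    return "".join(parts)


# Stub model for illustration: emits 5 "tokens" per call from a
# 12-token document, simulating a max output limit of 5 tokens.
doc = [f"tok{i} " for i in range(12)]

def stub(prompt, so_far):
    done = len(so_far.split())           # tokens already produced
    chunk = doc[done:done + 5]           # next slice, capped at 5
    reason = "max_tokens" if done + 5 < len(doc) else "end_turn"
    return "".join(chunk), reason

result = generate_long("write a doc", stub)
```

The stub illustrates the key idea: pass the text generated so far back into each call so the model can continue where it left off, and stop looping once the stop reason indicates a natural end.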
Verified March 2026 · Source: Anthropic docs, OpenAI docs
Related terms