Streaming
The difference between a product that feels alive and one that feels broken
Without streaming:
You send a request. You wait. Nothing happens. Five seconds. Eight seconds. Twelve seconds. A full response appears at once.
From the user's perspective: the application is frozen. Is it working? Did it crash? Should I click again?
With streaming:
You send a request. Words start appearing immediately. The model types, in real time, token by token. The user sees the answer forming. They start reading before it is done.
From the user's perspective: the application is alive.
Same model. Same response. Same time to complete. Completely different experience.
Streaming is the technique of sending the model's response token by token as it is generated, rather than waiting for the full response and sending it all at once. SSE stands for Server-Sent Events, the web standard that Anthropic, OpenAI, and most other model APIs use to deliver those tokens over a single HTTP connection.
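The SSE wire format itself is simple plain text: each event is one or more `data:` lines, and a blank line marks the end of an event. A minimal (simplified, not fully spec-compliant) parser sketch:

```python
def parse_sse(raw: str) -> list[str]:
    """Split a raw Server-Sent Events stream into its data payloads.

    Simplified: handles only `data:` fields and blank-line event
    boundaries, which is enough to illustrate the format.
    """
    events: list[str] = []
    buffer: list[str] = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            # Collect the payload after the "data:" prefix.
            buffer.append(line[len("data:"):].strip())
        elif line == "" and buffer:
            # A blank line terminates the current event.
            events.append("\n".join(buffer))
            buffer = []
    if buffer:  # Flush a trailing event with no final blank line.
        events.append("\n".join(buffer))
    return events


stream = "data: Hello\n\ndata: world\n\ndata: [DONE]\n\n"
print(parse_sse(stream))  # ['Hello', 'world', '[DONE]']
```

Each `data:` payload in a real model API is typically a small JSON object containing the next token chunk, with a sentinel such as `[DONE]` marking the end of the stream.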
The number that makes it real
A typical model response might take 5–10 seconds to generate fully. Without streaming, the user sees nothing for 5–10 seconds. With streaming, they see the first word in under a second. The total time is identical. The perception of speed is completely different.
Why this matters to you
User experience research consistently shows that perceived speed matters as much as actual speed. A response that starts in 800ms and takes 8 seconds to complete feels faster than a response that delivers all at once after 4 seconds.
If you are building anything with a conversational interface — a chatbot, an assistant, a writing tool, a support product — streaming is not optional. It is the baseline expectation. Users who experience non-streaming interfaces in 2026 assume something is broken.
How to use this
Most model APIs support streaming with a single parameter change — typically `stream: true` in the request body. The response changes from a single JSON object to a sequence of events, each containing a token chunk. Your frontend renders each chunk as it arrives.
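As a sketch of the consuming side, here is a generator that turns an iterable of SSE lines (such as `response.iter_lines(decode_unicode=True)` from the `requests` library) into text deltas. The JSON shape (`{"delta": {"text": ...}}`), the URL, and the request fields in the commented usage are illustrative assumptions; real providers each use their own event schema, so check the provider's streaming docs.

```python
import json
from typing import Iterable, Iterator


def stream_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from an iterable of SSE lines.

    Assumes an illustrative event shape {"delta": {"text": ...}} and an
    OpenAI-style "[DONE]" sentinel; adapt both to your provider.
    """
    for line in lines:
        if not line or not line.startswith("data:"):
            continue  # Skip blank lines and non-data fields.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # End-of-stream sentinel.
            break
        event = json.loads(payload)
        text = event.get("delta", {}).get("text", "")
        if text:
            yield text


# Hypothetical live usage (endpoint, auth, and body are placeholders):
# resp = requests.post(
#     "https://api.example.com/v1/messages",
#     headers={"Authorization": "Bearer ..."},
#     json={"model": "...", "stream": True},
#     stream=True,  # tell requests not to buffer the whole body
# )
# for chunk in stream_chunks(resp.iter_lines(decode_unicode=True)):
#     print(chunk, end="", flush=True)  # render each token as it arrives

fake = [
    'data: {"delta": {"text": "Hel"}}',
    "",
    'data: {"delta": {"text": "lo"}}',
    "",
    "data: [DONE]",
]
print("".join(stream_chunks(fake)))  # Hello
```

Note the two `stream` flags doing different jobs: `"stream": True` in the JSON body asks the API to send SSE instead of one JSON object, while `stream=True` on the HTTP call asks the client library to hand you bytes as they arrive instead of buffering the full response.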
The implementation is straightforward. The user experience difference is not subtle.
Verified March 2026 · Source: Anthropic, OpenAI streaming documentation