Streaming delivers LLM output incrementally, token by token, rather than waiting for the complete response. Users see text appear progressively, dramatically improving perceived responsiveness.
This is essential for chat interfaces, where waiting 10+ seconds for a long response to finish would make the interface feel unresponsive.
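As a minimal sketch of what consuming a stream looks like, here is an example using the OpenAI Python SDK (`openai>=1.0`); the model name and prompt are illustrative, and other providers expose similar chunk-based iteration:

```python
# Minimal streaming sketch using the OpenAI Python SDK (openai>=1.0).
# Model name and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,  # request incremental chunks instead of one full response
)

# Each chunk carries a delta: only the tokens generated since the last chunk.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk's delta can be None
        print(delta, end="", flush=True)  # render tokens as they arrive
print()
```

Printing each delta as it arrives is what produces the progressive "typing" effect; in a web UI the same loop would instead forward each delta to the client, typically over server-sent events or a WebSocket.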