Max tokens sets the maximum number of tokens the model can generate in its response. When this limit is reached, generation stops—even mid-sentence. It's a hard ceiling on output length.
Because you pay per generated token, this parameter caps the cost of a single response and prevents runaway generations.
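The cutoff behavior can be sketched with a toy generation loop. This is not a real model call; the token list and function names are hypothetical, standing in for a decoder that emits one token at a time until its budget is spent.

```python
# Toy illustration of the max-tokens cutoff. A real API exposes this
# as a request parameter (often named `max_tokens`); here a fixed
# token list stands in for the model's would-be full response.

FULL_RESPONSE = ["The", "quick", "brown", "fox", "jumps",
                 "over", "the", "lazy", "dog."]

def generate(max_tokens: int) -> list[str]:
    out = []
    for token in FULL_RESPONSE:
        if len(out) >= max_tokens:
            break  # hard ceiling: generation stops even mid-sentence
        out.append(token)
    return out

truncated = generate(4)    # stops after 4 tokens, mid-sentence
complete  = generate(100)  # limit never reached; full response returned
print(" ".join(truncated))
print(" ".join(complete))
```

Note that the limit only truncates; setting it higher than the response needs costs nothing extra, since billing is per token actually generated.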