
Explanation:
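The max-tokens parameter sets an upper limit on the number of tokens the model can generate in its response. A token is a unit of text produced by the model's tokenizer, typically a word or a piece of a word. When responses are excessively long, they increase both latency and token costs. Lowering max-tokens caps the response length, which reduces both. The limit is a hard cap: generation stops once it is reached, so a value set too low can cut a response off mid-sentence.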
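As a rough illustration of why a lower cap bounds both latency and cost, the sketch below computes worst-case figures for a single response. It assumes generation time and output price both scale linearly with the number of generated tokens. The function name `worst_case_bounds`, the per-token time, and the per-1,000-token price are hypothetical values chosen for illustration and are not taken from any real model or provider.

```python
def worst_case_bounds(max_tokens, seconds_per_token, price_per_1k_output_tokens):
    """Return (latency_seconds, cost) upper bounds for one response.

    Assumes the model emits at most `max_tokens` tokens and that both
    generation time and price scale linearly with output tokens.
    """
    latency = max_tokens * seconds_per_token
    cost = max_tokens / 1000 * price_per_1k_output_tokens
    return latency, cost


# Hypothetical figures: 0.02 s per generated token, 0.015 per 1,000 output tokens.
for cap in (2048, 256):
    latency, cost = worst_case_bounds(cap, 0.02, 0.015)
    print(f"max_tokens={cap}: at most {latency:.2f} s, at most {cost:.5f} in output cost")
```

These are upper bounds only: a response that finishes before reaching the cap costs less and returns sooner.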
Explanation of other options:
Of the available options, only max-tokens directly limits response length, and with it latency and token cost.