
Answer-first summary for fast verification
Answer: max-tokens
## Explanation

The correct answer is **C. max-tokens**.

### Why max-tokens is the correct parameter

- **max-tokens** sets an upper bound on the number of tokens (words/sub-words) the model can generate in its response.
- When summaries are too long, they directly increase:
  - **Latency**: longer responses take more time to generate.
  - **Token cost**: more tokens mean higher API charges.

### How adjusting max-tokens helps

- Setting a lower max-tokens limit caps the length of generated summaries.
- This reduces both generation time and cost.
- Note that max-tokens truncates output rather than teaching the model to be concise; pairing the cap with a prompt that asks for brevity keeps summaries complete within the limit.

### Why the other options are incorrect

- **A. temperature**: controls the randomness/creativity of responses, not their length.
- **B. top-p**: controls vocabulary diversity via nucleus sampling; it affects output quality, not length.
- **D. stop-sequences**: defines strings at which generation halts, but does not bound overall response length.

### Best practice

For healthcare chatbots where concise, accurate information is crucial, an appropriate max-tokens limit keeps responses focused and cost-effective while maintaining quality.
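The cost and latency impact of the cap can be sketched with a small back-of-the-envelope model. The per-token price and generation speed below are illustrative assumptions, not real provider figures:

```python
# Illustrative assumptions -- real prices and speeds vary by model and provider.
PRICE_PER_1K_OUTPUT_TOKENS = 0.002  # USD per 1K output tokens (assumed)
SECONDS_PER_TOKEN = 0.02            # generation speed (assumed)

def worst_case(max_tokens: int) -> tuple[float, float]:
    """Upper bound on cost (USD) and latency (s) for a single response."""
    cost = max_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    latency = max_tokens * SECONDS_PER_TOKEN
    return cost, latency

# Lowering the cap from 1024 to 256 tokens shrinks both bounds by 4x.
for cap in (1024, 256):
    cost, latency = worst_case(cap)
    print(f"max_tokens={cap}: cost <= ${cost:.4f}, latency <= {latency:.2f}s")
```

In a real deployment the cap is simply passed as the `max_tokens` (or similarly named) parameter of the completion request; the arithmetic above only shows why lowering it bounds both cost and latency.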
Author: Ritesh Yadav