
A healthcare chatbot on Bedrock generates long summaries, increasing latency and token cost. Which parameter should you adjust?
A. temperature
B. top-p
C. max-tokens
D. stop-sequences
Explanation:
The correct answer is C. max-tokens.
max-tokens controls the maximum number of tokens (words or sub-words) the model can generate in its response. When summaries run long, they directly increase:
- Latency: longer responses take more time to generate
- Token cost: more generated tokens mean higher API charges
Setting a lower max-tokens limit caps how much the model can generate, which reduces both generation time and cost. Note that the cap truncates output rather than making the model inherently more concise, so it works best alongside a prompt instruction to summarize briefly. A minimal sketch follows below.
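As a concrete illustration, here is a minimal sketch of lowering the cap via the Bedrock Converse API in boto3. The model ID, region, and the 256-token limit are placeholder assumptions, not values taken from the question:

```python
import boto3

# Placeholder region; use whichever region hosts your Bedrock models.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the patient discharge notes in 3 sentences."}],
        }
    ],
    inferenceConfig={
        # Hard cap on generated tokens: directly reduces latency and token cost.
        "maxTokens": 256,
    },
)

print(response["output"]["message"]["content"][0]["text"])
```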
A. temperature: controls the randomness/creativity of responses, not their length
B. top-p: controls vocabulary diversity through nucleus sampling; it affects output quality, not length
D. stop-sequences: defines strings at which generation halts, but does not directly cap overall response length
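For contrast, the sketch below (same assumptions as above: the boto3 Converse API with a placeholder model ID and illustrative values) shows where all four quiz parameters sit in inferenceConfig; only maxTokens imposes a hard upper bound on output length:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Summarize the notes briefly."}]}],
    inferenceConfig={
        "temperature": 0.2,        # sampling randomness/creativity, not length
        "topP": 0.9,               # nucleus-sampling diversity, not length
        "maxTokens": 256,          # hard cap on generated tokens (the answer)
        "stopSequences": ["END"],  # halts only if "END" is emitted; no general cap
    },
)
```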
For healthcare chatbots, where concise, accurate information is crucial, setting an appropriate max-tokens limit keeps responses focused and cost-effective while maintaining quality.