
Q6 – A healthcare chatbot on Bedrock generates long summaries, increasing latency and token cost. Which parameter should you adjust?
A. temperature
B. top-p
C. max-tokens
D. stop-sequences
Explanation:
When a healthcare chatbot on Amazon Bedrock generates long summaries that increase latency and token costs, the max-tokens parameter should be adjusted.
Token Limitation: The max-tokens parameter controls the maximum number of tokens (words/subwords) that the model can generate in its response.
Cost Control: Each token generated incurs computational cost and API usage charges. By limiting the maximum tokens, you directly control the length of responses and associated costs.
Latency Reduction: Longer responses require more processing time, increasing latency. Setting an appropriate max-tokens limit ensures responses complete within acceptable timeframes.
Healthcare Context: In healthcare applications, concise, relevant summaries are often preferred over verbose responses, making token limitation particularly important.
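As a concrete illustration, the sketch below shows where this limit is set when calling Bedrock through the Converse API in boto3 (assuming a recent boto3 release that includes it). The region, model ID, prompt, and the 512-token cap are placeholder assumptions for illustration, not details taken from the question.

import boto3

# Minimal sketch: cap summary length with maxTokens via the Bedrock Converse API.
# The region, model ID, prompt text, and the 512-token limit are illustrative assumptions.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the following visit notes for the patient in plain language: ..."}],
    }],
    inferenceConfig={
        "maxTokens": 512,    # hard upper bound on generated tokens, which bounds cost and latency
        "temperature": 0.2,  # low randomness suits factual summaries; does not limit length
    },
)

print(response["output"]["message"]["content"][0]["text"])

The other options adjust how the model samples its output, not how much it writes: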
A. temperature: Controls randomness/creativity of responses (higher = more random, lower = more deterministic). Doesn't directly control response length.
B. top-p: Controls nucleus sampling for response diversity. Affects quality/variety but not length.
D. stop-sequences: Defines specific strings that end generation when the model emits them. This is reactive rather than proactive: it only triggers if the sequence actually appears, so it provides no guaranteed upper bound on response length or cost the way max-tokens does.
Set max-tokens to an appropriate value for your use case (e.g., 500-1000 tokens for summaries) to balance response quality, cost, and latency.
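If you call Bedrock through the Converse API, the response also reports token counts (usage.inputTokens and usage.outputTokens) and a stopReason; a stopReason of "max_tokens" means the cap cut the summary short, which is a signal to either raise the limit slightly or prompt the model for more concise output.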