
Q6 – A healthcare chatbot on Bedrock generates long summaries, increasing latency and token cost. Which parameter should you adjust?
A. temperature
B. top-p
C. max-tokens
D. stop-sequences
Explanation:
When a healthcare chatbot on Bedrock generates long summaries, it increases both latency (response time) and token cost. The parameter that directly controls the length of the generated output is max-tokens.
max-tokens: This parameter sets the maximum number of tokens (words/subwords) the model can generate in its response. Reducing this value caps the length of the summaries, which lowers both latency and per-response token cost.
temperature: Controls randomness in the output (higher = more creative/random, lower = more deterministic). This doesn't directly control response length.
top-p: Controls nucleus sampling for diversity in responses. This affects quality/variety but not length.
stop-sequences: Defines sequences that cause the model to stop generating. While this can indirectly limit length, max-tokens is the direct parameter for controlling response length.
For healthcare chatbots, set max-tokens to a value appropriate for the use case: large enough that summaries remain complete and clinically useful, but small enough to keep latency and token cost under control.
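As a minimal sketch of how these parameters are set in practice, here is one way to pass them to Bedrock through the boto3 Converse API. The region, model ID, prompt, and parameter values are illustrative assumptions, not recommendations:

```python
import boto3

# Bedrock Runtime client; the region is an assumption for illustration.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize these discharge notes: ..."}],
        }
    ],
    inferenceConfig={
        "maxTokens": 256,             # caps output length -> lower latency and cost
        "temperature": 0.2,           # low randomness for consistent summaries
        "topP": 0.9,                  # nucleus sampling; affects variety, not length
        "stopSequences": ["###END"],  # optional early-stop marker
    },
)

print(response["output"]["message"]["content"][0]["text"])
```

Since output tokens are billed per token and generated sequentially, lowering maxTokens (for example, from 1024 to 256) reduces both the maximum cost and the worst-case latency of each response roughly in proportion.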