
A healthcare assistant built with Amazon Bedrock returns excessively long responses, increasing latency and token cost. Which parameter should be adjusted?
A
temperature
B
top-p
C
max-tokens
D
top-k
Explanation:
The correct answer is C. max-tokens.
Why max-tokens is the correct parameter:
Purpose of max-tokens: The max-tokens parameter controls the maximum number of tokens (words or word pieces) that the model can generate in its response. This directly limits the length of the output.
Problem statement: The question describes "excessively long responses," which cause two problems:
- Increased latency: generating more tokens takes more time.
- Higher cost: output tokens are billed per token, so longer responses cost more.
How adjusting max-tokens helps: By setting a lower max-tokens value, you cap how long each response can be, which directly reduces both generation time and the number of billed output tokens.
Why the other options are incorrect:
A. temperature: Controls the randomness of token selection (creativity vs. determinism); it does not limit response length.
B. top-p: Nucleus sampling restricts selection to the smallest set of tokens whose cumulative probability exceeds p; it shapes word choice, not output length.
D. top-k: Limits sampling to the k most likely tokens at each step; like top-p, it affects diversity, not length.
Best Practice Tip: When optimizing for cost and latency in production applications, setting appropriate max-tokens limits is crucial. You should determine the optimal balance between response completeness and cost/latency constraints for your specific use case.
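As a minimal sketch of the tip above (assuming the Anthropic Claude Messages API format on Bedrock via boto3; the model ID, the 300-token default, and the build_request helper are illustrative, not from the question):

```python
import json


def build_request(prompt: str, max_tokens: int = 300) -> str:
    # Claude Messages API request body for Bedrock. The max_tokens field
    # is a hard cap on generated output tokens, which bounds both latency
    # and per-response output-token cost.
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)


# The serialized body would then be passed to the bedrock-runtime client:
# client = boto3.client("bedrock-runtime")
# client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
#     body=build_request("Summarize the patient intake form."),
# )
```

Tuning the max_tokens argument per use case (e.g. short for triage summaries, longer for detailed explanations) is how you strike the completeness vs. cost/latency balance the tip describes.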