
A healthcare assistant built with Amazon Bedrock returns excessively long responses, increasing latency and token cost. Which parameter should be adjusted?
A. temperature
B. top-p
C. max-tokens
D. top-k
Explanation:
The correct answer is C. max-tokens.
Purpose of max-tokens: The max-tokens parameter sets an upper bound on the number of tokens (subword units) the model may generate in a single response, directly limiting output length.
Problem addressed: The question describes "excessively long responses," which increase both latency (time to complete the response) and token cost (output tokens are billed). Lowering the max-tokens value caps response length and prevents the model from generating unnecessarily verbose output.
Other parameters explained:
temperature (A): Controls the randomness of sampling; lower values make output more deterministic, higher values more varied. It does not limit output length.
top-p (B): Nucleus sampling; restricts sampling to the smallest set of tokens whose cumulative probability reaches p. It shapes output diversity, not length.
top-k (D): Restricts sampling to the k most probable candidate tokens at each step. Like top-p, it affects diversity, not length.
Practical application: In Amazon Bedrock, the exact parameter name varies by foundation model (for example, max_tokens for Anthropic Claude models, maxTokenCount for Amazon Titan Text), and the Converse API exposes it uniformly as maxTokens in inferenceConfig. Setting this value is the standard way to control response length and manage the costs associated with token usage, as sketched below.
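For illustration, here is a minimal sketch using boto3 and the Bedrock Converse API. The model ID, region, prompt, and parameter values are placeholder assumptions for this example, not part of the original question.

```python
import boto3

# Bedrock Runtime client; region is a placeholder for this sketch.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the patient's discharge instructions."}],
        }
    ],
    inferenceConfig={
        "maxTokens": 256,    # caps output length -> lower latency and token cost
        "temperature": 0.2,  # randomness of sampling; does not limit length
        "topP": 0.9,         # nucleus sampling; does not limit length
    },
)

print(response["output"]["message"]["content"][0]["text"])
```

A common starting point is to set maxTokens to the smallest value that still fits a complete answer for your use case, optionally combined with stopSequences when responses should end at a known delimiter.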
When optimizing for cost and latency in production applications, set an appropriate max-tokens limit based on your use case, weighing response completeness against resource consumption.