
Explanation:
The correct answer is C. max-tokens.
Purpose of max-tokens: The max-tokens parameter controls the maximum number of tokens (words/subword units) that the model can generate in its response. This directly limits the length of the output.
Problem addressed: The question describes "excessively long responses" which increase both latency (response time) and token cost. By reducing the max-tokens value, you can cap the response length, preventing the model from generating unnecessarily verbose outputs.
Other parameters explained:
Practical application: In Amazon Bedrock, when working with foundation models, adjusting max_tokens (or similar parameter name depending on the specific model) is the standard way to control response length and manage costs associated with token usage.
When optimizing for cost and latency in production applications, it's recommended to set appropriate max-tokens limits based on your specific use case requirements, while also considering the trade-off between response completeness and resource consumption.
Ultimate access to all questions.
No comments yet.