
Answer-first summary for fast verification
Answer: max-tokens
## Explanation

The correct answer is **C. max-tokens**.

### Why max-tokens is the correct parameter

1. **Purpose of max-tokens**: The `max-tokens` parameter controls the maximum number of tokens (subword units) that the model can generate in its response. This directly limits the length of the output.
2. **Problem addressed**: The question describes "excessively long responses," which increase both latency (response time) and token cost. By reducing the `max-tokens` value, you cap the response length, preventing the model from generating unnecessarily verbose outputs.
3. **Other parameters explained**:
   - **A. temperature**: Controls randomness/creativity in responses (higher = more random, lower = more deterministic)
   - **B. top-p**: Nucleus sampling parameter that controls diversity by considering only tokens whose cumulative probability exceeds a threshold
   - **D. top-k**: Limits the model to consider only the top-k most probable tokens at each generation step
4. **Practical application**: In Amazon Bedrock, when working with foundation models, adjusting `max_tokens` (or a similar parameter name, depending on the specific model) is the standard way to control response length and manage costs associated with token usage.

### Best Practice Tip

When optimizing for cost and latency in production applications, set an appropriate `max-tokens` limit based on your specific use case, while considering the trade-off between response completeness and resource consumption.
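To make the practical application concrete, here is a minimal sketch of a Bedrock Converse API request that caps output length. The model ID, prompt, and the `build_converse_request` helper are illustrative assumptions, not part of the question; the `inferenceConfig.maxTokens` field is how the Converse API expresses the length cap (native model APIs use their own names, e.g. `max_tokens` for Anthropic models or `maxTokenCount` for Amazon Titan).

```python
# Sketch: building a Bedrock Converse request with a response-length cap.
# build_converse_request is a hypothetical helper for illustration only.
def build_converse_request(prompt: str, max_tokens: int = 256) -> dict:
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,   # caps response length -> lower latency and cost
            "temperature": 0.2,        # controls randomness, NOT length
        },
    }

request = build_converse_request("Summarize the refund policy.", max_tokens=150)
print(request["inferenceConfig"]["maxTokens"])  # 150
# With credentials configured, this would be sent via:
#   boto3.client("bedrock-runtime").converse(**request)
```

Only the request construction is shown here, so the sketch runs without AWS credentials; the commented `converse` call is where the capped request would actually be submitted.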
Author: Jin H