
Answer-first summary for fast verification
Answer: max-tokens
## Explanation

When a healthcare chatbot on Amazon Bedrock generates long summaries that increase latency and token costs, the **max-tokens** parameter should be adjusted.

### Why max-tokens is the correct answer:

1. **Token Limitation**: The `max-tokens` parameter controls the maximum number of tokens (words/subwords) that the model can generate in its response.
2. **Cost Control**: Each generated token incurs computational cost and API usage charges. By capping the maximum tokens, you directly control the length of responses and the associated costs.
3. **Latency Reduction**: Longer responses require more processing time, increasing latency. Setting an appropriate `max-tokens` limit ensures responses complete within acceptable timeframes.
4. **Healthcare Context**: In healthcare applications, concise, relevant summaries are often preferred over verbose responses, making token limitation particularly important.

### Why other options are incorrect:

- **A. temperature**: Controls the randomness/creativity of responses (higher = more random, lower = more deterministic). It does not directly control response length.
- **B. top-p**: Controls nucleus sampling for response diversity. It affects quality and variety, not length.
- **D. stop-sequences**: Defines sequences that halt generation, but this is reactive rather than proactive length control.

### Best Practice:

Set `max-tokens` to a value appropriate for your use case (e.g., 500-1000 tokens for summaries) to balance response quality, cost, and latency.
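As a minimal sketch of how this parameter is set in practice, the snippet below uses the Bedrock Converse API via boto3, where the cap is named `maxTokens` inside `inferenceConfig`. The model ID and prompt are illustrative, and calling the API requires configured AWS credentials and model access:

```python
def build_inference_config(max_tokens: int = 512,
                           temperature: float = 0.2,
                           top_p: float = 0.9) -> dict:
    """Build the Converse API inferenceConfig; maxTokens caps response length."""
    return {
        "maxTokens": max_tokens,      # hard cap on generated tokens (cost/latency)
        "temperature": temperature,   # controls randomness, not length
        "topP": top_p,                # nucleus sampling, not length
    }

def summarize(client, model_id: str, text: str, max_tokens: int = 512) -> str:
    """Request a summary with a token cap on the generated response."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user",
                   "content": [{"text": f"Summarize concisely:\n{text}"}]}],
        inferenceConfig=build_inference_config(max_tokens=max_tokens),
    )
    return response["output"]["message"]["content"][0]["text"]

# Example usage (requires AWS credentials; model ID is illustrative):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# print(summarize(client, "anthropic.claude-3-haiku-20240307-v1:0", "..."))
```

Keeping the cap in one helper makes it easy to tune per use case (e.g., a lower cap for chat replies, a higher one for clinical summaries) without touching the call sites.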
Author: Ritesh Yadav