
Answer-first summary for fast verification
Answer: max-tokens
## Explanation

When a healthcare chatbot on Amazon Bedrock generates long summaries that increase latency and token costs, the **max-tokens** parameter should be adjusted.

### Why max-tokens is the correct answer:

1. **Token Limitation**: The `max-tokens` parameter controls the maximum number of tokens (words/subwords) that the model can generate in its response.
2. **Cost Control**: Each generated token incurs computational cost and API usage charges. By capping the maximum tokens, you directly control the length of responses and the associated costs.
3. **Latency Reduction**: Longer responses require more processing time, increasing latency. Setting an appropriate `max-tokens` limit ensures responses complete within acceptable timeframes.
4. **Healthcare Context**: In healthcare applications, concise, relevant summaries are often preferred over verbose responses, making token limitation particularly important.

### Why other options are incorrect:

- **A. temperature**: Controls the randomness/creativity of responses (higher = more random, lower = more deterministic). It does not directly control response length.
- **B. top-p**: Controls nucleus sampling for response diversity. It affects quality and variety, not length.
- **D. stop-sequences**: Defines sequences that halt generation, but this is reactive rather than proactive length control.

### Best Practice:

Set `max-tokens` to a value appropriate for your use case (e.g., 500-1000 tokens for summaries) to balance response quality, cost, and latency.
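As a minimal sketch of how this parameter is set in practice, the snippet below uses the Bedrock Converse API via boto3, where the cap is named `maxTokens` inside `inferenceConfig`. The model ID and prompt are illustrative, and calling the API requires configured AWS credentials and model access:

```python
def build_inference_config(max_tokens: int = 512,
                           temperature: float = 0.2,
                           top_p: float = 0.9) -> dict:
    """Build the Converse API inferenceConfig; maxTokens caps response length."""
    return {
        "maxTokens": max_tokens,      # hard cap on generated tokens (cost/latency)
        "temperature": temperature,   # controls randomness, not length
        "topP": top_p,                # nucleus sampling, not length
    }

def summarize(client, model_id: str, text: str, max_tokens: int = 512) -> str:
    """Request a summary with a token cap on the generated response."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user",
                   "content": [{"text": f"Summarize concisely:\n{text}"}]}],
        inferenceConfig=build_inference_config(max_tokens=max_tokens),
    )
    return response["output"]["message"]["content"][0]["text"]

# Example usage (requires AWS credentials; model ID is illustrative):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# print(summarize(client, "anthropic.claude-3-haiku-20240307-v1:0", "..."))
```

Keeping the cap in one helper makes it easy to tune per use case (e.g., a lower cap for chat replies, a higher one for clinical summaries) without touching the call sites.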
Author: Ritesh Yadav