
**Answer (for fast verification):** C. max-tokens
## Explanation

The correct answer is **C. max-tokens**.

**Why max-tokens is the correct parameter:**

1. **Purpose of max-tokens**: The `max-tokens` parameter controls the maximum number of tokens (words or word pieces) that the model can generate in its response, which directly limits the length of the output.
2. **Problem statement**: The question describes "excessively long responses" that are causing:
   - Increased latency (longer response times)
   - Higher token costs (more tokens generated means higher cost)
3. **How adjusting max-tokens helps**:
   - Reducing the `max-tokens` value caps how long a response can be.
   - Shorter responses mean:
     - Faster generation (reduced latency)
     - Fewer tokens billed (lower cost)

**Why the other options are incorrect:**

- **A. temperature**: Controls the randomness/creativity of responses (higher = more random, lower = more deterministic). It does not control response length.
- **B. top-p**: Controls nucleus sampling, i.e., the cumulative probability threshold used for token selection. It affects response quality and coherence, not length.
- **D. top-k**: Controls the number of highest-probability tokens considered during sampling. It affects response diversity, not length.

**Best Practice Tip**: When optimizing for cost and latency in production applications, setting an appropriate `max-tokens` limit is crucial. Determine the right balance between response completeness and cost/latency constraints for your specific use case; a minimal configuration sketch follows below.
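The question does not name a specific service, so as a hedged illustration, here is a minimal sketch assuming the model is invoked through Amazon Bedrock's Converse API via boto3. The exact parameter name varies by API (`maxTokens` here; `max_tokens` or `max-tokens` elsewhere), and the model ID, prompt, and values shown are illustrative only.

```python
import boto3

# Assumption: the model is hosted on Amazon Bedrock and reachable in this region.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize our refund policy in two sentences."}],
        }
    ],
    inferenceConfig={
        # maxTokens caps the length of the generated output, which is what
        # reduces both latency and per-request token cost.
        "maxTokens": 150,
        # temperature and topP shape randomness/sampling, not response length.
        "temperature": 0.2,
        "topP": 0.9,
    },
)

print(response["output"]["message"]["content"][0]["text"])
```

Lowering `maxTokens` too far can truncate answers mid-sentence, so the limit is typically tuned against representative prompts rather than set to the smallest value that reduces cost.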
Author: Ritesh Yadav