
Answer: max-tokens
## Explanation

When a healthcare chatbot on Bedrock generates long summaries, it increases both latency (response time) and token cost. The parameter that directly controls the length of the generated output is **max-tokens**.

### Parameter Analysis:

1. **max-tokens**: Sets the maximum number of tokens (words/subwords) the model can generate in its response. Reducing this value limits the length of the summaries, which will:
   - Reduce latency (shorter responses are generated faster)
   - Lower token cost (fewer tokens generated means lower cost)
2. **temperature**: Controls randomness in the output (higher = more creative/random, lower = more deterministic). It does not directly control response length.
3. **top-p**: Controls nucleus sampling for diversity in responses. It affects quality/variety, not length.
4. **stop-sequences**: Defines sequences that cause the model to stop generating. While this can indirectly limit length, max-tokens is the direct parameter for controlling response length.

### Why C is Correct:

- The problem specifically mentions "long summaries" causing increased latency and token cost
- max-tokens is the parameter that directly limits the maximum length of generated text
- Setting max-tokens to a lower value produces shorter summaries, reducing both latency and cost

### Best Practice:

For healthcare chatbots, set an appropriate max-tokens value based on:
- The typical length needed for summaries
- Cost constraints
- User experience requirements (response time expectations)
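As a concrete sketch, here is how these parameters map onto a Bedrock Converse API request body. The field names (`maxTokens`, `temperature`, `topP`, `stopSequences`) follow the Converse API's `inferenceConfig`; the model ID and stop sequence are illustrative assumptions, not part of the question.

```python
def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build a Bedrock Converse request body with a capped output length."""
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,   # directly caps output length -> lower latency and cost
            "temperature": 0.2,        # low randomness for factual summaries; not a length control
            "topP": 0.9,               # nucleus sampling; affects diversity, not length
            "stopSequences": ["END_OF_SUMMARY"],  # hypothetical marker for early stopping
        },
    }

request = build_request("Summarize the patient intake notes in 3 sentences.",
                        max_tokens=200)
# In a real application this body would be sent with:
#   boto3.client("bedrock-runtime").converse(**request)
```

Lowering `max_tokens` here is the one change that guarantees shorter output; the other fields tune quality, not length.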
Author: Jin H