AWS Certified Cloud Practitioner

Get started today

Ultimate access to all questions.

Deep dive into the quiz with AI chat providers.

We prepare a focused prompt with your quiz and certificate details so each AI can offer a more tailored, in-depth explanation.

A healthcare chatbot on Bedrock generates long summaries, increasing latency and token cost. Which parameter should you adjust?

Real Exam

Community

RRitesh

temperature

top-p

max-tokens

stop-sequences

Explanation:

Explanation

The correct answer is C. max-tokens.

Why max-tokens is the correct parameter:

max-tokens controls the maximum number of tokens (words/sub-words) that the model can generate in its response
When summaries are too long, it directly increases:
- Latency: Longer responses take more time to generate
- Token cost: More tokens = higher API costs

How adjusting max-tokens helps:

Setting a lower max-tokens limit forces the model to generate more concise summaries
This reduces both generation time and cost
The model learns to be more efficient within the token constraint

Why other options are incorrect:

A. temperature: Controls randomness/creativity of responses, not length
B. top-p: Controls vocabulary diversity through nucleus sampling, affects quality not length
D. stop-sequences: Used to define stopping points for generation, but doesn't directly control overall response length

Best Practice:

For healthcare chatbots where concise, accurate information is crucial, setting an appropriate max-tokens limit ensures responses remain focused and cost-effective while maintaining quality.

Powered ByGemini-3 Flash

Comments

Loading comments...