
Answer-first summary for fast verification
Answer: max-tokens
The `max-tokens` parameter sets an upper limit on the number of tokens (word pieces) the model can generate in its response. When responses are excessively long, driving up both latency and token costs, lowering `max-tokens` caps the response length and reduces both.

**Explanation of other options:**
- **A. temperature**: Controls the randomness/creativity of responses (lower = more deterministic, higher = more creative)
- **B. top-p**: Controls vocabulary diversity through nucleus sampling
- **D. top-k**: Controls vocabulary diversity by limiting sampling to the k most likely tokens

Only `max-tokens` directly addresses response length, latency, and token cost.
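For illustration, here is a minimal sketch of passing these inference parameters in a request, assuming a Bedrock-style Converse API call (the model ID, prompt, and parameter values are placeholders, not from the question; exact parameter names vary by provider, e.g. `maxTokens` vs `max_tokens`):

```python
import boto3

# Hypothetical example: parameter names follow the Bedrock Converse API;
# the model ID and values below are placeholders chosen for illustration.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy."}]}],
    inferenceConfig={
        "maxTokens": 256,    # lower cap on generated tokens -> shorter responses, less latency/cost
        "temperature": 0.2,  # randomness/creativity (does not bound length)
        "topP": 0.9,         # nucleus-sampling diversity (does not bound length)
    },
)

print(response["output"]["message"]["content"][0]["text"])
```

Lowering `maxTokens` is the only change above that directly bounds how many tokens are produced; `temperature` and `topP` only affect how each token is chosen.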
Author: Ritesh Yadav