
**Answer (for fast verification):** A. Reduce top-p from 0.95 to 0.6
## Explanation

**Correct Answer: A (Reduce top-p from 0.95 to 0.6)**

**Why this is correct:**

1. **Top-p (nucleus sampling)** controls the cumulative probability threshold for token selection. Reducing top-p from 0.95 to 0.6 narrows the pool of candidate tokens to only the most probable ones.
2. **How it works:**
   - At top-p = 0.95: the model samples from tokens until their cumulative probability reaches 95%, which includes many less probable tokens.
   - At top-p = 0.6: the model samples only from tokens until their cumulative probability reaches 60%, excluding many improbable tokens.
3. **Effect on the problem:**
   - **Limits improbable token choices:** excluding low-probability tokens reduces the chance of off-topic or irrelevant responses.
   - **Still allows creative variation:** unlike a very low value (e.g., 0.1), 0.6 leaves room for some diversity in responses.

**Why the other options are incorrect:**

- **B. Increase temperature to 1.2:** Temperature controls randomness; higher values (like 1.2) make outputs more random and creative, which would increase off-topic responses, not limit them.
- **C. Increase top-k to 200:** Top-k limits sampling to the k most probable tokens. Raising top-k to 200 considers *more* tokens, potentially including more improbable ones, which could increase off-topic responses.
- **D. Remove repetition penalties:** Repetition penalties discourage the model from repeating the same phrases. Removing them would not address off-topic sentences and might make responses more repetitive instead.

**Key takeaway:** When you want to limit improbable outputs while maintaining some creativity, top-p (nucleus sampling) is the most appropriate parameter to tune.
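To make the mechanics concrete, here is a minimal sketch of the top-p filtering step applied to a toy next-token distribution. The probabilities, the function name `top_p_filter`, and the use of NumPy are illustrative assumptions, not part of the original question.

```python
# Minimal sketch of nucleus (top-p) filtering over a toy distribution.
# All values here are hypothetical, chosen only to show the cutoff effect.
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, zero out the rest, then renormalize."""
    order = np.argsort(probs)[::-1]        # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    # Keep tokens up to and including the one that crosses the threshold
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()       # renormalize the survivors

# Toy distribution over 6 candidate tokens
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])

print(top_p_filter(probs, 0.95))  # keeps 5 of 6 tokens -> more variety
print(top_p_filter(probs, 0.60))  # keeps only 2 tokens -> fewer improbable picks
```

With `top_p=0.95` the filter keeps five of the six candidates, while `top_p=0.6` keeps only the two most probable. That is exactly the trade-off described above: far fewer improbable picks, but still more than one way for the response to continue.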
Author: Ritesh Yadav
## Question

A support chatbot occasionally generates off-topic sentences. The team wants to limit improbable token choices but still allow some creative variation. Which inference parameter should they tune?
- **A.** Reduce top-p from 0.95 to 0.6
- **B.** Increase temperature to 1.2
- **C.** Increase top-k to 200
- **D.** Remove repetition penalties
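For context, here is how the four parameters in the options typically map onto a real inference call. This sketch uses the Hugging Face `transformers` `generate()` API, with `gpt2` purely as a small placeholder model; the prompt and specific values are illustrative.

```python
# Sketch: mapping the four answer options onto generate() parameters.
# Model name and values are placeholders, not from the original question.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("How do I reset my password?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # enable sampling so top_p/top_k/temperature apply
    top_p=0.6,               # option A: narrowed nucleus (was 0.95)
    temperature=1.0,         # option B would raise this to 1.2 (more random)
    top_k=50,                # option C would raise this to 200 (wider pool)
    repetition_penalty=1.1,  # option D would remove this (more repetition)
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```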