
An LLM is generating overly long and repetitive answers. The engineering team wants to force the model to sample from only the most likely next tokens. Which setting is most suitable?
A
Increase top-p to 1.0
B
Decrease temperature to 0.2
C
Increase top-k to 200
D
Use temperature = 1.0 and top-p = 0.95
Explanation:
When an LLM generates overly long and repetitive answers, it's often sampling from too broad a distribution of possible next tokens. The goal is to make the model more deterministic and focused on the most probable tokens.
A. Increase top-p to 1.0 - This would make the sampling more diverse, not less. Top-p (nucleus sampling) with p=1.0 means sampling from the entire vocabulary, which would increase randomness and potentially make outputs more verbose.
B. Decrease temperature to 0.2 - CORRECT. Temperature controls the randomness of predictions: logits are divided by the temperature before the softmax, so a low value like 0.2 sharpens the distribution and concentrates probability mass on the highest-likelihood tokens. The model then samples almost exclusively from the most probable continuations, producing more focused, deterministic output.
C. Increase top-k to 200 - This would increase the number of tokens considered for sampling, making the output more diverse and potentially more verbose.
D. Use temperature = 1.0 and top-p = 0.95 - Temperature=1.0 is the default setting, and top-p=0.95 allows sampling from a broad distribution. This would maintain or increase randomness, not reduce it.
For reducing verbosity and repetition while maintaining coherence, decreasing temperature is the most direct approach.
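The temperature mechanism described above can be sketched in a few lines. This is an illustrative example (the logit values are made up), showing how dividing logits by a low temperature before the softmax concentrates probability on the top token:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax.

    Lower temperature sharpens the distribution toward the
    highest-logit token; higher temperature flattens it.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

p_default = softmax_with_temperature(logits, 1.0)  # default sampling
p_low = softmax_with_temperature(logits, 0.2)      # the recommended fix

# At temperature 0.2 the top token absorbs almost all probability
# mass, so sampling becomes nearly deterministic; at 1.0 the lower-
# ranked tokens retain a meaningful share and outputs stay diverse.
```

Comparing `p_default` and `p_low` shows why answer B works: the same logits that give the top token roughly a 57% share at temperature 1.0 give it over 99% at temperature 0.2, which is exactly "sampling from only the most likely next tokens."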