
An LLM is generating overly long and repetitive answers. The engineering team wants to force the model to sample from only the most likely next tokens. Which setting is most suitable?
A
Increase top-p to 1.0
B
Decrease temperature to 0.2
C
Increase top-k to 200
D
Use temperature = 1.0 and top-p = 0.95
Explanation:
When an LLM generates overly long and repetitive answers, it's often sampling from too broad a distribution of possible next tokens. The goal is to make the model more deterministic and focused on the most probable tokens.
A. Increase top-p to 1.0 - This would make the sampling more diverse, not less. Top-p (nucleus sampling) with p=1.0 means sampling from the entire vocabulary, which would increase randomness and potentially make outputs more verbose.
B. Decrease temperature to 0.2 - CORRECT. Temperature controls the randomness of predictions: logits are divided by the temperature before the softmax, so a low value like 0.2 sharpens the distribution and concentrates probability mass on the highest-likelihood tokens. The model then samples almost exclusively from the most probable continuations, producing more focused, deterministic output.
C. Increase top-k to 200 - This would increase the number of tokens considered for sampling, making the output more diverse and potentially more verbose.
D. Use temperature = 1.0 and top-p = 0.95 - Temperature=1.0 is the default setting, and top-p=0.95 allows sampling from a broad distribution. This would maintain or increase randomness, not reduce it.
For reducing verbosity and repetition while maintaining coherence, decreasing temperature is the most direct approach.
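The temperature mechanism described above can be sketched in a few lines. This is an illustrative example (the logit values are made up), showing how dividing logits by a low temperature before the softmax concentrates probability on the top token:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax.

    Lower temperature sharpens the distribution toward the
    highest-logit token; higher temperature flattens it.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

p_default = softmax_with_temperature(logits, 1.0)  # default sampling
p_low = softmax_with_temperature(logits, 0.2)      # the recommended fix

# At temperature 0.2 the top token absorbs almost all probability
# mass, so sampling becomes nearly deterministic; at 1.0 the lower-
# ranked tokens retain a meaningful share and outputs stay diverse.
```

Comparing `p_default` and `p_low` shows why answer B works: the same logits that give the top token roughly a 57% share at temperature 1.0 give it over 99% at temperature 0.2, which is exactly "sampling from only the most likely next tokens."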