
Answer-first summary for fast verification
Answer: Decrease temperature to 0.2
## Explanation

When an LLM generates overly long and repetitive answers, it is often sampling from too broad a distribution of possible next tokens. The goal is to make the model more deterministic and focused on the most probable tokens.

### Analysis of Options:

**A. Increase top-p to 1.0**
- This makes sampling more diverse, not less. Top-p (nucleus sampling) with p=1.0 keeps the entire vocabulary in the candidate pool, so no low-probability tokens are filtered out and outputs can become more random and more verbose.

**B. Decrease temperature to 0.2** - **CORRECT**
- Temperature controls the randomness of predictions.
- A lower temperature (closer to 0) makes the model more deterministic.
- Probability mass concentrates on the highest-probability tokens, so those are sampled far more often.
- This reduces randomness and makes outputs more focused and concise.

**C. Increase top-k to 200**
- This increases the number of tokens considered for sampling, making the output more diverse and potentially more verbose.

**D. Use temperature = 1.0 and top-p = 0.95**
- Temperature = 1.0 leaves the model's distribution unscaled (it is a common default), and top-p = 0.95 still allows sampling from a broad distribution. This maintains or increases randomness rather than reducing it.

### Why Decreasing Temperature Works:

- Temperature scaling divides the logits by the temperature before the softmax, which changes the probability distribution before sampling.
- A lower temperature sharpens the distribution, making high-probability tokens even more likely.
- This reduces the chance of sampling less likely tokens that can lead the output off track and add verbosity.
- The model becomes more conservative and sticks to the most probable continuations.

Of the four options, decreasing temperature is the only one that narrows sampling toward the most likely next tokens, so it is the most direct choice here.
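The sharpening effect described above can be illustrated with a small sketch in plain Python. The logits below are made-up values for a four-token vocabulary, and `softmax_with_temperature` and `nucleus_size` are hypothetical helper names for illustration, not part of any specific library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability; the result is unchanged.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_size(probs, top_p):
    """Count how many top tokens are needed to reach cumulative mass top_p."""
    cumulative = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    return len(probs)


# Hypothetical logits for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, 0.0]

for temperature in (1.0, 0.2):
    probs = softmax_with_temperature(logits, temperature)
    print(
        f"T={temperature}: probs={[round(p, 3) for p in probs]}, "
        f"tokens covering 95% mass={nucleus_size(probs, 0.95)}"
    )
```

With these example logits, lowering the temperature from 1.0 to 0.2 moves almost all of the probability mass onto the top token, which is the behavior option B relies on. Raising top-p or top-k, by contrast, only widens the pool of candidate tokens.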
Author: Ritesh Yadav
An LLM is generating overly long and repetitive answers. The engineering team wants to force the model to sample from only the most likely next tokens. Which setting is most suitable?
A. Increase top-p to 1.0
B. Decrease temperature to 0.2
C. Increase top-k to 200
D. Use temperature = 1.0 and top-p = 0.95