
**Answer (for fast verification):** A. Reduce top-p from 0.95 to 0.6
## Explanation

**Correct Answer: A (Reduce top-p from 0.95 to 0.6)**

**Why this is correct:**

1. **Top-p (nucleus sampling)** controls the cumulative probability threshold for token selection. Reducing top-p from 0.95 to 0.6 narrows the pool of candidate tokens to only the most probable ones.
2. **How it works:**
   - At top-p = 0.95: the model samples from tokens until their cumulative probability reaches 95%, which includes many less probable tokens.
   - At top-p = 0.6: the model samples only from tokens until their cumulative probability reaches 60%, excluding many improbable tokens.
3. **Effect on the problem:**
   - **Limits improbable token choices:** excluding low-probability tokens reduces the chance of off-topic or irrelevant responses.
   - **Still allows creative variation:** unlike a very low value (e.g., 0.1), 0.6 leaves room for some diversity in responses.

**Why the other options are incorrect:**

- **B. Increase temperature to 1.2:** Temperature controls randomness; higher values (like 1.2) make outputs more random and creative, which would increase off-topic responses, not limit them.
- **C. Increase top-k to 200:** Top-k limits sampling to the k most probable tokens. Raising top-k to 200 considers *more* tokens, potentially including more improbable ones, which could increase off-topic responses.
- **D. Remove repetition penalties:** Repetition penalties discourage the model from repeating the same phrases. Removing them would not address off-topic sentences and might make responses more repetitive instead.

**Key takeaway:** When you want to limit improbable outputs while maintaining some creativity, top-p (nucleus sampling) is the most appropriate parameter to tune.
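To make the mechanics concrete, here is a minimal sketch of the top-p filtering step applied to a toy next-token distribution. The probabilities, the function name `top_p_filter`, and the use of NumPy are illustrative assumptions, not part of the original question.

```python
# Minimal sketch of nucleus (top-p) filtering over a toy distribution.
# All values here are hypothetical, chosen only to show the cutoff effect.
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, zero out the rest, then renormalize."""
    order = np.argsort(probs)[::-1]        # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    # Keep tokens up to and including the one that crosses the threshold
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()       # renormalize the survivors

# Toy distribution over 6 candidate tokens
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])

print(top_p_filter(probs, 0.95))  # keeps 5 of 6 tokens -> more variety
print(top_p_filter(probs, 0.60))  # keeps only 2 tokens -> fewer improbable picks
```

With `top_p=0.95` the filter keeps five of the six candidates, while `top_p=0.6` keeps only the two most probable. That is exactly the trade-off described above: far fewer improbable picks, but still more than one way for the response to continue.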
Author: Ritesh Yadav
## Question

A support chatbot occasionally generates off-topic sentences. The team wants to limit improbable token choices but still allow some creative variation. Which inference parameter should they tune?
- **A.** Reduce top-p from 0.95 to 0.6
- **B.** Increase temperature to 1.2
- **C.** Increase top-k to 200
- **D.** Remove repetition penalties
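For context, here is how the four parameters in the options typically map onto a real inference call. This sketch uses the Hugging Face `transformers` `generate()` API, with `gpt2` purely as a small placeholder model; the prompt and specific values are illustrative.

```python
# Sketch: mapping the four answer options onto generate() parameters.
# Model name and values are placeholders, not from the original question.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("How do I reset my password?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # enable sampling so top_p/top_k/temperature apply
    top_p=0.6,               # option A: narrowed nucleus (was 0.95)
    temperature=1.0,         # option B would raise this to 1.2 (more random)
    top_k=50,                # option C would raise this to 200 (wider pool)
    repetition_penalty=1.1,  # option D would remove this (more repetition)
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```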