
A support chatbot occasionally generates off-topic sentences. The team wants to limit improbable token choices but still allow some creative variation. Which inference parameter should they tune?
A. Reduce top-p from 0.95 to 0.6
B. Increase temperature to 1.2
C. Increase top-k to 200
D. Remove repetition penalties
Explanation:
Correct Answer: A (Reduce top-p from 0.95 to 0.6)
Why this is correct:
Top-p (nucleus sampling) sets a cumulative probability threshold for token selection. When you reduce top-p from 0.95 to 0.6, you narrow the pool of candidate tokens to the smallest set of most-probable tokens whose combined probability reaches that threshold.
How it works:
At top-p = 0.95: The model samples from the most probable tokens until their cumulative probability reaches 95%, which still admits many low-probability tokens
At top-p = 0.6: The model samples only from the tokens whose cumulative probability reaches 60%, excluding most improbable tokens
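To make the mechanics concrete, here is a minimal Python sketch of top-p filtering on a made-up next-token distribution; the vocabulary and probabilities are purely illustrative and not taken from any real model.

def nucleus_filter(token_probs, top_p):
    # Rank tokens from most to least probable.
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        # Stop once the kept tokens cover the top_p probability mass.
        if cumulative >= top_p:
            break
    return kept

# Made-up next-token probabilities for a support chatbot.
toy_probs = {"refund": 0.40, "order": 0.25, "help": 0.15,
             "banana": 0.08, "quantum": 0.07, "pirate": 0.05}

print(nucleus_filter(toy_probs, 0.95))  # ['refund', 'order', 'help', 'banana', 'quantum'] - wide pool
print(nucleus_filter(toy_probs, 0.6))   # ['refund', 'order'] - only the most probable, on-topic tokens remain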
Effect on the problem:
Limits improbable token choices: By excluding tokens with lower probabilities, you reduce the chance of off-topic or irrelevant responses
Still allows creative variation: Unlike setting top-p to very low values (like 0.1), 0.6 still allows for some diversity in responses
Why other options are incorrect:
B. Increase temperature to 1.2: Temperature controls randomness. Higher values (such as 1.2) flatten the probability distribution, making outputs more random and creative, which would increase off-topic responses rather than limit them.
C. Increase top-k to 200: Top-k limits sampling to the k most probable tokens. Raising top-k to 200 widens the pool to include more improbable tokens, which could increase off-topic responses (see the sketch after this list).
D. Remove repetition penalties: Repetition penalties discourage the model from repeating the same phrases. Removing them does nothing to address off-topic sentences and might make responses more repetitive instead.
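As a rough illustration of the distractors, the snippet below reuses the toy vocabulary from the earlier sketch with invented logits; it shows that a higher temperature flattens the distribution and that a larger top-k keeps more tokens in the pool, not fewer.

import math

# Invented logits for the same toy vocabulary.
toy_logits = {"refund": 2.0, "order": 1.5, "help": 1.0,
              "banana": 0.2, "quantum": 0.1, "pirate": 0.0}

def softmax_with_temperature(logits, temperature):
    # Higher temperature flattens the distribution, so improbable tokens
    # such as "pirate" receive a larger share of the probability mass.
    scaled = {t: v / temperature for t, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {t: math.exp(v) / total for t, v in scaled.items()}

def top_k_filter(token_probs, k):
    # A larger k keeps more tokens in the candidate pool, not fewer.
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [token for token, _ in ranked[:k]]

print(round(softmax_with_temperature(toy_logits, 0.7)["pirate"], 3))      # ~0.03, small share
print(round(softmax_with_temperature(toy_logits, 1.2)["pirate"], 3))      # ~0.07, more than double
print(len(top_k_filter(softmax_with_temperature(toy_logits, 1.0), 200)))  # 6: every token survives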
Key takeaway: When you want to limit improbable outputs while preserving some creative variation, top-p (nucleus sampling) is the most appropriate inference parameter to tune, and lowering it is the right direction.
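If you were applying the recommended setting in code, it might look like the sketch below, shown here with the Hugging Face transformers library; the model name is a placeholder, and exact parameter support varies by model and hosting service.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("How do I check my order status?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,     # sampling must be enabled for top_p to take effect
    top_p=0.6,          # narrower nucleus: improbable tokens are excluded
    max_new_tokens=60,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))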