
Developers want to reduce the model's repetition, especially repeated phrases or words. Which inference setting helps prevent this?
A
Increase temperature
B
Enable repetition penalty (e.g., 1.2)
C
Reduce top-p
D
Reduce top-k
Explanation:
Repetition penalty is a specific inference setting designed to reduce repetitive outputs from language models. Here's why option B is correct:
Repetition penalty is a parameter that penalizes tokens that have already appeared in the generated text
A value greater than 1.0 (like 1.2) discourages repetition
A value less than 1.0 encourages repetition
The penalty is applied to the scores (logits) of tokens that have already been generated, lowering their chance of being selected again
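To make the mechanics concrete, here is a minimal sketch of one common formulation of repetition penalty (the CTRL-style rule also used by popular inference libraries): positive logits of already-seen tokens are divided by the penalty and negative logits multiplied, so seen tokens always become less likely when the penalty is greater than 1.0. The function name and plain-list representation are illustrative, not any particular library's API.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Penalize the logits of tokens that already appear in the output.

    For penalty > 1.0: positive logits are divided by the penalty and
    negative logits multiplied by it, so previously generated tokens
    always end up with a lower score than before.
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty
        else:
            adjusted[tok] *= penalty
    return adjusted

# Token 0 (positive logit) and token 2 (negative logit) were already
# generated; both get pushed down, token 1 is untouched.
adjusted = apply_repetition_penalty([2.0, 1.0, -1.0], [0, 2], penalty=1.2)
```

Note that dividing a positive logit and multiplying a negative one are both reductions in score, which is why the asymmetric rule is used rather than a single subtraction.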
A. Increase temperature - Temperature controls randomness/creativity:
Higher temperature (e.g., 0.8-1.0) increases randomness and diversity
Lower temperature (e.g., 0.1-0.3) makes outputs more deterministic and focused
While higher temperature can sometimes reduce repetition, it's not specifically designed for this purpose and can make outputs less coherent
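Temperature's effect is easiest to see as a scaling of the logits before the softmax: dividing by a temperature below 1.0 sharpens the distribution, while a temperature above 1.0 flattens it. A small stdlib-only sketch (illustrative, not a library API):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling.

    Lower temperature -> sharper (more deterministic) distribution;
    higher temperature -> flatter (more random) distribution.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# The same logits at T=0.2 vs T=1.0: the cold distribution concentrates
# far more probability mass on the top token.
sharp = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.2)
flat = softmax_with_temperature([2.0, 1.0, 0.0], temperature=1.0)
```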
C. Reduce top-p - Top-p (nucleus sampling):
Controls the cumulative probability threshold for token selection
Lower top-p values make outputs more focused but don't specifically target repetition
This affects which tokens are considered, not whether they've been used before
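The "cumulative probability threshold" can be sketched directly: nucleus sampling keeps the smallest set of highest-probability tokens whose probabilities sum to at least p, and discards the rest. This toy filter (hypothetical function name, pure stdlib) returns the indices of the tokens that survive; note it never looks at which tokens were generated earlier, which is why it does not target repetition.

```python
def top_p_filter(probs, p=0.9):
    """Return the indices of the smallest set of most-likely tokens
    whose cumulative probability reaches p (nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return set(kept)

# 0.5 + 0.3 = 0.8 < 0.9, so token 2 is also needed to cross the threshold.
nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
```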
D. Reduce top-k - Top-k sampling:
Limits the number of tokens considered to the top k most likely tokens
Lower top-k values restrict diversity but don't specifically prevent repetition
This affects token selection pool size, not repetition patterns
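For contrast, top-k is even simpler: it keeps a fixed number of the most likely tokens regardless of how their probability mass is distributed. A minimal illustrative sketch:

```python
def top_k_filter(probs, k=2):
    """Return the indices of the k most likely tokens; all others are
    excluded from sampling. Like top-p, this ignores generation history."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k])

# Only the two highest-probability tokens (indices 1 and 2) survive.
pool = top_k_filter([0.1, 0.5, 0.3, 0.1], k=2)
```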
When you notice a model repeating phrases like "the the the" or "I think I think", set the repetition penalty to a value in the 1.1-1.3 range to moderately discourage repetition
Values like 1.5+ can strongly discourage repetition but may affect coherence
This is particularly useful for long-form generation, creative writing, or dialogue systems where natural flow is important
For reducing repetition while maintaining coherence, use a combination of:
Repetition penalty (1.1-1.2) to specifically target repeated tokens
Moderate temperature (0.7-0.9) for balanced creativity
Appropriate top-p (0.9-0.95) for focused yet diverse outputs
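The combined recipe above can be sketched as a single toy sampling step: apply the repetition penalty to the logits, then temperature-scale and softmax, then restrict to the nucleus, and finally sample from the renormalized pool. This is a simplified, stdlib-only illustration of how the settings compose, not any library's actual implementation.

```python
import math
import random

def sample_next_token(logits, generated_ids, rep_penalty=1.2,
                      temperature=0.8, top_p=0.9, rng=None):
    """Toy next-token sampler combining repetition penalty,
    temperature, and top-p (nucleus) sampling."""
    rng = rng or random.Random(0)
    logits = list(logits)

    # 1. Repetition penalty (CTRL-style asymmetric rule).
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= rep_penalty
        else:
            logits[tok] *= rep_penalty

    # 2. Temperature scaling + softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 3. Top-p nucleus filter.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # 4. Renormalize over the nucleus and sample.
    mass = sum(probs[i] for i in kept)
    r, acc = rng.random() * mass, 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

When one token dominates the distribution, the nucleus collapses to that single token and the sampler becomes effectively deterministic; with flatter logits, the penalized tokens simply become less likely rather than impossible.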