Explanation
Repetition penalty is a specific inference setting designed to reduce repetitive outputs from language models. Here's why option B is correct:
What is Repetition Penalty?
- Repetition penalty is a parameter that penalizes tokens that have already appeared in the generated text
- A value greater than 1.0 (like 1.2) discourages repetition
- A value less than 1.0 encourages repetition
- In common implementations the penalty rescales the logits (pre-softmax scores) of tokens that have already been generated, which lowers their probability of being sampled again
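To make the mechanism concrete, here is a minimal sketch of one common formulation: a positive logit is divided by the penalty and a negative logit is multiplied by it, so both move toward "less likely" when the penalty is above 1.0. The explanation does not say which formula a given model server uses, so treat this rule, the function names, and the toy logits as assumptions for illustration.

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated token ids penalized."""
    out = list(logits)
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one
        # both lower the score when penalty > 1.0.
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 4 tokens; tokens 0 and 2 were already generated.
logits = [2.0, 1.0, -1.0, 0.5]
generated = [0, 2]

penalized = apply_repetition_penalty(logits, generated, penalty=1.2)
probs_before = softmax(logits)
probs_after = softmax(penalized)
```

With a penalty of exactly 1.0 the logits are unchanged, and a value below 1.0 reverses the effect and makes repeated tokens more likely.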
Why Other Options Are Incorrect:
A. Increase temperature - Temperature controls randomness/creativity:
- Higher temperature (e.g., 0.8-1.0) increases randomness and diversity
- Lower temperature (e.g., 0.1-0.3) makes outputs more deterministic and focused
- While higher temperature can sometimes reduce repetition, it's not specifically designed for this purpose and can make outputs less coherent
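The following sketch shows why temperature is not a targeted fix. It divides every logit by the same temperature before the softmax, so it reshapes the whole distribution without looking at which tokens were already generated. The function name and the toy logits are illustrative assumptions.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale all logits by 1/temperature, then normalize to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 1.0, 0.2]

# Low temperature sharpens the distribution around the top token.
cold = softmax_with_temperature(logits, temperature=0.3)
# Higher temperature flattens it, giving other tokens more chance.
hot = softmax_with_temperature(logits, temperature=1.0)
```

Note that the function takes no generation history at all: a token that has just been repeated is treated exactly like one that has never appeared.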
C. Reduce top-p - Top-p (nucleus sampling):
- Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches the threshold p
- Lower top-p values make outputs more focused but don't specifically target repetition
- This affects which tokens are considered, not whether they've been used before
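A minimal sketch of nucleus filtering, assuming a small hand-written probability list and a hypothetical helper name, shows that the cutoff depends only on the current probabilities and never on the tokens generated so far.

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the most likely tokens until their cumulative mass reaches top_p.

    Returns a dict {token_id: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]

nucleus_wide = top_p_filter(probs, top_p=0.9)    # keeps tokens 0, 1 and 2
nucleus_narrow = top_p_filter(probs, top_p=0.5)  # keeps only token 0
```

Lowering top-p shrinks the candidate set toward the single most likely token, which is the opposite of what you want when that token is the one being repeated.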
D. Reduce top-k - Top-k sampling:
- Limits the number of tokens considered to the top k most likely tokens
- Lower top-k values restrict diversity but don't specifically prevent repetition
- This affects token selection pool size, not repetition patterns
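Top-k filtering can be sketched the same way. Again the helper name and the probabilities are assumptions for illustration. The filter keeps a fixed number of candidates by rank, with no reference to generation history.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities.

    Returns a dict {token_id: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

probs = [0.5, 0.3, 0.15, 0.05]

# With k=2 only tokens 0 and 1 survive; their mass is rescaled to sum to 1.
top2 = top_k_filter(probs, k=2)
```

If the token being repeated is already the most likely one, it stays in the pool for every value of k, so reducing k cannot remove it.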
Practical Application:
- Use it when you notice a model repeating words or phrases such as "the the the" or "I think I think"
- Values of about 1.1-1.3 moderately discourage repetition
- Values of 1.5 or higher discourage repetition strongly but may hurt coherence, because words that legitimately need to recur are penalized too
- It is particularly useful for long-form generation, creative writing, and dialogue systems, where natural flow is important
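One rough way to notice repetition programmatically, rather than by eye, is to measure how many word n-grams in an output are duplicates. This is a simple heuristic sketch, not a standard metric named in the explanation. The function name and whitespace tokenization are assumptions.

```python
def repeated_ngram_fraction(text, n=2):
    """Fraction of word n-grams that duplicate an earlier n-gram (0.0 = none)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1 - len(set(ngrams)) / len(ngrams)

looping = repeated_ngram_fraction("the the the")  # two bigrams, one unique
clean = repeated_ngram_fraction("a b c d")        # three bigrams, all unique
```

A score that rises as outputs get longer is a hint that raising the repetition penalty slightly may help.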
Best Practice:
For reducing repetition while maintaining coherence, use a combination of:
- Repetition penalty (1.1-1.2) to specifically target repeated tokens
- Moderate temperature (0.7-0.9) for balanced creativity
- Appropriate top-p (0.9-0.95) for focused yet diverse outputs
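As a sketch, the combination above could be written as a settings dictionary. The key names follow the parameter names used by the Hugging Face transformers `generate()` method. Other APIs name or implement these controls differently, so check the documentation for the service you use. The specific values are simply picked from the ranges listed above.

```python
# Sampling settings taken from the ranges above; key names assume a
# Hugging Face transformers-style generate() interface.
generation_settings = {
    "do_sample": True,           # sampling must be on for temperature/top_p to apply
    "repetition_penalty": 1.1,   # mildly penalize already-generated tokens
    "temperature": 0.8,          # moderate randomness
    "top_p": 0.9,                # nucleus sampling threshold
}
```

Change one value at a time when tuning, starting with the repetition penalty, so that you can tell which setting produced the difference in output.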