
**Answer:** B. Enable repetition penalty (e.g., 1.2)
## Explanation

**Repetition penalty** is an inference setting designed specifically to reduce repetitive outputs from language models. Here's why option B is correct:

### What Is Repetition Penalty?

- Repetition penalty is a parameter that penalizes tokens that have already appeared in the generated text
- A value greater than 1.0 (like 1.2) discourages repetition
- A value less than 1.0 encourages repetition
- The penalty is applied to the scores of tokens that have already been generated

### Why the Other Options Are Incorrect

**A. Increase temperature.** Temperature controls randomness/creativity:

- Higher temperature (e.g., 0.8-1.0) increases randomness and diversity
- Lower temperature (e.g., 0.1-0.3) makes outputs more deterministic and focused
- While higher temperature can sometimes reduce repetition, it is not designed for that purpose and can make outputs less coherent

**C. Reduce top-p.** Top-p (nucleus sampling):

- Controls the cumulative probability threshold for token selection
- Lower top-p values make outputs more focused but do not specifically target repetition
- It affects which tokens are considered, not whether they have already been used

**D. Reduce top-k.** Top-k sampling:

- Limits the candidate pool to the k most likely tokens
- Lower top-k values restrict diversity but do not specifically prevent repetition
- It affects the size of the token selection pool, not repetition patterns

### Practical Application

- Use it when you notice the model repeating phrases like "the the the" or "I think I think"
- Set the repetition penalty to values around 1.1-1.3 to moderately discourage repetition
- Values of 1.5+ strongly discourage repetition but may hurt coherence
- This is particularly useful for long-form generation, creative writing, or dialogue systems where natural flow is important

### Best Practice

For reducing repetition while maintaining coherence, use a combination of:

1. **Repetition penalty** (1.1-1.2) to specifically target repeated tokens
2. **Moderate temperature** (0.7-0.9) for balanced creativity
3. **Appropriate top-p** (0.9-0.95) for focused yet diverse outputs
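To make the mechanism concrete, here is a minimal NumPy sketch of how a repetition penalty can be applied to a model's logits. This is an illustration, not any particular library's implementation; it follows the commonly used convention of dividing positive logits by the penalty and multiplying negative logits by it, so already-seen tokens become less likely either way.

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Penalize the logits of tokens that already appear in generated_ids.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so previously generated tokens lose probability mass
    in both cases. penalty > 1.0 discourages repetition; penalty < 1.0
    would encourage it.
    """
    logits = logits.astype(float).copy()
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits

# Toy vocabulary of 5 tokens; token 2 was already generated,
# so its logit drops from 2.0 to 2.0 / 1.2.
logits = np.array([1.0, 0.5, 2.0, -1.0, 0.1])
penalized = apply_repetition_penalty(logits, [2], penalty=1.2)
```

With a penalty of 1.2, only the already-generated token's logit changes; unseen tokens are untouched, which is why this setting targets repetition more directly than temperature, top-p, or top-k.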
Author: Ritesh Yadav
**Question:** Developers want the model to reduce repetition, especially repeating phrases or words. Which inference setting helps prevent this?

A. Increase temperature
B. Enable repetition penalty (e.g., 1.2)
C. Reduce top-p
D. Reduce top-k
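The best-practice combination discussed above (repetition penalty plus moderate temperature plus top-p) can be sketched as a single decoding step. This is a hedged, self-contained NumPy illustration of how the three settings interact, not the implementation used by any particular inference engine:

```python
import numpy as np

def sample_next_token(logits, generated_ids, temperature=0.8,
                      top_p=0.92, repetition_penalty=1.2, rng=None):
    """One sampling step combining repetition penalty, temperature, and top-p."""
    rng = rng if rng is not None else np.random.default_rng(0)
    logits = logits.astype(float).copy()

    # 1. Repetition penalty: dampen tokens that were already generated.
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    # 2. Temperature: rescale logits before the softmax
    #    (lower temperature -> sharper distribution).
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()

    # 3. Top-p (nucleus): keep the smallest set of tokens whose
    #    cumulative probability reaches top_p, then renormalize.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    nucleus = np.zeros_like(probs)
    nucleus[keep] = probs[keep]
    nucleus /= nucleus.sum()

    return int(rng.choice(len(probs), p=nucleus))

# Token 2 was already generated, so it is penalized before sampling.
tok = sample_next_token(np.array([1.0, 0.5, 2.0, -1.0, 0.1]), [2])
```

Note that only step 1 looks at the generation history; temperature and top-p reshape the distribution without knowing which tokens were already used, which is exactly why they do not specifically prevent repetition.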