
Answer-first summary for fast verification
**Answer:** Set top-k to 40.
**Explanation:** **Top-k sampling** restricts the model to the k most likely tokens at each generation step. With top-k set to 40, only the 40 highest-probability tokens remain candidates for sampling; the long tail of unlikely or irrelevant tokens is filtered out.

**Top-p (nucleus) sampling** works differently: it selects the smallest set of tokens whose cumulative probability exceeds a threshold p (e.g., 0.2). The size of that set depends on the shape of the probability distribution, so it may include more or fewer than 40 tokens. Therefore, to restrict sampling to exactly the 40 most likely tokens, the correct approach is to set top-k to 40.
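The contrast can be sketched in a few lines of Python. This is a minimal illustration over a toy next-token distribution (the token names and probabilities are made up for the example), not any particular library's implementation:

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, then renormalize."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p, then renormalize. The set size varies per distribution."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, pr in ranked:
        kept.append((tok, pr))
        cum += pr
        if cum >= p:
            break
    total = sum(pr for _, pr in kept)
    return {tok: pr / total for tok, pr in kept}

# Hypothetical next-token distribution for illustration only
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xylophone": 0.05}

print(len(top_k_filter(probs, 2)))    # always exactly 2 tokens
print(len(top_p_filter(probs, 0.2)))  # here just 1: "the" alone exceeds 0.2
```

Note that `top_k_filter` always returns exactly k tokens, while `top_p_filter`'s output size changes with the distribution; that difference is why only top-k guarantees "the 40 most likely tokens."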
Author: Ritesh Yadav