
**Answer: B. Decrease top-p to 0.6**
## Explanation

**Correct Answer: B. Decrease top-p to 0.6**

**Why this is correct:**

1. **Top-p (nucleus sampling)** controls the cumulative probability threshold for token selection. When you decrease top-p (e.g., to 0.6), you restrict the model to the smallest set of high-probability tokens whose probabilities collectively reach 60% of the probability mass.

2. **How it affects conciseness:**
   - Lower top-p values make the model more deterministic and focused on high-probability tokens.
   - This reduces the chance of the model wandering off into less relevant, verbose continuations.
   - The model becomes more conservative in its word choices, which tends to produce more concise outputs.

**Why the other options are incorrect:**

- **A. Increase temperature to 1.2:** Temperature above 1.0 increases randomness, which would likely make responses less predictable and potentially more verbose, not shorter.
- **C. Increase top-k to 200:** A higher top-k lets the model consider more token candidates (200 in this case), which increases diversity and can lead to longer, more varied responses.
- **D. Disable repetition penalty:** The repetition penalty discourages the model from repeating phrases. Disabling it can lead to repetitive or redundant content, not more concise responses.

**Key takeaway:** For more concise outputs, make the model more focused and deterministic; decreasing top-p restricts token selection to the highest-probability options.
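The nucleus-sampling mechanism described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's actual implementation, and the toy probability distribution is entirely made up for the example:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; all other tokens are excluded from sampling."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        total += p
        if total >= top_p:  # stop once the nucleus covers top_p mass
            break
    # Renormalize so the kept probabilities sum to 1.
    return {t: p / total for t, p in kept.items()}

# Hypothetical next-token distribution, for illustration only.
probs = {"Yes": 0.45, "Sure": 0.25, "Well": 0.15, "Hmm": 0.10, "Actually": 0.05}

print(top_p_filter(probs, 0.6))   # small nucleus: only the top tokens survive
print(top_p_filter(probs, 0.95))  # large nucleus: tail tokens stay in play
```

With `top_p=0.6` only the two most probable tokens survive the filter, while `top_p=0.95` keeps four; a smaller candidate pool at every step is what pushes the model toward tighter, more focused text.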
Author: Ritesh Yadav
A chatbot frequently generates long, rambling answers even when short responses are expected. Which parameter adjustment should help produce more concise outputs?
A. Increase temperature to 1.2
B. Decrease top-p to 0.6
C. Increase top-k to 200
D. Disable repetition penalty
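To see why option A works against conciseness, it helps to look at what temperature does to the token distribution. The sketch below applies the standard temperature-scaled softmax to a toy set of logits (the logit values are assumed, purely for illustration):

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits / temperature. Higher temperature flattens
    the distribution, spreading probability onto unlikely tokens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [4.0, 2.0, 1.0, 0.5]  # toy next-token logits (assumed values)
for t in (0.7, 1.0, 1.2):
    probs = apply_temperature(logits, t)
    print(f"temperature={t}: top-token probability={max(probs):.3f}")
```

The top token's probability shrinks as temperature rises past 1.0, meaning unlikely tokens get sampled more often; that is the opposite of the focused, deterministic behavior needed for short answers.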