
Answer-first summary for fast verification
Answer: both the most sparse and the most frequent tokens in the dataset.
In textual analysis, "noisy features" typically refer to both:

1. **Most sparse tokens**: Words that appear very rarely across documents, which may not provide meaningful patterns and can be considered noise.
2. **Most frequent tokens**: Very common words (like "the", "and", "is") that appear in almost all documents and don't help discriminate between different categories.

Both types of tokens can be considered noisy because they don't contribute significantly to distinguishing between different classes or categories in the data. Feature selection techniques often aim to remove these noisy features to improve model performance.
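As a minimal sketch of the idea above, the following plain-Python example drops tokens at both extremes using document frequency (the number of documents a token appears in). The function name `filter_noisy_tokens`, the thresholds `min_df` and `max_df_ratio`, and the sample documents are all assumptions chosen for illustration, not part of the original question.

```python
from collections import Counter


def filter_noisy_tokens(docs, min_df=2, max_df_ratio=0.8):
    """Return the sorted vocabulary left after dropping noisy tokens.

    Hypothetical helper: a token is dropped if it appears in fewer than
    `min_df` documents (too sparse) or in more than `max_df_ratio` of
    all documents (too frequent).
    """
    # Naive whitespace tokenization; real pipelines usually do more.
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: count each token at most once per document.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(tokenized)
    kept = {
        token
        for token, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    }
    return sorted(kept)


# Illustrative documents (made up for this example).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang to the dog",
    "the fish swam past the zebra",
]

vocabulary = filter_noisy_tokens(docs, min_df=2, max_df_ratio=0.75)
print(vocabulary)  # ['cat', 'dog']
```

Here "the" is removed because it occurs in every document, and words such as "zebra" are removed because they occur in only one. Libraries expose the same idea directly; for example, scikit-learn's `CountVectorizer` accepts `min_df` and `max_df` parameters. Suitable threshold values depend on the dataset and are typically tuned rather than fixed.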
Author: LeetQuiz Editorial Team