
Answer-first summary for fast verification
Answer: We can use the mean rather than median of the observations to replace missing observations.
## Explanation

Option B is false because:

- **Mean vs. Median for Missing Data**: When replacing missing observations, the **median** is generally preferred over the **mean** because the median is more robust to outliers. The mean can be heavily influenced by extreme values, which can distort the imputation process.
- **Data Cleaning Best Practices**:
  - Option A is correct - outliers (observations several standard deviations from the mean) should be carefully checked as they can significantly impact statistical results.
  - Option C is correct - irrelevant observations should be removed to improve model performance and reduce noise.
  - Option D is correct - consistent data formatting is essential for proper data processing and analysis.
- **Statistical Considerations**: In data cleaning, using the median for missing value imputation is a common practice, especially when dealing with skewed distributions or potential outliers, making the mean an inappropriate choice in many scenarios.
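To make the robustness argument concrete, here is a minimal sketch that fills one missing value using both strategies. The `incomes` data and the `impute` helper are hypothetical, made up for illustration and not part of the original question; the only library used is Python's standard `statistics` module.

```python
from statistics import mean, median

def impute(values, strategy):
    """Replace None entries with the mean or median of the observed values.

    Hypothetical helper for illustration only.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Made-up feature with one missing entry and one extreme outlier (1000).
incomes = [30, 32, 35, None, 38, 40, 1000]

mean_filled = impute(incomes, "mean")
median_filled = impute(incomes, "median")

print("mean fill:  ", round(mean_filled[3], 2))
print("median fill:", median_filled[3])
```

In this sketch the single outlier pulls the mean-based fill value far above every typical observation, while the median-based fill stays among the typical values, which is the behaviour the explanation describes.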
Author: LeetQuiz
In terms of the reasons for data cleaning, which of the following is false?
A
Observations on a feature that are several standard deviations from the mean should be checked carefully, as they can have a big effect on results.
B
We can use the mean rather than median of the observations to replace missing observations.
C
Observations not relevant to the task at hand should be removed.
D
For data to be read correctly, it is important that all data is recorded in the same way.