
Explanation:
Option B is false because:
Mean vs. Median for Missing Data: When replacing missing observations, the median is generally preferred over the mean because the median is more robust to outliers. The mean can be heavily influenced by extreme values, which can distort the imputation process.
Data Cleaning Best Practices:
Statistical Considerations: Median imputation is common practice in data cleaning, especially for skewed distributions or data with potential outliers, where the mean is often a poor choice.
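The difference between the two fill values can be illustrated with a minimal sketch. The feature values below are hypothetical (they do not come from the question), and the example uses only Python's standard `statistics` module:

```python
from statistics import mean, median

# Hypothetical feature values: None marks a missing observation,
# and 250.0 is an outlier relative to the rest of the data.
values = [10.0, 12.0, 11.0, None, 13.0, 250.0]

# Compute the fill values from the observed (non-missing) entries only.
observed = [v for v in values if v is not None]
mean_fill = mean(observed)      # pulled toward the outlier
median_fill = median(observed)  # stays near the bulk of the data

# Replace the missing observation with each candidate fill value.
imputed_with_mean = [mean_fill if v is None else v for v in values]
imputed_with_median = [median_fill if v is None else v for v in values]

print("mean fill:  ", mean_fill)
print("median fill:", median_fill)
```

With these made-up numbers, the single outlier drags the mean far above the typical observations, while the median stays among them. This is the robustness argument the explanation relies on.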
In terms of the reasons for data cleaning, which of the following is false?
A
Observations on a feature that are several standard deviations from the mean should be checked carefully, as they can have a big effect on results.
B
We can use the mean rather than median of the observations to replace missing observations.
C
Observations not relevant to the task at hand should be removed.
D
For data to be read correctly, it is important that all data is recorded in the same way.