
Explanation:
Option B is false because, when replacing missing observations, the median is generally preferred to the mean, especially when the distribution is skewed or contains outliers. The mean is sensitive to extreme values, whereas the median is robust to them. It therefore gives a more representative central value to impute.
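To see why the median is the safer choice, the sketch below imputes the same gaps both ways. It assumes missing values are stored as `None` in a plain Python list. The function `impute` and the `incomes` figures are hypothetical and are used only for illustration.

```python
from statistics import mean, median

def impute(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mean":
        fill = mean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

# Hypothetical incomes (in thousands): two gaps and one extreme value (900).
incomes = [42, 38, 45, None, 40, 41, None, 900]

print(impute(incomes, "mean"))
print(impute(incomes, "median"))
```

Here the single extreme value drags the mean fill up to about 184.3, far above every other observation. The median fill of 41.5 sits among the typical values.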
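Why the other options are correct:
A: Values several standard deviations from the mean may be recording errors or genuine extremes. Either way they can strongly influence results, so they should be checked before analysis.
C: Observations that are not relevant to the task add noise and should be removed.
D: Data that is recorded inconsistently may be read incorrectly, so all data should be recorded in the same way.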
In terms of the reasons for data cleaning, which of the following is false?
A. Observations on a feature that are several standard deviations from the mean should be checked carefully, as they can have a big effect on results.
B. We can use the mean rather than median of the observations to replace missing observations.
C. Observations not relevant to the task at hand should be removed.
D. For data to be read correctly, it is important that all data is recorded in the same way.
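Options A and D describe checks that are straightforward to automate. The sketch below is a minimal illustration. It assumes numeric data in a plain list and categorical labels as strings. `flag_outliers`, `normalize_label`, `CANONICAL` and the sample data are all hypothetical. The cutoff of 2.5 standard deviations is an illustrative choice. A single extreme value also inflates the standard deviation, so in small samples a stricter cutoff such as 3 may never be reached.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.5):
    """Return indices of values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings; the last one looks like a recording error.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 10.0, 25.0]
print(flag_outliers(readings))  # → [9]

# Hypothetical mapping from variant spellings to one canonical label.
CANONICAL = {
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "united kingdom": "United Kingdom",
}

def normalize_label(raw):
    """Trim whitespace, match case-insensitively, and map known variants to a canonical label."""
    cleaned = raw.strip()
    return CANONICAL.get(cleaned.lower(), cleaned)

countries = ["UK", " united kingdom", "U.K.", "France"]
print([normalize_label(c) for c in countries])
# → ['United Kingdom', 'United Kingdom', 'United Kingdom', 'France']
```

Flagged values are candidates for checking, not automatic deletion. Deciding whether index 9 is an error or a genuine extreme still requires judgment about the data.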