Databricks Certified Machine Learning - Associate

Explanation:

Mean imputation involves replacing missing values in a dataset with the mean value of the observed data for that variable. This method is most appropriate when the data is missing completely at random (MCAR). MCAR means that the likelihood of data being missing is the same across all observations; in other words, there is no systematic relationship between the propensity of missing values and any of the data, observed or unobserved.
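The behaviour described above can be checked with a small simulation. The following is a minimal sketch, not a reference implementation: the helper name `mean_impute`, the synthetic normal data, and the 30% missingness rate are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Synthetic data (assumption): 10,000 draws from a normal distribution.
rng = random.Random(0)
full = [rng.gauss(50, 10) for _ in range(10_000)]

# MCAR: every value has the same 30% chance of being dropped,
# regardless of its own value or of anything else in the data.
mcar = [None if rng.random() < 0.3 else v for v in full]
imputed = mean_impute(mcar)

print(f"true mean:     {statistics.fmean(full):.2f}")
print(f"imputed mean:  {statistics.fmean(imputed):.2f}")
print(f"true stdev:    {statistics.pstdev(full):.2f}")
print(f"imputed stdev: {statistics.pstdev(imputed):.2f}")
```

Under these assumptions the imputed mean should stay close to the true mean. The spread is a different matter: every gap is filled with the same value, so the standard deviation of the imputed column comes out smaller than that of the complete data, even when the MCAR assumption holds.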

When the data is missing at random (MAR), the probability that a value is missing depends on other observed variables but not on the missing value itself. Mean imputation can introduce bias here: the observed values are no longer a representative sample of the variable, so their mean can differ from the mean of the values that are missing.
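A rough illustration of the MAR case, under invented assumptions: a hypothetical, always-observed `group` column drives both the value of `y` and its chance of being missing. The group labels, means, and missingness rates are made up for the example. Group-wise mean imputation is included only as a point of comparison; it is not part of the original explanation.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical data: `group` is always observed and shifts the mean of `y`.
rows = []
for _ in range(10_000):
    group = rng.choice(["low", "high"])
    y = rng.gauss(40 if group == "low" else 60, 5)
    rows.append((group, y))

# MAR: the chance that y is missing depends only on the observed group.
p_missing = {"low": 0.1, "high": 0.6}
observed = [(g, None if rng.random() < p_missing[g] else y) for g, y in rows]

true_mean = statistics.fmean([y for _, y in rows])

# Overall mean imputation: one fill value for every gap.
overall_fill = statistics.fmean([y for _, y in observed if y is not None])
overall = [overall_fill if y is None else y for _, y in observed]

# Group-wise mean imputation: the fill value is conditioned on the group.
group_fill = {
    g: statistics.fmean([y for gg, y in observed if gg == g and y is not None])
    for g in p_missing
}
groupwise = [group_fill[g] if y is None else y for g, y in observed]

print(f"true mean:            {true_mean:.2f}")
print(f"overall-mean imputed: {statistics.fmean(overall):.2f}")
print(f"group-mean imputed:   {statistics.fmean(groupwise):.2f}")
```

Because the "high" group loses more values, the overall fill value is pulled toward the "low" group and the imputed mean lands below the true mean. Conditioning on the observed variable recovers most of the difference in this toy setup.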

When the data is missing not at random (MNAR), there is a relationship between the propensity of a value to be missing and its value. In this case, mean imputation is particularly problematic because it can significantly bias the analysis, given that the missingness itself carries information about the underlying distribution.
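The MNAR case can be sketched the same way. In this illustrative setup, the threshold of 55 and the 80% and 10% missingness rates are arbitrary assumptions. Larger values are far more likely to be missing, so the observed values under-represent the upper part of the distribution.

```python
import random
import statistics

# Synthetic data (assumption): 10,000 draws from a normal distribution.
rng = random.Random(2)
full = [rng.gauss(50, 10) for _ in range(10_000)]

# MNAR: the chance of being missing depends on the value itself.
# Values above 55 are dropped 80% of the time, the rest 10% of the time.
mnar = [None if rng.random() < (0.8 if v > 55 else 0.1) else v for v in full]

# Mean imputation: fill every gap with the mean of what was observed.
observed = [v for v in mnar if v is not None]
fill = statistics.fmean(observed)
imputed = [fill if v is None else v for v in mnar]

print(f"true mean:    {statistics.fmean(full):.2f}")
print(f"fill value:   {fill:.2f}")
print(f"imputed mean: {statistics.fmean(imputed):.2f}")
```

Here the fill value is computed from a sample that is skewed toward small values, so the imputed column inherits that downward bias. Nothing in the observed data alone reveals how large the bias is, which is why this case is the hardest to correct.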

When the data is missing systematically, there is a pattern or reason behind the missingness, which may be related to the missing values themselves or to other variables in the dataset (in practice this overlaps with the MAR and MNAR cases above). Mean imputation can again produce biased estimates because it ignores that pattern.

Therefore, mean imputation is most justifiable and least likely to introduce bias when the assumption of missing completely at random holds, making option C the correct answer.

When is mean imputation most appropriate for handling missing values? Choose only ONE best answer.

Real Exam

A. When the data is missing not at random (3.1%)

B. When the data is missing systematically (21.9%)

C. When the data is missing completely at random (59.4%)

D. When the data is missing at random (15.6%)