
Answer-first summary for fast verification
Answer: Adding indicator variables for missing values after imputation is important to capture the uncertainty or difference in the distribution of the missing values compared to the observed values.
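Option B is correct. Mean imputation replaces every missing 'Age' with the same value, so afterwards the model cannot tell imputed rows from observed ones. A binary indicator (1 if 'Age' was originally missing, 0 otherwise) preserves that information and lets the model learn a separate effect for rows whose value was missing. This matters most when missingness is not random, that is, when it is related to the target variable or to other features in the dataset. For example, if older individuals are more likely to have missing 'Age' values and they also have different outcomes compared to younger individuals, the indicator variable lets the model learn this relationship instead of treating those rows as average-aged. Option A is wrong because imputation fills the gap but discards the fact that the value was missing. Options C and D are wrong because the indicator is an ordinary binary feature that both tree-based and linear models can use.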
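The idea can be illustrated with a minimal sketch in plain Python. The function name `impute_mean_with_indicator` and the sample ages are hypothetical, chosen only for illustration, and missing entries are assumed to be represented as `None`.

```python
from statistics import mean


def impute_mean_with_indicator(values):
    """Return (imputed, flags) for a list in which None marks a missing entry.

    Hypothetical helper for illustration only.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a mean with no observed values")
    fill = mean(observed)  # the mean is taken over observed values only
    imputed = [fill if v is None else v for v in values]
    # 1 where the original value was missing, 0 where it was observed
    flags = [1 if v is None else 0 for v in values]
    return imputed, flags


# Hypothetical 'Age' column with two missing entries.
ages = [25, None, 40, 31, None]
age_imputed, age_missing = impute_mean_with_indicator(ages)
```

Both `age_imputed` and `age_missing` would then be passed to the model as features, so the rows that received the mean stay distinguishable from rows where that age was actually observed. In practice a library routine would likely be used instead; scikit-learn's `SimpleImputer`, for instance, accepts `add_indicator=True` for this purpose.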
Author: LeetQuiz Editorial Team
You are working on a dataset with a numerical feature 'Age' that has some missing values. You have decided to impute these missing values using the mean value. Explain why it is important to add indicator variables for the missing values after imputation, and provide a scenario where this approach would be particularly useful.
A
Adding indicator variables for missing values after imputation is not necessary, as the missing values have already been handled.
B
Adding indicator variables for missing values after imputation is important to capture the uncertainty or difference in the distribution of the missing values compared to the observed values.
C
Adding indicator variables is only useful for tree-based models, as they can directly split on these binary features.
D
Adding indicator variables is only useful for linear models, as they cannot handle missing values directly.