
Answer-first summary for fast verification
Answer: Generate a binary feature variable for each feature with missing values, indicating whether a row's value was imputed.
**Correct Answer: B** Creating binary indicators for imputed values is a strategic approach that preserves information about the original missingness in the dataset. These indicators, known as 'imputation flags,' allow the machine learning model to potentially discern patterns related to the absence of data. **Why Not Others?** - **A**: Relying on the algorithm to handle missing values is not advisable unless it's specifically designed for such scenarios. - **C**: Removing features with missing values can result in a significant loss of valuable information. - **D**: Switching to mean imputation does not solve the issue of losing information about which values were imputed. - **E**: Indicating the percentage of missing data at the feature level lacks specificity about which rows were imputed. In essence, binary imputation indicators offer a balanced solution by effectively managing missing values while retaining critical information about the dataset's original state.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
A Data Scientist is utilizing a feature store and intends to replace missing values in one of the feature tables with the median value of each respective feature. A colleague argues that this approach discards valuable information. What alternative method can the Data Scientist employ to maximize the inclusion of information in the feature set? Choose the SINGLE best answer.
A
Refrain from imputing the missing values, allowing the machine learning algorithm to decide how to handle them.
B
Generate a binary feature variable for each feature with missing values, indicating whether a row's value was imputed.
C
Eliminate all feature variables that initially contained missing values from the feature set.
D
Impute the missing values using the mean value of each respective feature variable instead of the median.
E
Create a constant feature variable for each feature with missing values, showing the percentage of rows originally missing from the feature.