
Answer-first summary for fast verification
Answer: Replace the missing values with a placeholder category indicating a missing value.
Replacing the missing values with a placeholder category (option C) is the best approach. It records the missingness explicitly instead of guessing a value, so the model can treat "missing" as its own category and pick up on it if the missingness is informative. Removing the rows and upsampling (option A) discards data and can introduce bias if the values are not missing at random; upsampling only duplicates the remaining rows and does not restore the lost information. Replacing with the mean (option B) does not work because a categorical feature has no mean. Moving the rows with missing values to the validation set (option D) makes the validation data unrepresentative of the training data, so it no longer gives a reliable estimate of model performance on unseen data.
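As a minimal sketch of the placeholder approach, the snippet below fills missing entries in a toy categorical column with an explicit label. The column values, the `"Missing"` label, and the helper name `fill_missing_category` are all hypothetical and chosen only for illustration; in pandas, the same idea is usually expressed with `Series.fillna("Missing")`.

```python
from collections import Counter

MISSING = "Missing"  # hypothetical placeholder label; any unused category name works


def fill_missing_category(values, placeholder=MISSING):
    """Return a copy of `values` with None or empty-string entries replaced by `placeholder`."""
    return [placeholder if v is None or v == "" else v for v in values]


# Hypothetical toy column: 1 of 20 values (5%) is missing.
colors = ["red", "blue", "green", "red"] * 5
colors[7] = None

filled = fill_missing_category(colors)
counts = Counter(filled)

# The placeholder now appears as its own category alongside the real ones.
print(counts)
```

No rows are dropped and no value is guessed: the original column is left untouched, and the filled copy has the same length with one extra category.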
Author: LeetQuiz Editorial Team
You are conducting exploratory data analysis on a dataset and encounter an important categorical feature that has 5% missing values. To ensure the integrity of your analysis and to minimize any potential bias from these missing values, which of the following approaches would be the best way to handle these missing values?
A. Remove the rows with missing values, and upsample your dataset by 5%.
B. Replace the missing values with the feature’s mean.
C. Replace the missing values with a placeholder category indicating a missing value.
D. Move the rows with missing values to your validation dataset.