Google Professional Machine Learning Engineer


You are working on a machine learning project in which, during exploratory data analysis, you have identified a categorical feature, Feature A. Feature A shows significant predictive power for your target variable but is missing in approximately 10% of the rows. The dataset is large, and the missingness in Feature A is believed to be random. Given the importance of Feature A and the need to maintain the integrity of your model's predictions, which of the following approaches is the BEST course of action? Choose one option.

Impute the missing values in Feature A with the mode of the feature, assuming that the most common category is the best replacement for missing data.

Replace the missing values in Feature A with values from the feature that has the highest Pearson correlation with Feature A, under the assumption that correlated features can provide reasonable substitutes.

Introduce a new category within Feature A to represent missing values and create an additional binary feature indicating the presence or absence of a value in Feature A, to explicitly model the missingness.

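The missing-category-plus-indicator approach from the last option can be sketched in plain Python, with the mode imputation from the first option included for contrast. This is a minimal sketch under stated assumptions: missing entries are represented as `None`, and the sentinel name `__MISSING__` and the helper names `encode_with_missing_indicator` and `impute_mode` are hypothetical, not taken from any particular library.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical sentinel used as the explicit "missing" category.
MISSING = "__MISSING__"


def encode_with_missing_indicator(
    values: Sequence[Optional[str]],
) -> Tuple[List[str], List[int]]:
    """Map None to its own category and return a parallel 0/1 missingness flag."""
    filled: List[str] = []
    is_missing: List[int] = []
    for v in values:
        if v is None:
            filled.append(MISSING)  # missing becomes a category of its own
            is_missing.append(1)    # flag records that the value was absent
        else:
            filled.append(v)
            is_missing.append(0)
    return filled, is_missing


def impute_mode(values: Sequence[Optional[str]]) -> List[str]:
    """Replace None with the most frequent observed category (for contrast)."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Usage on a small illustrative column (made-up values).
feature_a = ["red", None, "blue", "red", None]
filled, flag = encode_with_missing_indicator(feature_a)
print(filled)                  # ['red', '__MISSING__', 'blue', 'red', '__MISSING__']
print(flag)                    # [0, 1, 0, 0, 1]
print(impute_mode(feature_a))  # ['red', 'red', 'blue', 'red', 'red']
```

On the sample column, mode imputation turns both gaps into `"red"` and discards the fact that they were missing, while the indicator version keeps a distinct category plus a 0/1 flag that a downstream model can use.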