
In the context of preparing a dataset for a machine learning model, you encounter null values in a crucial categorical feature during exploratory data analysis. This issue could potentially introduce bias into your model. Considering the constraints of maintaining data integrity, minimizing bias, and ensuring the model's performance is not adversely affected, what is the optimal strategy for handling these missing values? Choose the best option.
A
Replace the missing values with the mean of the feature, assuming the categorical data can be numerically encoded.
B
Introduce a special category (e.g., 'Missing' or 'Unknown') to denote missing values, preserving the categorical nature of the feature.
C
Remove the rows containing missing values entirely, and then artificially increase your dataset size by 5% to compensate for the loss.
D
Transfer the rows with missing values to your validation dataset, ensuring they do not influence the training phase.
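Option B can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a reference implementation. It assumes missing values arrive as `None`, and the sentinel label `"Missing"` and the helper names `fill_missing_category` and `one_hot` are hypothetical. The sketch shows that the missing entries become an ordinary category level, which an encoder then handles like any other level, so every row stays in the training data.

```python
from typing import Optional

# Hypothetical sentinel label; any string not already used as a real category works.
MISSING = "Missing"


def fill_missing_category(values: list[Optional[str]], sentinel: str = MISSING) -> list[str]:
    """Replace None with an explicit category label, keeping every row."""
    # Guard against the sentinel colliding with a genuine category value.
    if any(v == sentinel for v in values if v is not None):
        raise ValueError(f"sentinel {sentinel!r} already appears as a real category")
    return [sentinel if v is None else v for v in values]


def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Minimal one-hot encoder: one column per distinct category (sorted order)."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows


if __name__ == "__main__":
    colors = ["red", None, "blue", "red", None]
    filled = fill_missing_category(colors)
    print(filled)  # ['red', 'Missing', 'blue', 'red', 'Missing']
    categories, rows = one_hot(filled)
    print(categories)  # ['Missing', 'blue', 'red']
    print(rows)
```

Because the sentinel becomes its own column after encoding, a model can learn whether "missingness" itself carries signal, instead of having that information averaged away or discarded.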