
Ultimate access to all questions.
In a machine learning dataset, you have identified missing values in a categorical feature 'Color' with possible values 'Red', 'Green', and 'Blue'. You have decided to impute these missing values. Explain the process of one-hot encoding the 'Color' feature after imputing the missing values with the mode value, and discuss the potential issues that may arise when using this approach for tree-based models.
A
One-hot encoding involves creating a new binary column for each unique value in the 'Color' feature, excluding the mode value.
B
After imputing missing values with the mode, one-hot encoding will create three new binary columns for 'Red', 'Green', and 'Blue', with one column having a value of 1 for each instance and the rest as 0s.
C
Tree-based models can efficiently handle one-hot encoded features without any issues, as they can split on these binary features directly.
D
One-hot encoding the 'Color' feature after imputing with the mode value will create three new binary columns, but this approach can be inefficient for tree-based models as it may lead to imbalanced splits and overfitting.