Databricks Certified Machine Learning - Associate

In a machine learning dataset, you have identified missing values in a categorical feature 'Color' with possible values 'Red', 'Green', and 'Blue'. You have decided to impute these missing values. Explain the process of one-hot encoding the 'Color' feature after imputing the missing values with the mode value, and discuss the potential issues that may arise when using this approach for tree-based models.

Simulated

Last updated: December 22, 2025 at 14:04

One-hot encoding involves creating a new binary column for each unique value in the 'Color' feature, excluding the mode value.

6.7%

After imputing missing values with the mode, one-hot encoding will create three new binary columns for 'Red', 'Green', and 'Blue'; for each instance, exactly one of these columns is 1 and the other two are 0.

One-hot encoding the 'Color' feature after imputing with the mode value will create three new binary columns, but this approach can be inefficient for tree-based models as it may lead to imbalanced splits and overfitting.

64.0%
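
The process described in the question can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the sample data, the helper names `impute_with_mode` and `one_hot_encode`, and the `Color_` column prefix are all hypothetical and do not come from the exam or from any Databricks API. In practice a library encoder would likely be used instead.

```python
from collections import Counter

# Hypothetical sample data; None marks a missing 'Color' value.
colors = ["Red", "Blue", None, "Red", "Green", None, "Red"]
CATEGORIES = ["Red", "Green", "Blue"]


def impute_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    observed = [v for v in values if v is not None]
    mode_value = Counter(observed).most_common(1)[0][0]
    return [mode_value if v is None else v for v in values]


def one_hot_encode(values, categories):
    """Return one dict per row with a 0/1 indicator for each category."""
    return [
        {f"Color_{c}": int(v == c) for c in categories}
        for v in values
    ]


imputed = impute_with_mode(colors)
encoded = one_hot_encode(imputed, CATEGORIES)

# Per-column totals show how often each indicator is set.
column_totals = {
    f"Color_{c}": sum(row[f"Color_{c}"] for row in encoded)
    for c in CATEGORIES
}

for row in encoded:
    print(row)
print(column_totals)
```

With this sample, the imputed rows all fall into the mode category, so the 'Red' indicator dominates while the 'Green' and 'Blue' indicators are set only rarely. A tree that splits on one of the rare indicators separates very few rows from the rest, which is one way to picture the imbalanced splits mentioned in the last option.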