Databricks Certified Machine Learning - Associate


In a machine learning dataset, you have a categorical feature 'Marital Status' with possible values 'Single', 'Married', 'Divorced', and 'Widowed'. You have decided to use one-hot encoding to transform this feature. Explain the process of one-hot encoding and discuss why this approach can be inefficient for tree-based models.

One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, resulting in four columns. This approach is inefficient for tree-based models because they cannot handle categorical features directly.

15.8%

One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, excluding one value to avoid multicollinearity. This approach is inefficient for tree-based models because it can lead to a large number of binary columns, increasing the dimensionality of the dataset.

One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, resulting in three columns. This approach is inefficient for tree-based models because they can handle categorical features directly without the need for one-hot encoding.

18.3%

One-hot encoding involves creating a new ordinal column for each unique value in the 'Marital Status' feature, assigning numerical values based on the order of the categories. This approach is inefficient for tree-based models because it introduces an arbitrary order to the categories.

13.3%
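
To make the encoding discussed in the options concrete, here is a minimal plain-Python sketch of one-hot encoding applied to the four 'Marital Status' values. The helper name `one_hot`, its `drop_first` flag, and the sample rows are illustrative assumptions and not part of the question; in practice a library routine would normally do this.

```python
def one_hot(values, drop_first=False):
    """Return (column_names, rows) for a one-hot encoding of `values`.

    Hypothetical helper for illustration only.
    """
    # One column per distinct category, in sorted order for reproducibility.
    categories = sorted(set(values))
    if drop_first:
        # Drop one category as the implicit baseline (all-zero row).
        categories = categories[1:]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Assumed sample data covering all four categories from the question.
statuses = ["Single", "Married", "Divorced", "Widowed", "Married"]

columns, rows = one_hot(statuses)
reduced_columns, reduced_rows = one_hot(statuses, drop_first=True)

print(columns)          # four binary columns, one per category
print(rows[0])          # encoding of "Single"
print(reduced_columns)  # three columns when one category is dropped
```

With the full encoding, every row contains exactly one 1, so each new column is mostly zeros and can only separate a single category from all the others. A tree therefore needs several splits to isolate groups of categories, and the number of columns grows with the number of distinct values.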