
Answer-first summary for fast verification
Answer: One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, excluding one value to avoid multicollinearity. This approach is inefficient for tree-based models because it can lead to a large number of binary columns, increasing the dimensionality of the dataset.
Option B is correct. One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, excluding one value to avoid multicollinearity. This results in three binary columns for 'Single', 'Married', and 'Divorced', with one column having a value of 1 for each instance and the rest as 0s. The fourth category, 'Widowed', is not needed as it can be inferred when all other columns are 0. This approach can be inefficient for tree-based models because it can lead to a large number of binary columns, increasing the dimensionality of the dataset. Tree-based models can handle categorical features directly without one-hot encoding, and using one-hot encoding may not provide any additional benefits.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
In a machine learning dataset, you have a categorical feature 'Marital Status' with possible values 'Single', 'Married', 'Divorced', and 'Widowed'. You have decided to use one-hot encoding to transform this feature. Explain the process of one-hot encoding and discuss why this approach can be inefficient for tree-based models.
A
One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, resulting in four columns. This approach is inefficient for tree-based models because they cannot handle categorical features directly.
B
One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, excluding one value to avoid multicollinearity. This approach is inefficient for tree-based models because it can lead to a large number of binary columns, increasing the dimensionality of the dataset.
C
One-hot encoding involves creating a new binary column for each unique value in the 'Marital Status' feature, resulting in three columns. This approach is inefficient for tree-based models because they can handle categorical features directly without the need for one-hot encoding.
D
One-hot encoding involves creating a new ordinal column for each unique value in the 'Marital Status' feature, assigning numerical values based on the order of the categories. This approach is inefficient for tree-based models because it introduces an arbitrary order to the categories.