
Answer-first summary for fast verification
Answer: Apply one-hot encoding on the categorical variables in the test data
The correct approach is to apply one-hot encoding on the categorical variables in the test data. When you apply one-hot encoding to the training set, you create columns for each category. If a category is missing in the test set, you need to ensure that the encoded test data has the same column structure. By adding the missing categorical variable as a new column in the test set and initializing its values to 0, you maintain consistency between the training and test sets. This ensures that the model has the same feature space for both datasets, which is essential for accurate predictions.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
You are developing a machine learning model using a dataset that includes categorical input variables. To evaluate model performance, you have randomly split the dataset into equal training and test sets. After applying one-hot encoding to the categorical variables in the training set, you notice that one of the categorical variables present in the training set is missing from the test set. How should you address this discrepancy?
A
Use sparse representation in the test set.
B
Randomly redistribute the data, with 70% for the training set and 30% for the test set
C
Apply one-hot encoding on the categorical variables in the test data
D
Collect more data representing all categories
No comments yet.