
Answer-first summary for fast verification
Answer: Normalize numerical features and use one-hot encoding for categorical features.
Normalizing numerical features puts them on a comparable scale, which matters for scale-sensitive algorithms such as gradient-based learners and distance-based models. One-hot encoding suits categorical features because it represents each category independently, avoiding a spurious ordinal relationship between categories. By contrast, ignoring or dropping categorical features discards information, and label encoding imposes an arbitrary ordering on unordered categories; either choice can lead to biased models or incorrect assumptions about the data.
Author: LeetQuiz Editorial Team
Given a dataset with a mix of numerical and categorical features, describe how you would handle feature engineering within a Spark ML pipeline. Discuss the techniques you would use for transforming numerical features and encoding categorical features, and explain why these techniques are suitable. Additionally, outline any potential challenges in implementing these techniques and how you would mitigate them.
A. Normalize numerical features and use one-hot encoding for categorical features.
B. Ignore categorical features and use only numerical features.
C. Use label encoding for both numerical and categorical features.
D. Drop categorical features and normalize numerical features.