In a machine learning project where you're dealing with both numerical and categorical features, which Spark ML algorithm is most effective for handling this mix? | Databricks Certified Machine Learning - Associate Quiz - LeetQuiz
Databricks Certified Machine Learning - Associate
Explanation:
The Spark ML algorithm that can handle a mix of feature types effectively is Decision Trees. Here's why:
Linear Regression: This algorithm is limited to numerical features only.
Gradient Boosting: Although it can process categorical features, it necessitates one-hot encoding, which can expand the feature space and degrade performance.
Random Forest: Like Gradient Boosting, it can manage categorical features but faces similar challenges with one-hot encoding.
Decision Trees: This algorithm is capable of natively processing both numerical and categorical features without the need for extra transformations. It employs information gain to divide data based on features, irrespective of their type. Thus, Decision Trees stand out as the optimal choice for projects involving a combination of numerical and categorical features.
Get started today
Ultimate access to all questions.
In a machine learning project where you're dealing with both numerical and categorical features, which Spark ML algorithm is most effective for handling this mix?