
Answer-first summary for fast verification
Answer: Spark ML requires at least one bin for each category in each categorical feature
In Spark ML, `maxBins` sets the maximum number of bins used to discretize continuous features and to pick split candidates when training decision trees and tree-based ensembles. For a categorical feature, each distinct category is assigned its own bin, so `maxBins` must be at least as large as the number of categories in the largest categorical feature. If `maxBins` is set lower than that (the default is 32), training fails with an error rather than silently merging or dropping categories, which is why a tree ported from sklearn can fail until `maxBins` is raised.
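As a rough illustration of the rule, the sketch below computes a `maxBins` value large enough for every categorical column. It is plain Python so it runs without a Spark cluster; the helper `required_max_bins`, the sample rows, and the column names are hypothetical, and the final comment only indicates where the value would go in PySpark.

```python
from typing import Dict, Iterable, List

SPARK_DEFAULT_MAX_BINS = 32  # Spark ML's documented default for maxBins


def required_max_bins(rows: Iterable[Dict[str, object]],
                      categorical_cols: List[str],
                      floor: int = SPARK_DEFAULT_MAX_BINS) -> int:
    """Return a maxBins value that gives every category its own bin."""
    # Collect the distinct values seen in each categorical column.
    distinct = {col: set() for col in categorical_cols}
    for row in rows:
        for col in categorical_cols:
            distinct[col].add(row[col])
    # The column with the most categories determines the minimum.
    largest = max((len(values) for values in distinct.values()), default=0)
    return max(floor, largest)


# Hypothetical data: "zip_code" has 40 distinct categories, "color" has 3.
rows = [{"zip_code": f"z{i % 40}", "color": ["r", "g", "b"][i % 3]}
        for i in range(120)]
max_bins = required_max_bins(rows, ["zip_code", "color"])
print(max_bins)  # 40

# With PySpark installed, the value would be passed along the lines of:
# DecisionTreeClassifier(labelCol="label", featuresCol="features", maxBins=max_bins)
```

Here the default of 32 would be too small for `zip_code`, so the helper returns 40; for data whose largest categorical feature has 32 or fewer categories, it simply returns the default.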
Author: LeetQuiz Editorial Team
When porting a decision tree from sklearn to Spark ML, an error occurs stating that the `maxBins` parameter must be at least equal to the number of values in each categorical feature. Why does Spark ML have this requirement? Choose only ONE best answer.
A. Spark ML tests only numeric features in the splitting algorithm
B. Spark ML requires more split candidates in the splitting algorithm than single-node implementations
C. Spark ML requires at least one bin for each category in each categorical feature
D. Spark ML tests only categorical features in the splitting algorithm