
Describe the process of scaling decision trees in Spark, focusing on the role of data partitioning, parallel processing, and the use of Spark's MLlib for efficient training. Discuss how these elements contribute to the overall performance and scalability of decision tree models.
A
Decision trees are trained sequentially on a single node without any partitioning or parallel processing.
B
Data is partitioned across multiple nodes, and parallel processing is used to train decision trees efficiently, with Spark's MLlib providing optimized algorithms for feature selection and node splitting.
C
Spark only supports linear regression and does not scale decision trees.
D
Decision trees are trained on a centralized server without any partitioning or parallel processing.
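For reference, the partition-and-aggregate pattern that option B describes can be sketched in plain Python. This is a simplified illustration, not Spark's actual implementation: the "partitions" are ordinary lists standing in for data held on different nodes, and the function names (`partition_histogram`, `merge_histograms`, `best_split`) and the candidate thresholds are hypothetical. The idea is that each partition builds a small per-bin label histogram for a feature, the histograms are merged, and the split is chosen from the merged statistics, so only compact summaries move between nodes rather than raw rows. MLlib's tree learners bin continuous features in a similar spirit (controlled by the `maxBins` parameter).

```python
from collections import defaultdict
from functools import reduce


def gini(counts):
    """Gini impurity of a {label: count} mapping."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def partition_histogram(rows, thresholds):
    """Per-partition step: count labels per bin for one feature.

    rows is a list of (feature_value, label); thresholds must be sorted.
    """
    hist = [defaultdict(int) for _ in range(len(thresholds) + 1)]
    for value, label in rows:
        # Bin index = number of thresholds the value exceeds.
        bin_index = sum(value > t for t in thresholds)
        hist[bin_index][label] += 1
    return hist


def merge_histograms(a, b):
    """Combine step: add two partitions' histograms bin by bin."""
    merged = [defaultdict(int) for _ in a]
    for i, (ha, hb) in enumerate(zip(a, b)):
        for h in (ha, hb):
            for label, count in h.items():
                merged[i][label] += count
    return merged


def best_split(hist, thresholds):
    """Pick the threshold with the lowest weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    for i, t in enumerate(thresholds):
        left, right = defaultdict(int), defaultdict(int)
        for j, h in enumerate(hist):
            side = left if j <= i else right  # bins 0..i hold values <= t
            for label, count in h.items():
                side[label] += count
        n_left, n_right = sum(left.values()), sum(right.values())
        if n_left == 0 or n_right == 0:
            continue  # a split must send rows to both children
        impurity = (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
        if impurity < best_impurity:
            best_threshold, best_impurity = t, impurity
    return best_threshold, best_impurity


# Two hypothetical partitions of (feature_value, label) rows.
partitions = [
    [(1.0, 0), (2.0, 0), (6.0, 1)],
    [(3.0, 0), (7.0, 1), (8.0, 1)],
]
thresholds = [2.5, 5.5]  # assumed candidate split points

# "Map" each partition to a histogram, then "reduce" by merging.
local_hists = [partition_histogram(p, thresholds) for p in partitions]
global_hist = reduce(merge_histograms, local_hists)
threshold, impurity = best_split(global_hist, thresholds)
print(threshold, impurity)
```

With this toy data the merged histogram separates the two labels cleanly at 5.5, so that threshold is selected with a weighted impurity of 0.0. In a real cluster the per-partition step would run in parallel on the executors, and only the histograms would be sent back for merging.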