
In the context of Apache Spark, explain how decision tree training is scaled and parallelized. Discuss the specific mechanisms Spark uses to distribute the training of decision trees, including the handling of feature selection, the splitting of tree nodes, and the aggregation of results across the worker nodes of the cluster.
A
Spark trains decision trees sequentially on a single node without any parallelization.
B
Spark distributes the training of decision trees by partitioning the data across worker nodes. Each worker evaluates candidate features and split points on its own partition in parallel, and the partial results are then aggregated to choose the best split for each tree node and form the final tree.
C
Spark only supports linear regression and does not scale decision trees.
D
Spark uses a centralized server to collect all data and then processes it using decision trees.
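
The partition-then-aggregate pattern that the question asks about can be illustrated with a minimal sketch in plain Python. It does not use Spark's API and is not Spark's implementation: the toy data, the candidate thresholds, and the helper names (`local_stats`, `merge`, `best_split`) are all assumptions made for illustration. Each "partition" computes label counts for every candidate split on its own rows, the partial counts are merged, and the split with the lowest weighted Gini impurity is chosen from the merged counts.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy data: (feature_vector, label) rows, already divided into
# "partitions" the way a cluster would hold them on different workers.
PARTITIONS = [
    [((1.0, 5.0), 0), ((2.0, 3.0), 0), ((3.0, 8.0), 0)],
    [((6.0, 4.0), 1), ((7.0, 9.0), 1), ((8.0, 2.0), 1)],
]

# Assumed candidate thresholds per feature index (precomputed for the sketch).
THRESHOLDS = {0: [2.5, 4.5, 6.5], 1: [3.5, 6.5]}


def local_stats(rows):
    """Per-partition pass: count labels on each side of every candidate split."""
    stats = Counter()
    for features, label in rows:
        for feature, cuts in THRESHOLDS.items():
            for cut in cuts:
                side = "left" if features[feature] <= cut else "right"
                stats[(feature, cut, side, label)] += 1
    return stats


def merge(a, b):
    """Aggregation step: add the partial counts from two partitions."""
    return a + b


def gini(counts):
    """Gini impurity of a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def best_split(stats, labels=(0, 1)):
    """Pick the (impurity, feature, threshold) with the lowest weighted Gini."""
    best = None
    for feature, cuts in THRESHOLDS.items():
        for cut in cuts:
            left = [stats[(feature, cut, "left", y)] for y in labels]
            right = [stats[(feature, cut, "right", y)] for y in labels]
            total = sum(left) + sum(right)
            weighted = (sum(left) * gini(left) + sum(right) * gini(right)) / total
            if best is None or weighted < best[0]:
                best = (weighted, feature, cut)
    return best


# Each partition is processed independently, then the results are combined.
merged = reduce(merge, map(local_stats, PARTITIONS))
impurity, feature, threshold = best_split(merged)
print(f"best split: feature {feature} <= {threshold} (impurity {impurity})")
```

In this sketch only the small count tables move between steps, never the raw rows, which is the reason the approach scales with the number of partitions.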