
Explain how Spark's machine learning library (MLlib) handles the distribution and parallelization of linear regression models. Include details on how data is partitioned, how computations are distributed across nodes, and how the results are aggregated to form the final model.
A
MLlib processes all data on a single node and performs linear regression sequentially.
B
MLlib partitions the data across multiple nodes, distributes the computation of regression coefficients in parallel, and aggregates the results to form the final model, leveraging Spark's distributed computing capabilities.
C
MLlib only supports decision trees and does not handle linear regression.
D
MLlib uses a centralized server to collect all data and then processes it using linear regression.
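
The partition, compute, aggregate pattern the question asks about can be sketched in plain Python, without Spark. This is a minimal illustration, not MLlib's implementation: the lists stand in for data partitions held on different executors, and the helper names (`partition_stats`, `merge_stats`, `solve`) are hypothetical. Each "partition" computes small summary statistics locally, the summaries are merged, and the coefficients are solved once from the merged totals. Spark's real solvers follow the same general idea of aggregating per-partition partial results, but differ in detail.

```python
from functools import reduce

def partition_stats(rows):
    """Local pass over one partition: sums needed to fit y = intercept + slope * x."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def merge_stats(a, b):
    """Combine the summaries of two partitions (element-wise sum)."""
    return tuple(u + v for u, v in zip(a, b))

def solve(stats):
    """Solve the 2x2 normal equations from the aggregated sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Toy data generated from y = 3 + 2x, split into three stand-in partitions.
data = [(float(x), 3.0 + 2.0 * x) for x in range(12)]
partitions = [data[i::3] for i in range(3)]

# Each partition is summarized independently (this is the parallel step) ...
partials = [partition_stats(p) for p in partitions]
# ... then the small summaries are aggregated and the model is solved once.
total = reduce(merge_stats, partials)
intercept, slope = solve(total)
print(intercept, slope)
```

Only the five-number summaries move between "nodes" here, never the raw rows, which is why this pattern scales to data that does not fit on one machine.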