
In the context of Spark MLlib, how does Spark scale linear regression to large datasets, and what are the key components involved in this process?
A
Spark uses a single machine to process the entire dataset, applying a standard linear regression algorithm.
B
Spark distributes the data across multiple nodes, parallelizing the computation of the cost function and gradient descent steps.
C
Spark applies a distributed version of the stochastic gradient descent algorithm, which updates the model parameters iteratively using subsets of the data.
D
Spark uses a combination of distributed data storage and a parallelized version of the normal equation to solve for the model parameters.
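
As a rough illustration of the idea that recurs in the options above (splitting the data into partitions, computing partial results on each, and combining them), here is a minimal sketch in plain Python. It does not use Spark or any MLlib API: the `partitions` list, the toy data, and the helper names `partial_gradient`, `combine`, and `fit` are all hypothetical, and a real cluster would run the per-partition step on separate executors rather than in a local loop.

```python
from functools import reduce

# Hypothetical toy data for y = 2*x + 1, split into two "partitions"
# to stand in for data distributed across worker nodes.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]

def partial_gradient(rows, w, b):
    """'Map' step: sum the squared-error gradient terms over one partition."""
    grad_w = 0.0
    grad_b = 0.0
    for x, y in rows:
        err = (w * x + b) - y
        grad_w += err * x
        grad_b += err
    return grad_w, grad_b, len(rows)

def combine(left, right):
    """'Reduce' step: add partial gradients and row counts element-wise."""
    return left[0] + right[0], left[1] + right[1], left[2] + right[2]

def fit(parts, learning_rate=0.1, steps=2000):
    """Full-batch gradient descent where each step aggregates per-partition sums."""
    w = 0.0
    b = 0.0
    for _ in range(steps):
        grad_w, grad_b, n = reduce(
            combine, (partial_gradient(rows, w, b) for rows in parts)
        )
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

w, b = fit(partitions)
print(f"w={w:.4f}, b={b:.4f}")
```

On this toy data the fitted parameters should approach w ≈ 2 and b ≈ 1. Only the small per-partition sums need to be combined, which is why the approach scales with the number of rows; the learning rate and step count here are arbitrary choices for the example.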