In the context of Spark MLlib, explain how Spark scales linear regression to large datasets and identify the key components involved in this process.
A
Spark uses a single machine to process the entire dataset, applying a standard linear regression algorithm.
B
Spark distributes the data across multiple nodes, parallelizing the computation of the cost function and gradient descent steps.
C
Spark applies a distributed version of the stochastic gradient descent algorithm, which updates the model parameters iteratively using subsets of the data.
D
Spark uses a combination of distributed data storage and a parallelized version of the normal equation to solve for the model parameters.
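The map/reduce pattern described in options B and C can be illustrated with a minimal, pure-Python sketch. This is not Spark API code: the partitions, the per-partition gradient function, and the driver-side aggregation loop below are simplified stand-ins for what Spark's executors and driver do when fitting a linear model by distributed gradient descent.

```python
# Illustrative sketch (not Spark itself): data-parallel gradient descent
# for one-feature linear regression. Each "partition" plays the role of
# data held by one Spark executor; the loop in fit() plays the driver.

def partial_gradient(partition, w, b):
    # "Map" step: one executor's gradient contribution for its partition.
    gw, gb, n = 0.0, 0.0, 0
    for x, y in partition:
        err = (w * x + b) - y
        gw += err * x
        gb += err
        n += 1
    return gw, gb, n

def fit(partitions, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # In Spark this map step runs in parallel across the cluster.
        parts = [partial_gradient(p, w, b) for p in partitions]
        # "Reduce" step: the driver sums partial gradients, then updates
        # the shared model parameters before broadcasting them again.
        gw = sum(p[0] for p in parts)
        gb = sum(p[1] for p in parts)
        n = sum(p[2] for p in parts)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Toy data for y = 2x + 1, split into two "partitions".
data = [(x, 2 * x + 1) for x in range(10)]
partitions = [data[:5], data[5:]]
w, b = fit(partitions)
```

Because the cost function and its gradient are sums over examples, they decompose cleanly across partitions, which is exactly what makes gradient-based linear regression scale linearly in Spark.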