
Answer-first summary for fast verification
Answer: Use a distributed version of the stochastic gradient descent algorithm to parallelize the computation.
Using a distributed version of stochastic gradient descent is the most effective strategy here because it parallelizes the gradient computation across the cluster: each executor computes a partial gradient over its data partition, and the driver aggregates those partial gradients to update the model. This lets Spark scale linearly with the size of the dataset. The other options do not address scalability: increasing the number of iterations only adds work without distributing it, reducing the number of features may discard predictive signal, and increasing the batch size changes convergence behavior but does not by itself parallelize the computation.
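To make the map-reduce structure of distributed gradient descent concrete, here is a minimal, self-contained Python sketch. It simulates what Spark does under the hood (Spark MLlib historically exposed this pattern via `LinearRegressionWithSGD` in the RDD-based API): each "partition" computes a partial gradient independently (the map step, which Spark would run on separate executors), and the driver sums them to take one update step (the reduce step). The data, learning rate, and partition split are illustrative choices, not Spark defaults.

```python
def partition_gradient(points, w, b):
    """Partial gradient of squared error over one data partition.
    Conceptually, this is the work one Spark executor does in parallel."""
    gw = gb = 0.0
    for x, y in points:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(points)

def distributed_gd(partitions, lr=0.05, epochs=1000):
    """Map-reduce style gradient descent for simple linear regression.
    The list comprehension stands in for Spark's parallel map over
    partitions; the sums stand in for the reduce/aggregate on the driver."""
    w = b = 0.0
    for _ in range(epochs):
        grads = [partition_gradient(p, w, b) for p in partitions]  # "map"
        n = sum(c for _, _, c in grads)
        gw = sum(g for g, _, _ in grads) / n                       # "reduce"
        gb = sum(g for _, g, _ in grads) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data from y = 3x + 1, split into two partitions to mimic a
# dataset distributed across two executors.
data = [(x / 10, 3 * (x / 10) + 1) for x in range(40)]
parts = [data[:20], data[20:]]
w, b = distributed_gd(parts)
```

The key point for the exam question is that the per-partition gradients are independent, so adding executors speeds up each iteration; none of the other answer choices changes that per-iteration parallelism.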
Author: LeetQuiz Editorial Team
In a Spark MLlib implementation, you are tasked with optimizing the performance of a linear regression model on a large dataset. Which of the following strategies would be most effective in improving the scalability and efficiency of the model?
A
Increase the number of iterations of the gradient descent algorithm to ensure convergence.
B
Use a distributed version of the stochastic gradient descent algorithm to parallelize the computation.
C
Reduce the number of features in the dataset to simplify the model and reduce the computational complexity.
D
Increase the batch size of the gradient descent algorithm to speed up the convergence.