
Consider a scenario where you have a dataset with millions of rows and you want to train a machine learning model using Spark ML. Which approach best handles the large dataset and optimizes training performance?
A. Use the cache method of the Spark DataFrame API to cache the entire dataset in memory, and then train the model using Spark ML.
B. Use the repartition method of the Spark DataFrame API to increase the number of partitions, and then train the model using Spark ML.
C. Use the persist method of the Spark DataFrame API with the MEMORY_AND_DISK storage level to cache the dataset, and then train the model using Spark ML.
D. Use the coalesce method of the Spark DataFrame API to decrease the number of partitions, and then train the model using Spark ML.