
Answer-first summary for fast verification
Answer: Increase the number of workers in the Spark cluster.
Increasing the number of workers in the Spark cluster (Option B) is the most effective strategy for improving training performance. This approach leverages distributed computing to parallelize tasks, significantly reducing training time for large datasets. While converting the dataset to a Delta table (Option A) can enhance data access speed, it doesn't directly impact the training process. MLflow Tracking (Option C) is valuable for monitoring but doesn't optimize performance. Model parallelism (Option D) requires complex modifications to the model architecture and is more suited to specific scenarios within deep learning frameworks.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
A data scientist is working on training a deep learning model with a large dataset in Databricks, facing challenges due to the resource-intensive and time-consuming nature of the process. What strategy can the data scientist employ to enhance the training performance within the Databricks environment?
A
Convert the dataset to a Databricks Delta table for faster access.
B
Increase the number of workers in the Spark cluster.
C
Utilize MLflow Tracking to monitor training progress.
D
Implement model parallelism in the deep learning model.
No comments yet.