
Answer-first summary for fast verification
Answer: Opt for Cloud Storage as your persistent storage solution, Limit preemptible VMs to no more than 30% for secondary workers
Choosing Cloud Storage for persistent storage is advised over HDFS on local disks, ensuring data remains accessible even after cluster shutdown without the need for data migration. Google suggests capping preemptible VMs at 30% for secondary nodes to balance cost and reliability. Autoscaling is beneficial, especially with Cloud Storage, for clusters managing multiple jobs. For more insights, visit [Google's Dataproc Best Practices Guide](https://cloud.google.com/blog/topics/developers-practitioners/dataproc-best-practices-guide).
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
You're assisting a team of data analysts in setting up a Cloud Dataproc cluster for Spark-based analysis of large datasets. The cluster will handle numerous jobs. Adhering to Google's recommended best practices, which two actions would you prioritize?
A
Utilize HDFS storage on persistent disks
B
Opt for Cloud Storage as your persistent storage solution
C
Limit preemptible VMs to no more than 30% for secondary workers
D
Restrict preemptible VMs usage to workers operating HDFS
E
Turn off autoscaling features
No comments yet.