
Your analytics team wants to build a simple statistical model that identifies, from various metrics, which customers are most likely to re-engage with your company. They plan to run the model with Apache Spark against datasets stored in Google Cloud Storage, and you have recommended Google Cloud Dataproc for the job. Preliminary testing shows the workload completes in roughly 30 minutes on a 15-node cluster and writes its results to Google BigQuery. The workload will run once a week. How should you optimize the cluster to minimize cost?
A. Migrate the workload to Google Cloud Dataflow
B. Use preemptible virtual machines (VMs) for the cluster
C. Use higher-memory nodes so that the job runs faster
D. Use SSDs on the worker nodes so that the job can run faster
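For context, the usual cost optimization for this pattern is option B: the job is short (about 30 minutes), fault-tolerant batch work that runs only weekly, so shifting most of the cluster's capacity onto cheap preemptible VMs cuts cost without meaningfully risking the schedule. The sketch below shows one way to attach preemptible secondary workers using the google-cloud-dataproc Python client; the project ID, region, cluster name, machine types, and the 2-standard/13-preemptible split are illustrative assumptions, not details from the question.

    # Minimal sketch: Dataproc cluster with preemptible secondary workers.
    # Assumes the google-cloud-dataproc client (pip install google-cloud-dataproc);
    # all names and sizes below are placeholders.
    from google.cloud import dataproc_v1

    PROJECT_ID = "my-project"       # assumed placeholder
    REGION = "us-central1"          # assumed placeholder
    CLUSTER_NAME = "weekly-spark"   # assumed placeholder

    # Dataproc clients are regional: point the endpoint at the cluster's region.
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": PROJECT_ID,
        "cluster_name": CLUSTER_NAME,
        "config": {
            # Keep a small core of standard workers: preemptible nodes can be
            # reclaimed mid-run, and Dataproc stores HDFS data only on primaries.
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
            # Put the bulk of the compute on cheaper preemptible secondaries.
            "secondary_worker_config": {
                "num_instances": 13,
                "preemptibility": dataproc_v1.InstanceGroupConfig.Preemptibility.PREEMPTIBLE,
            },
        },
    }

    operation = client.create_cluster(
        request={"project_id": PROJECT_ID, "region": REGION, "cluster": cluster}
    )
    print(f"Created cluster: {operation.result().cluster_name}")

Keeping a couple of standard workers alongside the preemptible pool is deliberate: if a preemption hits mid-job, Spark can retry the lost tasks on the surviving nodes, and because the data lives in Cloud Storage and BigQuery rather than HDFS, losing secondaries does not lose data.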