
Your analytics team wants to build a simple statistical model that identifies, from various metrics, which customers are most likely to re-engage with your company. They plan to run the model with Apache Spark against datasets stored in Google Cloud Storage, and you have recommended Google Cloud Dataproc for the job. Preliminary testing shows the workload completes in roughly 30 minutes on a 15-node cluster and writes its results to Google BigQuery. The workload will run once a week. How should you optimize the cluster to minimize cost?
A. Migrate the workload to Google Cloud Dataflow
B. Use preemptible virtual machines (VMs) for the cluster
C. Use higher-memory nodes so that the job runs faster
D. Use SSDs on the worker nodes so that the job can run faster
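For context, the usual cost optimization for this pattern is option B: the job is short (about 30 minutes), fault-tolerant batch work that runs only weekly, so shifting most of the cluster's capacity onto cheap preemptible VMs cuts cost without meaningfully risking the schedule. The sketch below shows one way to attach preemptible secondary workers using the google-cloud-dataproc Python client; the project ID, region, cluster name, machine types, and the 2-standard/13-preemptible split are illustrative assumptions, not details from the question.

    # Minimal sketch: Dataproc cluster with preemptible secondary workers.
    # Assumes the google-cloud-dataproc client (pip install google-cloud-dataproc);
    # all names and sizes below are placeholders.
    from google.cloud import dataproc_v1

    PROJECT_ID = "my-project"       # assumed placeholder
    REGION = "us-central1"          # assumed placeholder
    CLUSTER_NAME = "weekly-spark"   # assumed placeholder

    # Dataproc clients are regional: point the endpoint at the cluster's region.
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": PROJECT_ID,
        "cluster_name": CLUSTER_NAME,
        "config": {
            # Keep a small core of standard workers: preemptible nodes can be
            # reclaimed mid-run, and Dataproc stores HDFS data only on primaries.
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
            # Put the bulk of the compute on cheaper preemptible secondaries.
            "secondary_worker_config": {
                "num_instances": 13,
                "preemptibility": dataproc_v1.InstanceGroupConfig.Preemptibility.PREEMPTIBLE,
            },
        },
    }

    operation = client.create_cluster(
        request={"project_id": PROJECT_ID, "region": REGION, "cluster": cluster}
    )
    print(f"Created cluster: {operation.result().cluster_name}")

Keeping a couple of standard workers alongside the preemptible pool is deliberate: if a preemption hits mid-job, Spark can retry the lost tasks on the surviving nodes, and because the data lives in Cloud Storage and BigQuery rather than HDFS, losing secondaries does not lose data.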