© 2025 LeetQuiz All rights reserved.
Google Professional Data Engineer



You've recently migrated a Hadoop job from your on-premises cluster to Google Cloud Dataproc and Google Cloud Storage (GCS). The job runs a complex set of Spark analytical tasks that include numerous shuffle operations. The input dataset consists of Parquet files, each between 200 and 400 MB in size. Since the migration, you've observed a decline in performance on Dataproc and want to optimize the job. Because your organization is highly cost-sensitive, you want to keep using Dataproc with preemptible VMs, with only two non-preemptible workers allocated to this workload. What steps should you take to achieve this optimization?
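The scenario above points at a well-known Dataproc pitfall: when a preemptible worker is reclaimed, any Spark shuffle data stored on it is lost and must be recomputed, which is especially costly for shuffle-heavy jobs. One common remedy (a hedged sketch, not the graded answer) is to enable Dataproc Enhanced Flexibility Mode (EFM) so shuffle data is written to the non-preemptible primary workers, and to attach local SSDs for faster shuffle I/O. The cluster name, region, machine type, and worker counts below are illustrative placeholders:

```shell
# Illustrative only: cluster name, region, machine type, and worker
# counts are placeholders, not values from the question.
# EFM ("primary-worker" shuffle mode) keeps Spark shuffle output on the
# two non-preemptible primary workers, so reclaimed preemptible
# secondary workers do not lose intermediate shuffle data.
# Local SSDs on the primary workers speed up shuffle reads/writes.
gcloud dataproc clusters create shuffle-heavy-cluster \
    --region=us-central1 \
    --num-workers=2 \
    --num-secondary-workers=8 \
    --secondary-worker-type=preemptible \
    --worker-machine-type=n2-standard-8 \
    --num-worker-local-ssds=2 \
    --properties=dataproc:efm.spark.shuffle=primary-worker
```

With this layout, preemptions only cost the work running on the reclaimed VM, not the accumulated shuffle state, which is usually the dominant source of the post-migration slowdown described above.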

Exam-Like
