Google Professional Data Engineer

Explanation:

A. Incorrect. Because many jobs are high priority, running them in sequence would not meet the business requirements.

B. Incorrect. With Dataproc, this is not a recommended approach. Configurations and running jobs could interfere with each other.

C. Correct. Jobs can use ephemeral clusters to quickly run the job and then deallocate the resources after use. Multiple jobs can be run in parallel without interfering with each other.

D. Incorrect. Cluster autoscaling is effective within a cluster running individual jobs, but it is not recommended when running multiple jobs because they can interfere with the resource scaling.

Links: https://cloud.google.com/blog/products/data-analytics/dataproc-job-optimization-how-to-guide

More information:

Courses: Building Batch Data Pipelines on Google Cloud
● Executing Spark on Dataproc
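As a rough illustration of option C, the sketch below builds one inline workflow template per job, each with its own managed (ephemeral) cluster, so that jobs do not share configuration or resources. It only constructs the request dictionaries and does not call Google Cloud. The job names, bucket paths, machine types, and the helper `build_ephemeral_template` are hypothetical; the field names follow the Dataproc `WorkflowTemplate` message as I understand it and should be checked against the current API reference.

```python
from typing import Dict, List


def build_ephemeral_template(job_id: str, main_uri: str) -> dict:
    """Build an inline workflow template that runs one PySpark job
    on a managed cluster dedicated to that job.

    Hypothetical helper: sizes and machine types are placeholders.
    """
    return {
        "placement": {
            "managed_cluster": {
                # One cluster per job, so jobs cannot interfere with each other.
                "cluster_name": f"ephemeral-{job_id}",
                "config": {
                    "master_config": {
                        "num_instances": 1,
                        "machine_type_uri": "n1-standard-2",
                    },
                    "worker_config": {
                        "num_instances": 2,
                        "machine_type_uri": "n1-standard-2",
                    },
                },
            }
        },
        "jobs": [
            {
                "step_id": job_id,
                "pyspark_job": {"main_python_file_uri": main_uri},
            }
        ],
    }


# Hypothetical small jobs; the URIs are illustrative only.
small_jobs: Dict[str, str] = {
    "daily-report": "gs://example-bucket/jobs/daily_report.py",
    "cleanup": "gs://example-bucket/jobs/cleanup.py",
}

templates: List[dict] = [
    build_ephemeral_template(job_id, uri) for job_id, uri in small_jobs.items()
]

for template in templates:
    cluster = template["placement"]["managed_cluster"]["cluster_name"]
    step = template["jobs"][0]["step_id"]
    print(f"{step} -> {cluster}")
```

Each template could then be submitted independently, for example with `WorkflowTemplateServiceClient.instantiate_inline_workflow_template` from the `google-cloud-dataproc` package, which creates the managed cluster, runs the job, and deletes the cluster when the workflow finishes. Because the submissions are independent, high-priority jobs do not have to wait behind lower-priority ones.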

You need to design a Dataproc cluster to run multiple small jobs. Many jobs (but not all) are of high priority.

A. Reuse the same cluster and run each job in sequence.

8.6%

B. Reuse the same cluster to run all jobs in parallel.

17.1%

C. Use ephemeral clusters.

57.1%

D. Use cluster autoscaling.

17.1%