
Answer-first summary for fast verification
Answer: Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
The correct answer is C. Cloud Dataflow is purpose-built for streaming pipelines such as Pub/Sub-to-BigQuery: its default autoscaling adds and removes workers in response to backlog and throughput, so fluctuating input volumes are absorbed without manual resizing. System lag, visible in Stackdriver, measures how long elements wait before being processed, making it the right signal for confirming the pipeline is keeping up. By contrast, the Dataproc options (A and B) require manually resizing or diagnosing the cluster, and option D relies on hand-tuning machine types, so none of them meets the requirement for minimal manual intervention.
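To make the "processes them" step concrete, here is a minimal sketch of the per-message transformation such a pipeline would apply. In a real Dataflow job this function would run inside an Apache Beam map step between the Pub/Sub read and the BigQuery write; the field names (`user_id`, `event_type`, `value`) are illustrative assumptions, not part of the question.

```python
import json

def transform_message(payload: bytes) -> dict:
    """Parse a JSON Pub/Sub payload into a BigQuery-ready row dict.

    In Dataflow, this logic would sit inside a Beam ParDo/Map;
    autoscaling then adds workers as the Pub/Sub backlog grows.
    """
    event = json.loads(payload.decode("utf-8"))
    # Hypothetical schema mapping; defaults guard against missing fields.
    return {
        "user_id": event["user_id"],
        "event_type": event.get("event_type", "unknown"),
        "value": float(event.get("value", 0)),
    }

row = transform_message(b'{"user_id": "u1", "value": "3.5"}')
print(row)
```

Because the transform is a pure function of each message, Dataflow can freely redistribute work across workers as it scales, which is what makes the default autoscaling setting effective here.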
Author: LeetQuiz Editorial Team
You are tasked with designing a cost-efficient data pipeline on Google Cloud that ingests JSON messages from Cloud Pub/Sub, processes them, and subsequently loads the transformed data into BigQuery. The solution must accommodate fluctuating input data volumes with minimal manual intervention. Which approach should you take to achieve this?
A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.