
As a data engineer, you manage a real-time data processing pipeline on Google Cloud Dataflow. The pipeline reads messages from a Cloud Pub/Sub topic and writes the processed data to a BigQuery dataset located in the European Union. The Dataflow job runs in the europe-west4 region with a maximum of 3 worker instances of type n1-standard-1. During periods of high data inflow, the pipeline falls behind: all three workers reach maximum CPU utilization and records are not processed promptly.
Which two actions can you take to improve the performance of your Dataflow pipeline? (Choose two.)
A
Increase the number of max workers
B
Use a larger instance type for your Dataflow workers
C
Change the zone of your Dataflow pipeline to run in us-central1
D
Create a temporary table in Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Bigtable to BigQuery
E
Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
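For reference, the worker-scaling knobs mentioned in options A and B are ordinary Dataflow runner settings rather than pipeline code changes. A minimal sketch of how they could be passed when launching a Beam pipeline on Dataflow; the flag names (`--max_num_workers`, `--machine_type`) follow the Apache Beam Python SDK's worker options, and the specific values here are illustrative assumptions, not recommendations:

```python
def dataflow_scaling_flags(max_workers: int, machine_type: str) -> list[str]:
    """Build the runner flags that raise a Dataflow job's compute ceiling.

    max_workers  -> upper bound for autoscaling (option A's knob)
    machine_type -> worker instance size (option B's knob)
    """
    return [
        "--runner=DataflowRunner",
        "--region=europe-west4",  # keep the job co-located with the EU BigQuery dataset
        f"--max_num_workers={max_workers}",  # allow autoscaling beyond 3 workers
        f"--machine_type={machine_type}",    # e.g. a larger n1-standard size
    ]

# Hypothetical values for illustration only:
flags = dataflow_scaling_flags(max_workers=10, machine_type="n1-standard-4")
print(flags)
```

In a real job these flags would be appended to the pipeline's launch arguments (or set on `PipelineOptions`); the exact ceiling and machine size should be chosen from observed CPU utilization and backlog metrics.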