Google Professional Data Engineer


You are running a Dataflow streaming pipeline with both Streaming Engine and Horizontal Autoscaling enabled. The maximum number of workers is set to 1,000. The pipeline's input is Pub/Sub messages that carry notifications from Cloud Storage. One of the transformations reads CSV files and emits an element for each line in the CSV. Job performance is low: the pipeline is using only 10 workers, and the autoscaler is not adding more. What should you do to improve the pipeline's performance?


Enable Vertical Autoscaling to let the pipeline use larger workers.


Change the pipeline code, and introduce a Reshuffle step to prevent fusion.
