
Answer-first summary for fast verification
Answer: Change the pipeline code, and introduce a Reshuffle step to prevent fusion.
The correct answer is B. Dataflow's fusion optimization can merge several transforms into a single stage. Here the step that reads each CSV file and emits one element per line is a high fan-out step. If it is fused with the Pub/Sub read, every line of a file is processed on the worker that received the notification, so parallelism is bounded by the number of incoming messages rather than by the number of lines. Adding a Reshuffle step after the fan-out breaks the fusion and redistributes the emitted elements across workers, which increases parallelism and gives the autoscaler a reason to add workers. Options A and D give each worker more resources, and option C raises a limit that is not being reached (10 of 1,000 workers in use), so none of them addresses the lack of parallelism.
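The effect of fusion on work distribution can be illustrated with a toy model in plain Python. This is a simplified sketch, not Dataflow's actual scheduler: the function names, the round-robin assignment, and the input of 10 notifications with 100,000 lines each are all hypothetical. In a real Beam Python pipeline the fix is the `beam.Reshuffle()` transform placed directly after the step that emits one element per CSV line.

```python
from collections import Counter


def fused_load(files, num_workers):
    """Lines processed per worker when parsing is fused with the read step.

    Each file notification is one work item, so every line of a file is
    processed on the worker that received the notification.
    """
    load = Counter()
    for i, line_count in enumerate(files):
        load[i % num_workers] += line_count
    return load


def reshuffled_load(files, num_workers):
    """Lines processed per worker when lines are redistributed after the fan-out.

    Round-robin assignment stands in for the redistribution a reshuffle performs.
    """
    load = Counter()
    line_index = 0
    for line_count in files:
        for _ in range(line_count):
            load[line_index % num_workers] += 1
            line_index += 1
    return load


# Hypothetical input: 10 notifications, each pointing at a 100,000-line CSV.
files = [100_000] * 10
fused = fused_load(files, num_workers=1000)
reshuffled = reshuffled_load(files, num_workers=1000)

print(len(fused), max(fused.values()))            # 10 100000
print(len(reshuffled), max(reshuffled.values()))  # 1000 1000
```

In this model the fused stage keeps only 10 workers busy no matter how many are available, while redistributing the lines after the fan-out spreads the same work across all 1,000.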
Author: LeetQuiz Editorial Team
You are operating a Dataflow streaming pipeline that utilizes both the Streaming Engine and Horizontal Autoscaling features. The pipeline's maximum worker limit has been set to 1,000 workers. The input for this pipeline consists of Pub/Sub messages that relay notifications from Cloud Storage. One of the transformations in the pipeline involves reading CSV files, and it emits an element for each line in the CSV. However, you are experiencing low job performance; the pipeline is currently only employing 10 workers, and the autoscaler isn't scaling up to add more workers. What steps should you take to enhance the performance of the pipeline?
A. Enable Vertical Autoscaling to let the pipeline use larger workers.
B. Change the pipeline code, and introduce a Reshuffle step to prevent fusion.
C. Update the job to increase the maximum number of workers.
D. Use Dataflow Prime, and enable Right Fitting to increase the worker resources.