
Explanation:
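Correct Answer: B - Dataflow fuses adjacent steps into a single stage, so the console reports metrics only for the merged step. Inserting a Reshuffle operation after each processing step prevents that fusion, so each step runs as its own stage (Reshuffle also redistributes the data across workers). Monitoring the execution details in the Dataflow console then shows per-stage metrics and reveals where the bottleneck occurs.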
Why other options are incorrect:
As a maintainer of ETL pipelines, you notice that a Dataflow streaming pipeline is lagging. Dataflow's automatic optimization has also merged (fused) the pipeline graph into a single step. How can you identify where the bottleneck occurs in this scenario?
A. Log debug information in each ParDo function, and analyze the logs at execution time.
B. Insert a Reshuffle operation after each processing step, and monitor the execution details in the Dataflow console.
C. Verify that the Dataflow service accounts have appropriate permissions to write the processed data to the output sinks.
D. Insert output sinks after each key processing step, and observe the writing throughput of each block.