
Explanation:
The correct answer is D. Replace SideInput with CoGroupByKey. A side input is made available in full to every worker that processes the main collection, so a large side input can be costly to materialize, cache, and re-read. CoGroupByKey instead performs a shuffle-based join: both collections are grouped by key and the work is distributed across workers, which typically scales better when the joined dataset is large. Because the pipeline's slowness is tied to how it joins data, switching from SideInput to CoGroupByKey is the change most likely to speed up the Dataflow job in this context.
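To make the difference between the two join shapes concrete, here is a minimal plain-Python sketch. It does not use the Apache Beam API; the function names `side_input_join` and `co_group_by_key` and the sample data are hypothetical and only mimic the semantics described above, assuming both collections are lists of (key, value) pairs.

```python
from collections import defaultdict


def side_input_join(main, lookup):
    """Side-input style: each element consults a full in-memory copy of `lookup`."""
    # In a real pipeline, every worker would need access to this whole table.
    table = dict(lookup)
    return [(key, value, table.get(key)) for key, value in main]


def co_group_by_key(**tagged):
    """CoGroupByKey style: group every tagged input by key, one result per key."""
    grouped = defaultdict(lambda: {tag: [] for tag in tagged})
    for tag, pairs in tagged.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return dict(grouped)


# Hypothetical sample data: orders keyed by user id, and a user lookup.
orders = [("u1", "book"), ("u2", "pen"), ("u1", "lamp")]
users = [("u1", "Ada"), ("u2", "Lin")]

joined_side = side_input_join(orders, users)
joined_cogroup = co_group_by_key(orders=orders, users=users)
```

In the side-input shape the whole lookup table travels with the computation, while in the co-group shape each key's values from both inputs are brought together once, which is what allows a distributed runner to partition the join by key.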
You're assessing a Dataflow pipeline that processes gzip-compressed text files, routes errors to a dead-letter queue, and uses SideInputs to join data. The pipeline is running slower than expected. Which strategy would accelerate the Dataflow job?
A
Retry records that encounter errors.
B
Decrease the batch size.
C
Opt for compressed Avro files instead.
D
Replace SideInput with CoGroupByKey.