
Answer-first summary for fast verification
Answer: C — Add a try... catch block to your DoFn that transforms the data, write erroneous rows to PubSub directly from the DoFn.
## Explanation

Option C is the correct answer because:

- **Try-catch block in DoFn**: handles errors during data transformation gracefully without stopping the entire pipeline.
- **Write erroneous rows to PubSub directly**: ensures that all failing data is captured in a durable message queue for later reprocessing.
- **Reprocessing capability**: PubSub allows you to replay messages, enabling reprocessing of failed data.
- **Reliability**: the pipeline continues processing valid data while capturing errors separately.

**Why the other options are less optimal:**

- **Option A**: filtering skips errors but doesn't capture them for reprocessing.
- **Option B**: extracting erroneous rows from logs is unreliable and difficult to reprocess systematically.
- **Option D**: side outputs are a sound error-handling technique, but storing to PubSub "later" adds complexity and a window for potential data loss.

This approach keeps the pipeline reliable while retaining the ability to reprocess all failing data.
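The try-catch-and-publish pattern described above can be sketched in plain Python. This is a minimal, framework-free sketch: `ParseRowFn.process` mimics a Beam DoFn's element-wise method, and `DeadLetterPublisher` is a hypothetical stand-in for a Pub/Sub publisher client, not the real `google-cloud-pubsub` API.

```python
import json


class DeadLetterPublisher:
    """Hypothetical stand-in for a Pub/Sub publisher client."""

    def __init__(self):
        self.published = []

    def publish(self, payload: bytes):
        # A real client would publish to a topic; here we just record it.
        self.published.append(payload)


class ParseRowFn:
    """Mimics a Beam DoFn: transform each element, capture failures."""

    def __init__(self, publisher):
        self.publisher = publisher

    def process(self, element: str):
        try:
            row = json.loads(element)  # the transformation that may fail
            yield {"id": row["id"], "value": float(row["value"])}
        except (ValueError, KeyError) as exc:
            # Write the erroneous row to Pub/Sub directly from the DoFn,
            # so it can be replayed and reprocessed later.
            self.publisher.publish(
                json.dumps({"raw": element, "error": str(exc)}).encode("utf-8")
            )


publisher = DeadLetterPublisher()
fn = ParseRowFn(publisher)
inputs = ['{"id": 1, "value": "2.5"}', "not json", '{"id": 2}']
good = [out for e in inputs for out in fn.process(e)]
print(good)                      # one valid row parsed
print(len(publisher.published))  # two erroneous rows captured for replay
```

The key property the answer relies on is that a failing element never raises out of `process`, so the pipeline keeps running while every bad row lands in a durable, replayable queue.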
Author: LeetQuiz.
NO.43 Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of some errors in the input data, and you need to improve the reliability of the pipeline (including the ability to reprocess all failing data). What should you do?
A. Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.
B. Add a try... catch block to your DoFn that transforms the data, extract erroneous rows from logs.
C. Add a try... catch block to your DoFn that transforms the data, write erroneous rows to PubSub directly from the DoFn.
D. Add a try... catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to PubSub later.