
Answer-first summary for fast verification
Answer: Create a side output for the erroneous data.
A. Incorrect. Re-reading the input just to process the erroneous records adds a second pass over the data and is not efficient.
B. Incorrect. Running separate pipelines over the same data (one for valid data, one for erroneous data) is not efficient.
C. Incorrect. Erroneous data is not automatically available in the logs.
D. Correct. A side output collects the erroneous records in the same pass that processes the valid data, and is a recommended approach.
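To make option D concrete, here is a minimal sketch of a side output using the Apache Beam Python SDK, which Dataflow runs. The tag names, the JSON input format, the required "id" field, and the helper names classify and run are assumptions made for illustration; they are not part of the question.

```python
import json

# Hypothetical tag names for the two outputs.
VALID_TAG = "valid"
ERROR_TAG = "errors"


def classify(line):
    """Return (tag, payload): the parsed record if valid, else the raw line plus the error."""
    try:
        record = json.loads(line)
        # Assumed validity rule: a record must be a JSON object with an "id" field.
        if not isinstance(record, dict) or "id" not in record:
            raise ValueError("missing 'id' field")
        return VALID_TAG, record
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return ERROR_TAG, {"raw": line, "error": str(exc)}


def run(lines):
    # Imported here so classify() can be used without Beam installed.
    import apache_beam as beam

    def route(line):
        tag, payload = classify(line)
        if tag == ERROR_TAG:
            # Emit to the side output instead of the main output.
            yield beam.pvalue.TaggedOutput(ERROR_TAG, payload)
        else:
            yield payload

    with beam.Pipeline() as pipeline:
        results = (
            pipeline
            | "Read" >> beam.Create(lines)
            | "Route" >> beam.FlatMap(route).with_outputs(ERROR_TAG, main=VALID_TAG)
        )
        # Each tagged PCollection can go to its own sink; print stands in for real sinks.
        results[VALID_TAG] | "HandleValid" >> beam.Map(print)
        results[ERROR_TAG] | "HandleErrors" >> beam.Map(print)
```

The input is read once, and a single transform routes each element to either the main output or the tagged error output, which is why this avoids the extra read of option A and the duplicated pipelines of option B. In production the print steps would typically be replaced by real sinks, for example a separate table or bucket for the erroneous records.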
Author: LeetQuiz Editorial Team
You are running a Dataflow pipeline in production. The input data for this pipeline is occasionally inconsistent. Separately from processing the valid data, you want to efficiently capture the erroneous input data for analysis. What should you do?
A
Re-read the input data and create separate outputs for valid and erroneous data.
B
Read the data once, and split it into two pipelines, one to output valid data and another to output erroneous data.
C
Check for the erroneous data in the logs.
D
Create a side output for the erroneous data.