
Answer-first summary for fast verification
Answer: They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
The correct answer is D. By default, Apache Beam (and therefore Google Cloud Dataflow) assigns every element to a single global window. For an unbounded source such as Google Cloud Pub/Sub, a grouped aggregation in the global window with the default trigger could never fire, because the global window never closes: the pipeline would wait forever for the input to finish. Beam therefore rejects this combination outright: applying a windowed aggregation to unbounded data without a non-global windowing function (or a non-default trigger) raises an error at pipeline construction time, which matches the observed failure of every streaming insert.
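Conceptually, a non-global (e.g. fixed) windowing function divides the timeline into finite intervals and assigns each element to the interval containing its timestamp, so aggregations can emit a result per window instead of waiting for the stream to end. In Beam's Python SDK this is what applying `beam.WindowInto(window.FixedWindows(size))` before a grouping transform achieves. The sketch below is a plain-Python illustration of that window-assignment idea, not the Beam API; the function names are illustrative:

```python
from collections import defaultdict

def assign_fixed_window(timestamp, size):
    """Return the [start, end) fixed window containing the timestamp."""
    start = (timestamp // size) * size
    return (start, start + size)

def window_counts(events, size):
    """Count events per fixed window; events are (timestamp, value) pairs."""
    counts = defaultdict(int)
    for ts, _value in events:
        counts[assign_fixed_window(ts, size)] += 1
    return dict(counts)

# Timestamps in seconds, bucketed into 60-second fixed windows.
events = [(3, "a"), (61, "b"), (65, "c"), (130, "d")]
print(window_counts(events, 60))
# → {(0, 60): 1, (60, 120): 2, (120, 180): 1}
```

Because each window is finite, a result for window `(0, 60)` can be emitted as soon as that window is complete, even though the stream as a whole never ends; this is exactly what the default global window cannot provide for unbounded input.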
Author: LeetQuiz Editorial Team
Your company is in the process of constructing data pipelines for an upcoming campaign. A crucial business requirement involves periodically identifying the inputs and their respective timings throughout the campaign from streaming data using Google Cloud Pub/Sub. To meet this requirement, your engineering team has opted to implement windowing and transformation using Google Cloud Dataflow. Despite this implementation strategy, testing indicates that the Cloud Dataflow job consistently fails for all streaming inserts. What is the most probable reason for this issue?
A
They have not assigned the timestamp, which causes the job to fail
B
They have not set the triggers to accommodate the data coming in late, which causes the job to fail
C
They have not applied a global windowing function, which causes the job to fail when the pipeline is created
D
They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created