
Answer-first summary for fast verification
Answer: Use watermarks and timestamps to capture the lagged data.
The correct answer is C: Use watermarks and timestamps to capture the lagged data. Every element in a Dataflow pipeline carries an event timestamp, which determines the window the element belongs to regardless of when it actually arrives. This is how out-of-order events still end up in the correct window. A watermark is the system's estimate of how far event time has progressed, that is, the point before which all data is expected to have arrived. An element whose timestamp falls behind the watermark is considered late. By configuring allowed lateness (and triggers) on a window, you define how long Dataflow continues to accept late data and incorporate it into that window's results. This keeps processing within a predictable time frame while still tolerating delayed or out-of-order arrival. Timestamps alone (option D) place events in the right window but give no signal for when a window is complete, and a window choice alone (options A and B) does not address lateness.
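The interaction between timestamps, the watermark, and allowed lateness can be illustrated with a small plain-Python simulation. This is a sketch under stated assumptions, not Dataflow or Apache Beam code: the names `WINDOW_SIZE`, `ALLOWED_LATENESS`, `window_start`, and `process` are hypothetical, and the watermark is simplified to "the largest event time seen so far", whereas a real runner maintains a heuristic estimate.

```python
from collections import defaultdict

WINDOW_SIZE = 60       # seconds; fixed event-time windows (assumed value)
ALLOWED_LATENESS = 30  # seconds a window stays open after the watermark passes its end (assumed value)


def window_start(event_time):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SIZE)


def process(events):
    """Group (event_time, value) pairs, given in arrival order, into windows.

    Simplifying assumption: the watermark is the largest event time seen
    so far. A real runner estimates it heuristically.
    """
    windows = defaultdict(list)
    dropped = []
    watermark = 0
    for event_time, value in events:
        start = window_start(event_time)
        window_end = start + WINDOW_SIZE
        # The window expires once the watermark passes its end plus the
        # allowed lateness; data for an expired window is dropped.
        if watermark >= window_end + ALLOWED_LATENESS:
            dropped.append((event_time, value))
        else:
            windows[start].append(value)
        watermark = max(watermark, event_time)
    return dict(windows), dropped


# Events in arrival order; the first field is the event timestamp.
events = [
    (5, "a"),
    (70, "b"),   # moves the watermark to 70, past the end of window [0, 60)
    (50, "c"),   # late, but within allowed lateness (70 < 60 + 30): accepted
    (130, "d"),  # moves the watermark to 130
    (20, "e"),   # too late (130 >= 60 + 30): dropped
]

windows, dropped = process(events)
print(windows)  # {0: ['a', 'c'], 60: ['b'], 120: ['d']}
print(dropped)  # [(20, 'e')]
```

Event "c" arrives out of order but is still placed in window [0, 60) by its timestamp, while "e" arrives after the window has expired and is discarded. In the Apache Beam Python SDK, the comparable configuration is the windowing function and the `allowed_lateness` argument passed to `beam.WindowInto`; the sketch above does not use that API.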
Author: LeetQuiz Editorial Team
Your company consistently deals with both batch- and stream-based event data for various applications. To handle and process this data effectively, you intend to use Google Cloud Dataflow within a predictable time frame. Nevertheless, you have identified that, at times, the data might arrive later than expected or in a non-sequential manner. In light of these potential delays and disorder, how should you structure your Cloud Dataflow pipeline to appropriately manage and process data that arrives late or out of order?
A
Set a single global window to capture all the data.
B
Set sliding windows to capture all the lagged data.
C
Use watermarks and timestamps to capture the lagged data.
D
Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.