
**Correct Answer: D. Utilize watermarks to establish expected data arrival times, enabling the inclusion of late data within the correct window.**

In Dataflow (Apache Beam), a watermark is the system's estimate of how far event time has progressed. Once the watermark passes a given timestamp, the pipeline assumes all data up to that point has arrived. An element whose event timestamp falls in a window the watermark has already passed is therefore late by definition, which gives the pipeline a precise way to recognize late data. The watermark does not accept late data on its own. That is the job of an allowed-lateness setting, which says how long after the watermark passes the end of a window that window still accepts data, together with a trigger that emits an updated result for each late arrival. With these in place, late elements are added to the window they belong to instead of being dropped or miscounted, and the window type and size (hopping windows are called sliding windows in Beam) stay unchanged.

**Why the other options are incorrect:**

- **A. Switch to tumbling windows to eliminate overlapping window periods, ensuring data is grouped within fixed intervals.** Tumbling windows are fixed-size and non-overlapping. They simplify window management, but the window type does not determine how late data is handled. That is governed by the watermark and allowed lateness, whatever the window type.
- **B. Implement session windows, grouping data based on specific activity patterns to better manage late arrivals.** Session windows group data into bursts of activity separated by gaps of inactivity. They change how data is grouped, not how delayed data is detected, so they do not solve the late-data problem.
- **C. Adjust the hopping window size to allow more time for late data to be included in the grouping.** A larger window ends later, so some delayed elements may happen to arrive before the watermark passes its end. However, this also changes what each aggregate measures and still gives no precise way to identify late data and place it in its intended window. Watermarks are the more targeted solution.
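The interplay between hopping (sliding) windows, the watermark, and allowed lateness described above can be sketched in plain Python. This is a simplified, stdlib-only model, not Dataflow's implementation. Timestamps are integer seconds of event time, each element is assumed to already carry its event-time timestamp, and the names `Window`, `sliding_windows`, `classify`, and `route` are hypothetical. For every window an element belongs to, the element is on time if the watermark has not yet reached the window's end. It is late but still accepted if the watermark has passed the end by less than the allowed lateness. Otherwise it is dropped.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Half-open event-time interval [start, end), in seconds (hypothetical model)."""
    start: int
    end: int


def sliding_windows(ts: int, size: int, period: int) -> list[Window]:
    """Return every hopping/sliding window of `size` seconds, starting every
    `period` seconds, that contains event timestamp `ts`."""
    last_start = ts - ts % period  # latest window start at or before ts
    # Walk backwards one period at a time while the window still covers ts.
    return [Window(s, s + size) for s in range(last_start, ts - size, -period)]


def classify(window: Window, watermark: int, allowed_lateness: int) -> str:
    """Label one (element, window) pair relative to the current watermark."""
    if watermark < window.end:
        return "on_time"  # watermark has not reached the window end yet
    if watermark < window.end + allowed_lateness:
        return "late"     # past the end, but within allowed lateness
    return "dropped"      # window expired; element is discarded


def route(ts: int, watermark: int, size: int, period: int,
          allowed_lateness: int) -> dict[Window, str]:
    """Classify an element against every window it belongs to."""
    return {w: classify(w, watermark, allowed_lateness)
            for w in sliding_windows(ts, size, period)}


if __name__ == "__main__":
    # An element with event time 75 s arrives when the watermark is at 100 s.
    for w, status in route(ts=75, watermark=100, size=60, period=30,
                           allowed_lateness=120).items():
        print(f"[{w.start}, {w.end}) -> {status}")
    # [60, 120) -> on_time
    # [30, 90) -> late
```

With `allowed_lateness=0`, the same element is dropped from `[30, 90)` even though that is one of its correct windows, which is the kind of loss that skews downstream aggregates. In the Beam Python SDK the corresponding settings are passed to `beam.WindowInto`, for example `window.SlidingWindows(size=60, period=30)` with `allowed_lateness=120` and `trigger=trigger.AfterWatermark(late=trigger.AfterCount(1))` so each late element produces an updated pane; treat these values as illustrative.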
Author: LeetQuiz Editorial Team
You are managing a streaming pipeline using Dataflow with hopping windows to group incoming data. However, some late data is not being recognized as such, causing inaccuracies in downstream aggregations. What measures should you implement to correctly capture and process this late data within its intended window?
A. Switch to tumbling windows to eliminate overlapping window periods, ensuring data is grouped within fixed intervals.
B. Implement session windows, grouping data based on specific activity patterns to better manage late arrivals.
C. Adjust the hopping window size to allow more time for late data to be included in the grouping.
D. Utilize watermarks to establish expected data arrival times, enabling the inclusion of late data within the correct window.