
Answer-first summary for fast verification
Answer: Data arriving within the specified watermark threshold is processed, while data arriving outside this threshold is discarded.
Watermarking is a feature in Structured Streaming that defines a threshold for how late data is allowed to be relative to the maximum event time observed. If data arrives within the watermark duration, the engine updates the state and incorporates the records. However, if data arrives with an event time older than the watermark, it is dropped to prevent the state from growing indefinitely and to maintain processing efficiency.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
When utilizing watermarking within a Spark Structured Streaming pipeline integrated with Delta Lake, how is late-arriving data managed?
A
The system guarantees that all late-arriving data is processed by dynamically extending the state duration.
B
Data arriving within the specified watermark threshold is processed, while data arriving outside this threshold is discarded.
C
Late-arriving data is automatically rerouted to a separate Delta table for manual reconciliation.
D
Watermarking ensures that late-arriving data is re-sorted into the correct historical partitions without affecting the current state.