
Structured Streaming must reliably track the exact progress of processing so that it can handle any kind of failure by restarting and/or reprocessing. Which of the following pairs of approaches does Spark use to record the offset range of the data being processed in each trigger?
A
Checkpointing and Write-ahead Logs
B
Structured Streaming cannot record the offset range of the data being processed in each trigger.
C
Replayable Sources and Idempotent Sinks
D
Write-ahead Logs and Idempotent Sinks
E
Checkpointing and Idempotent Sinks
Explanation:
Structured Streaming uses two key mechanisms to reliably track processing progress and handle failures:
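1. Checkpointing - Persists the progress and current state of the streaming query to fault-tolerant storage, so that a restarted query can resume from where it left off.
2. Write-ahead Logs (WAL) - Record the offset range to be processed in each trigger before the actual processing begins, so that after a failure the same range can be identified and reprocessed.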
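The two mechanisms above can be illustrated with a small toy model in plain Python. This is only a sketch of the idea, not Spark's actual implementation: the class `ToyStreamingQuery`, its `offsets` and `commits` subdirectories, and the dictionary used as a sink are all assumptions made up for this example. It shows the offset range being written to a log before a batch is processed, and the same range being replayed after a simulated failure.

```python
import json
import os
import tempfile


class ToyStreamingQuery:
    """Toy model of offset tracking; NOT Spark's real implementation."""

    def __init__(self, checkpoint_dir, source):
        self.source = source  # replayable source: a plain list indexed by offset
        self.offsets_dir = os.path.join(checkpoint_dir, "offsets")
        self.commits_dir = os.path.join(checkpoint_dir, "commits")
        os.makedirs(self.offsets_dir, exist_ok=True)
        os.makedirs(self.commits_dir, exist_ok=True)

    @staticmethod
    def _latest(directory):
        # File names are batch ids; -1 means "nothing recorded yet".
        ids = [int(name) for name in os.listdir(directory)]
        return max(ids) if ids else -1

    def _read_range(self, batch_id):
        with open(os.path.join(self.offsets_dir, str(batch_id))) as f:
            return json.load(f)

    def run_trigger(self, sink, batch_size=2, crash_before_commit=False):
        last_planned = self._latest(self.offsets_dir)
        last_committed = self._latest(self.commits_dir)

        if last_planned > last_committed:
            # A batch was logged but never committed: replay the same range.
            batch_id = last_planned
            offsets = self._read_range(batch_id)
        else:
            batch_id = last_committed + 1
            start = self._read_range(last_committed)["end"] if last_committed >= 0 else 0
            end = min(start + batch_size, len(self.source))
            if start == end:
                return None  # no new data
            offsets = {"start": start, "end": end}
            # Write-ahead step: record the range BEFORE processing it.
            with open(os.path.join(self.offsets_dir, str(batch_id)), "w") as f:
                json.dump(offsets, f)

        # Idempotent sink: writing the same batch id twice overwrites, never duplicates.
        sink[batch_id] = self.source[offsets["start"]:offsets["end"]]

        if crash_before_commit:
            raise RuntimeError("simulated failure before commit")

        # Mark the batch as done only after the sink write succeeded.
        open(os.path.join(self.commits_dir, str(batch_id)), "w").close()
        return batch_id


def demo():
    source = ["a", "b", "c", "d", "e"]
    sink = {}
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        query = ToyStreamingQuery(checkpoint_dir, source)
        query.run_trigger(sink)  # batch 0 covers offsets [0, 2)
        try:
            query.run_trigger(sink, crash_before_commit=True)  # batch 1 fails
        except RuntimeError:
            pass
        # "Restart": a new object that only knows the checkpoint directory.
        restarted = ToyStreamingQuery(checkpoint_dir, source)
        replayed = restarted.run_trigger(sink)  # replays batch 1
        restarted.run_trigger(sink)  # batch 2 covers offsets [4, 5)
    return replayed, sink
```

Calling `demo()` returns the id of the replayed batch together with the sink contents. Because the range for the failed batch was logged before processing, the restarted query reprocesses exactly that range, and the keyed sink ends up with each record once. In a real Spark job this bookkeeping is handled by the engine once a checkpoint location is configured on the streaming write.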
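Why other options are incorrect:
Option B is false, because recording the offset range of each trigger is exactly what Structured Streaming does. Options C, D, and E each include replayable sources or idempotent sinks. Those are part of the overall fault-tolerance design, since they allow data to be re-read and safely re-written, but they do not record offset ranges.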
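The combination of checkpointing and write-ahead logs provides the progress tracking behind Structured Streaming's fault tolerance. Together with replayable sources and idempotent sinks, it enables the end-to-end exactly-once semantics that Structured Streaming is known for.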