
For Structured Streaming to reliably track the exact progress of processing, so that it can handle any kind of failure by restarting and/or reprocessing, which two mechanisms does Spark use to record the offset range of the data being processed in each trigger?
A. Checkpointing and Write-ahead Logs
B. Structured Streaming cannot record the offset range of the data being processed in each trigger.
C. Replayable Sources and Idempotent Sinks
D. Write-ahead Logs and Idempotent Sinks
E. Checkpointing and Idempotent Sinks
Explanation:
Structured Streaming uses two key mechanisms to reliably track processing progress and handle failures:
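1. Checkpointing - Stores the current state of the streaming query in a fault-tolerant checkpoint location, including the progress information (the range of offsets processed in each trigger) and the running state of stateful operations such as aggregations.
2. Write-ahead Logs (WAL) - Records the offset range being processed in each trigger before the actual processing begins. This ensures that after a failure the restarted query knows exactly which range was in flight and can reprocess that same range.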
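Why other options are incorrect:
Option B is false, because Structured Streaming does record the offset range of each trigger. Options C, D and E each include replayable sources or idempotent sinks. These matter for fault tolerance (a replayable source lets the engine re-read a recorded range, and an idempotent sink makes reprocessing safe), but neither is the mechanism that records offsets.
Checkpointing and write-ahead logs record the progress. Combined with replayable sources and idempotent sinks, they provide the fault tolerance and end-to-end exactly-once semantics that Structured Streaming is known for.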
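The write-ahead idea in the explanation can be illustrated with a small toy model. This is a sketch under stated assumptions, not Spark's implementation: ToyOffsetLog, run_trigger and demo are hypothetical names, the "source" is a plain Python list addressed by index, and the "sink" is a dict keyed by batch id so that writing the same batch twice is harmless. The offsets and commits subdirectories loosely mirror the layout of a Structured Streaming checkpoint directory.

```python
import json
import os
import tempfile


class ToyOffsetLog:
    """Toy checkpoint directory: an offset log (write-ahead) plus a commit log."""

    def __init__(self, checkpoint_dir):
        self.offsets_dir = os.path.join(checkpoint_dir, "offsets")
        self.commits_dir = os.path.join(checkpoint_dir, "commits")
        os.makedirs(self.offsets_dir, exist_ok=True)
        os.makedirs(self.commits_dir, exist_ok=True)

    @staticmethod
    def _batch_ids(directory):
        return sorted(int(name) for name in os.listdir(directory))

    def _read_range(self, batch_id):
        with open(os.path.join(self.offsets_dir, str(batch_id))) as f:
            planned = json.load(f)
        return planned["start"], planned["end"]

    def plan_batch(self, batch_id, start, end):
        # Write-ahead step: persist the offset range BEFORE processing it.
        with open(os.path.join(self.offsets_dir, str(batch_id)), "w") as f:
            json.dump({"start": start, "end": end}, f)

    def commit_batch(self, batch_id):
        # Mark the batch as fully processed.
        with open(os.path.join(self.commits_dir, str(batch_id)), "w") as f:
            f.write("done")

    def pending_batch(self):
        """Return (batch_id, start, end) of a planned but uncommitted batch, or None."""
        committed = set(self._batch_ids(self.commits_dir))
        for batch_id in self._batch_ids(self.offsets_dir):
            if batch_id not in committed:
                return (batch_id, *self._read_range(batch_id))
        return None

    def next_batch(self):
        """Return (batch_id, start_offset) for the next new batch."""
        committed = self._batch_ids(self.commits_dir)
        if not committed:
            return 0, 0
        last = committed[-1]
        return last + 1, self._read_range(last)[1]


def run_trigger(log, source, sink, batch_size, fail_before_commit=False):
    pending = log.pending_batch()
    if pending is not None:
        # Recovery: replay exactly the range that was logged before the crash.
        batch_id, start, end = pending
    else:
        batch_id, start = log.next_batch()
        end = min(start + batch_size, len(source))
        log.plan_batch(batch_id, start, end)
    # Idempotent sink (assumption): keyed by batch id, so a replay overwrites.
    sink[batch_id] = source[start:end]
    if fail_before_commit:
        raise RuntimeError("simulated crash before commit")
    log.commit_batch(batch_id)
    return batch_id, start, end


def demo():
    source = list(range(10))  # replayable source: any index range can be re-read
    sink = {}
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        log = ToyOffsetLog(checkpoint_dir)
        run_trigger(log, source, sink, batch_size=4)
        try:
            run_trigger(log, source, sink, batch_size=4, fail_before_commit=True)
        except RuntimeError:
            pass
        # "Restart": a fresh object that only knows what is on disk.
        restarted = ToyOffsetLog(checkpoint_dir)
        replayed = run_trigger(restarted, source, sink, batch_size=4)
        final = run_trigger(restarted, source, sink, batch_size=4)
    return replayed, final, sink


if __name__ == "__main__":
    print(demo())
```

In this sketch the second trigger crashes after its range was logged but before it was committed, so the restarted run picks up the same batch id and the same offset range instead of choosing a new one. In a real query the corresponding step is simply pointing the stream writer at a checkpoint location; the logging and replay are handled by the engine.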