
What does Spark use to record the offset range of the data being processed in each trigger, so that Structured Streaming can reliably track the exact progress of the processing and handle any kind of failure by restarting and/or reprocessing?
A
Checkpointing and Write-ahead Logs
B
Replayable Sources and Idempotent Sinks
C
Write-ahead Logs and Idempotent Sinks
D
Checkpointing and Idempotent Sinks
Explanation:
Correct Answer: A (Checkpointing and Write-ahead Logs)
Spark Structured Streaming uses Checkpointing and Write-ahead Logs to reliably track the exact progress of data processing. Here's why:
Checkpointing: the engine saves the progress of a streaming query, including the offset range processed in each trigger and any running state, to a reliable storage location (such as HDFS or cloud object storage), so a restarted query can resume from exactly where it left off.
Write-ahead Logs (WAL): before each micro-batch is processed, the offset range it will cover is durably written to the log; if the query fails mid-batch, the engine knows precisely which range to reprocess on restart.
Checkpointing is specifically designed to record the progress of streaming queries (including offset ranges), while write-ahead logs ensure data durability. Together, they enable Structured Streaming to handle failures reliably by restarting and/or reprocessing from known checkpoints.