
Answer-first summary for fast verification
Answer: Checkpointing and Write-ahead Logs
In Structured Streaming, Spark uses checkpointing and write-ahead logs to record the offset range of the data being processed in each trigger. Checkpointing persists the intermediate state of the streaming computation to durable storage, so that after a failure the query can recover that state and resume from where it left off. Write-ahead logs record the planned offset range *before* the data is processed, so the exact progress of the query is tracked reliably and consistently. Together, these mechanisms let Spark handle any kind of failure by restarting the query and reprocessing only the in-flight data.
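The interplay described above — log the offset range before processing, then replay the log on restart to find the in-flight batch — can be sketched with a toy write-ahead log. This is an illustrative sketch only, not Spark's actual checkpoint layout; the file name `wal.log` and the helper functions are hypothetical.

```python
import json
import os
import tempfile

def log_offsets(wal_path, batch_id, offset_range):
    """Append the offset range to the write-ahead log *before* processing."""
    with open(wal_path, "a") as f:
        f.write(json.dumps({"batch": batch_id, "offsets": offset_range}) + "\n")

def recover_last_batch(wal_path):
    """On restart, replay the WAL to find the last batch that was in flight."""
    if not os.path.exists(wal_path):
        return None
    with open(wal_path) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return entries[-1] if entries else None

# Simulate two trigger cycles followed by a crash and restart.
wal = os.path.join(tempfile.mkdtemp(), "wal.log")
log_offsets(wal, batch_id=0, offset_range=[0, 100])    # logged, then processed
log_offsets(wal, batch_id=1, offset_range=[100, 200])  # logged, then "crash"

last = recover_last_batch(wal)  # on restart, reprocess exactly this range
print(last)
```

In a real Structured Streaming query, you enable this machinery simply by setting a checkpoint directory on the sink, e.g. `.option("checkpointLocation", "/path/to/checkpoint")` on `writeStream`; Spark then manages the offset and commit logs under that directory itself.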
Author: LeetQuiz Editorial Team
What does Spark use to record the offset range of the data being processed in each trigger, so that Structured Streaming can reliably track the exact progress of the processing and handle any kind of failure by restarting and/or reprocessing?
A
Checkpointing and Write-ahead Logs
B
Replayable Sources and Idempotent Sinks
C
Write-ahead Logs and Idempotent Sinks
D
Checkpointing and Idempotent Sinks