
Answer-first summary for fast verification
Answer: Checkpointing and Write-ahead Logs
In Apache Spark Structured Streaming, fault tolerance and progress tracking rely on two mechanisms:

Checkpointing: saves the progress of the streaming query (offsets and state) to a reliable storage location (e.g., DBFS, S3, HDFS), so Spark can restart from the last saved state after a failure.

Write-ahead logs (WAL): at the start of each trigger, the engine records the offset range it is about to process in a log under the checkpoint location, before processing begins. After a failure it can therefore reprocess exactly the same range of data.

Replayable sources and idempotent sinks matter for end-to-end exactly-once guarantees, because they let the engine re-read the data and avoid duplicates when reprocessing, but they are not the mechanism for logging the offset range. That logging is handled by checkpointing together with the write-ahead log.

For logging the offset range of the data processed during each trigger, Spark uses: ✅ A: Checkpointing and Write-ahead Logs
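The offset-logging behavior described above can be sketched with a small, standard-library-only Python model. This is not Spark's implementation: `OffsetLog`, `run_trigger`, the `offsets`/`commits` directory layout, and the in-memory `source` and `sink` are all hypothetical names chosen for illustration. The sketch only shows the ordering that matters, namely that the offset range is written to durable storage before the batch is processed, so a restart can replay the same range.

```python
import json
import os
import tempfile


class OffsetLog:
    """Toy write-ahead log (hypothetical): one file per batch id in a checkpoint dir."""

    def __init__(self, checkpoint_dir):
        self.offsets_dir = os.path.join(checkpoint_dir, "offsets")
        self.commits_dir = os.path.join(checkpoint_dir, "commits")
        os.makedirs(self.offsets_dir, exist_ok=True)
        os.makedirs(self.commits_dir, exist_ok=True)

    def write_offsets(self, batch_id, start, end):
        # Record the planned offset range BEFORE any data is processed.
        with open(os.path.join(self.offsets_dir, str(batch_id)), "w") as f:
            json.dump({"start": start, "end": end}, f)

    def read_offsets(self, batch_id):
        with open(os.path.join(self.offsets_dir, str(batch_id))) as f:
            return json.load(f)

    def commit(self, batch_id):
        # Mark the batch as fully written to the sink.
        open(os.path.join(self.commits_dir, str(batch_id)), "w").close()

    @staticmethod
    def latest(directory):
        ids = [int(name) for name in os.listdir(directory)]
        return max(ids) if ids else None


def run_trigger(log, source, sink, batch_size, fail_before_commit=False):
    last_planned = log.latest(log.offsets_dir)
    last_committed = log.latest(log.commits_dir)
    if last_planned is not None and last_planned != last_committed:
        # A range was logged but never committed: replay exactly that range.
        batch_id = last_planned
        logged = log.read_offsets(batch_id)
        start, end = logged["start"], logged["end"]
    else:
        # Plan a new batch that starts where the previous one ended.
        batch_id = 0 if last_planned is None else last_planned + 1
        start = 0 if last_planned is None else log.read_offsets(last_planned)["end"]
        end = min(start + batch_size, len(source))
        log.write_offsets(batch_id, start, end)
    # Idempotent sink: keyed by batch id, so a replay overwrites instead of appending.
    sink[batch_id] = source[start:end]
    if fail_before_commit:
        raise RuntimeError("simulated crash after sink write, before commit")
    log.commit(batch_id)
    return batch_id, start, end


source = list(range(10))  # replayable source: any offset range can be re-read
sink = {}
checkpoint_dir = tempfile.mkdtemp()

first = run_trigger(OffsetLog(checkpoint_dir), source, sink, batch_size=4)
try:
    run_trigger(OffsetLog(checkpoint_dir), source, sink, batch_size=4,
                fail_before_commit=True)
except RuntimeError:
    pass  # the "query" died with batch 1 logged but not committed

# "Restart": a fresh OffsetLog over the same checkpoint directory.
recovered = run_trigger(OffsetLog(checkpoint_dir), source, sink, batch_size=4)
third = run_trigger(OffsetLog(checkpoint_dir), source, sink, batch_size=4)
```

In this toy run the restarted trigger picks up the logged but uncommitted range and replays it, and the keyed sink ends up with each record exactly once. In a real Structured Streaming query the storage location is supplied through the `checkpointLocation` option on `writeStream`; the engine manages the log files itself.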
Author: LeetQuiz Editorial Team
To ensure that Structured Streaming can accurately track the progress of its data processing and manage any potential failures by restarting or reprocessing, which of the following two methods does Spark utilize to log the offset range of the data processed during each trigger?
A. Checkpointing and Write-ahead Logs
B. Structured Streaming cannot record the offset range of the data being processed in each trigger.
C. Replayable Sources and Idempotent Sinks
D. Write-ahead Logs and Idempotent Sinks
E. Checkpointing and Idempotent Sinks