
Answer-first summary for fast verification
Answer: No; each of the streams needs to have its own checkpoint directory.
The proposed checkpoint directory structure is invalid because each Structured Streaming query requires its own dedicated checkpoint directory to track its offsets and write-ahead state for fault tolerance. If two concurrent streams share one checkpoint directory, each stream's progress records overwrite the other's, leading to conflicts and corrupted checkpoint data. Option A is wrong because Delta Lake does not manage streaming checkpoints in the transaction log (the `_delta_log` directory tracks table commits, not query progress). Option C is wrong because multiple streams can safely write to the same Delta table, provided each has its own checkpoint. Option B is wrong because sharing a checkpoint directory is exactly what makes this layout unsafe. Thus, the correct answer is D.
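A minimal PySpark sketch of the safe configuration: two Kafka-to-Delta streams writing to the same bronze table, each with its own `checkpointLocation`. The topic names, broker address, and checkpoint paths below are hypothetical placeholders, not values from the question.

```python
# Hypothetical checkpoint paths -- the key point is that they differ per stream.
CHECKPOINT_STREAM_1 = "/bronze/_checkpoints/topic_1"
CHECKPOINT_STREAM_2 = "/bronze/_checkpoints/topic_2"


def start_bronze_stream(spark, topic, checkpoint_path, target_table="bronze"):
    """Start one Kafka-to-Delta stream with a dedicated checkpoint directory."""
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", topic)
        .load()
        .writeStream
        .format("delta")
        # Each query MUST get a unique checkpoint location; sharing one
        # directory corrupts offset tracking for both streams.
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )


# Usage (assumes an active SparkSession named `spark`):
# q1 = start_bronze_stream(spark, "topic_1", CHECKPOINT_STREAM_1)
# q2 = start_bronze_stream(spark, "topic_2", CHECKPOINT_STREAM_2)
```

Both queries append to the same Delta table; the table's `_delta_log` serializes their commits, while the per-query checkpoints keep their Kafka offsets independent.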
Author: LeetQuiz Editorial Team
A data architect has designed a system where two Structured Streaming jobs will concurrently write to the same bronze Delta table. Each job consumes data from a different Apache Kafka topic but writes records with identical schemas. To simplify the directory structure, a data engineer proposes using a shared checkpoint directory for both streams, with the following layout:
/bronze
    _checkpoint
    _delta_log
    year_week=2020_01
    year_week=2020_02
Is this checkpoint directory structure valid for the given scenario, and why?
A. No; Delta Lake manages streaming checkpoints in the transaction log.
B. Yes; both of the streams can share a single checkpoint directory.
C. No; only one stream can write to a Delta Lake table.
D. No; each of the streams needs to have its own checkpoint directory.