
Answer: No; each of the streams needs to have its own checkpoint directory.
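The proposed checkpoint directory structure is not valid. In Structured Streaming, a query's checkpoint location records its progress: the source offsets it has processed (here, Kafka topic-partition offsets), its commit log, and its query metadata. Each stream therefore needs its own checkpoint directory so it can track its progress independently. If two streams share one, they read and overwrite each other's offset and commit files, so a stream can skip or reprocess records, and the second query may fail to start because the checkpoint already belongs to another query. Delta Lake does allow multiple streams to write to the same table concurrently (option D), but that concerns writes to the table itself, not shared checkpoints, and "infinite" overstates it. Option A is wrong because streaming checkpoints are stored separately from the Delta transaction log, and option C is wrong because several streams can append to one Delta table. Option E correctly identifies that each stream requires a separate checkpoint directory, which is a fundamental requirement of Structured Streaming.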
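As a minimal sketch, assuming PySpark with the Kafka source and Delta Lake available, the fix is to give each query its own `checkpointLocation` while both streams append to the same table path. The helper names `checkpoint_for` and `start_bronze_stream`, the broker address, and the per-topic subdirectories under `/bronze/_checkpoint` are hypothetical illustrations, not part of the original question; partitioning by `year_week` is omitted for brevity.

```python
import posixpath


# Hypothetical layout (an assumption, not from the question): one checkpoint
# subdirectory per stream under the table's _checkpoint folder.
def checkpoint_for(table_root: str, stream_name: str) -> str:
    """Return a checkpoint location that belongs to exactly one streaming query."""
    return posixpath.join(table_root, "_checkpoint", stream_name)


def start_bronze_stream(spark, bootstrap_servers: str, topic: str,
                        table_root: str = "/bronze"):
    """Start one Kafka -> Delta stream with its own checkpoint directory.

    `spark` is an existing SparkSession; the Kafka connector and Delta Lake
    must be installed on the cluster (assumed here).
    """
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)  # each stream subscribes to a single topic
        .load()
    )
    return (
        raw.writeStream.format("delta")
        .outputMode("append")
        # Unique per stream: offsets and commits for this topic only.
        .option("checkpointLocation", checkpoint_for(table_root, topic))
        .start(table_root)  # both streams append to the same Delta table path
    )
```

Calling `start_bronze_stream(spark, "broker:9092", "topic_a")` and `start_bronze_stream(spark, "broker:9092", "topic_b")` would start two independent queries that track their offsets in `/bronze/_checkpoint/topic_a` and `/bronze/_checkpoint/topic_b` respectively, while Delta Lake's optimistic concurrency control lets both sets of appends commit to the same table.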
Author: LeetQuiz Editorial Team
A data architect has designed a system where two Structured Streaming jobs concurrently write to a single bronze Delta table. Each job consumes data from a different Apache Kafka topic but writes records with identical schemas. To simplify the directory structure, a data engineer proposes using a shared nested checkpoint directory for both streams, as shown below:
/bronze
  - _checkpoint
  - _delta_log
  - year_week=2020_01
  - year_week=2020_02
Is this checkpoint directory structure valid for the given scenario? Explain why or why not.
A. No; Delta Lake manages streaming checkpoints in the transaction log.
B. Yes; both of the streams can share a single checkpoint directory.
C. No; only one stream can write to a Delta Lake table.
D. Yes; Delta Lake supports infinite concurrent writers.
E. No; each of the streams needs to have its own checkpoint directory.