
Answer-first summary for fast verification
Answer: No, each Structured Streaming job must use its own unique checkpoint directory to maintain independent state and track offsets correctly.
Each Structured Streaming query **must** have its own dedicated checkpoint directory. The checkpoint folder stores the query's offsets, state information, and metadata. If two independent queries share the same folder, they will overwrite each other's progress files, leading to metadata corruption, stalled streams, or `ConcurrentTransactionException` errors. Databricks documentation is explicit on this: 'Each query must have a different checkpoint location. Multiple queries should never share the same location.' While Delta Lake's transaction log handles concurrent writes to the table data itself, the streaming engine requires isolated checkpoints to manage the specific progress of each individual stream.
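As a minimal sketch of the rule above, the two jobs could write to the same Delta table while each one gets its own checkpoint folder. The helper names (`checkpoint_path`, `start_bronze_stream`), the `_checkpoints/<query_name>` layout, and the query names are illustrative assumptions, not part of the original question. `stream_df` is assumed to be a streaming DataFrame already built from one Kafka topic.

```python
import posixpath


def checkpoint_path(table_root: str, query_name: str) -> str:
    """Return a checkpoint directory dedicated to a single streaming query.

    The "_checkpoints/<query_name>" layout is an assumed convention.
    """
    if not query_name:
        raise ValueError("query_name must be non-empty")
    return posixpath.join(table_root, "_checkpoints", query_name)


def start_bronze_stream(stream_df, table_root: str, query_name: str):
    """Start one append-only stream into the shared Delta table.

    stream_df is assumed to be a streaming DataFrame, for example one
    created with spark.readStream.format("kafka") for a single topic.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .queryName(query_name)
        # Each query receives its own checkpoint location.
        .option("checkpointLocation", checkpoint_path(table_root, query_name))
        .start(table_root)
    )


# Hypothetical query names; each resolves to a distinct checkpoint folder.
path_a = checkpoint_path("./bronze", "kafka_topic_a")
path_b = checkpoint_path("./bronze", "kafka_topic_b")
assert path_a != path_b
```

Both queries would then be started with the same `table_root` but different `query_name` values, so the table data is shared while offsets and state stay isolated per stream.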
Author: LeetQuiz Editorial Team
A data engineer is configuring two independent Structured Streaming jobs. Both jobs consume data from different Kafka topics but write to the same Delta Lake bronze table using an identical schema. The proposed directory structure is as follows:
./bronze/_checkpoint (shared)
./bronze/_delta_log
./bronze/year_week=2023.02
Can both streaming queries safely share the single ./bronze/_checkpoint folder? Why or why not?
A
Yes, it is a supported practice for multiple streaming jobs writing to the same destination Delta table to share a single checkpoint location.
B
No, each Structured Streaming job must use its own unique checkpoint directory to maintain independent state and track offsets correctly.
C
No, because Delta Lake leverages its internal transaction log for state tracking, an external checkpoint directory is not required and will cause conflicts.
D
Yes, sharing a checkpoint is possible as long as the year_week partitions are distinct for each job's write operations.
E
Technically, this layout works for small batches, but assigning each job its own checkpoint folder is only recommended to improve fault isolation and debugging.