
A data architect has designed a system where two Structured Streaming jobs concurrently write to a single bronze Delta table. Each job consumes data from a different Apache Kafka topic but writes records with identical schemas. To simplify the directory structure, a data engineer proposes using a shared nested checkpoint directory for both streams, as shown below:
/bronze
- _checkpoint
- _delta_log
- year_week=2020_01
- year_week=2020_02
Is this checkpoint directory structure valid for the given scenario? Explain why or why not.
A
No; Delta Lake manages streaming checkpoints in the transaction log.
B
Yes; both of the streams can share a single checkpoint directory.
C
No; only one stream can write to a Delta Lake table.
D
Yes; Delta Lake supports infinite concurrent writers.
E
No; each of the streams needs to have its own checkpoint directory.
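For reference, Structured Streaming tracks each query's source offsets and commit progress in that query's checkpoint location, so two queries reading different Kafka topics would normally be given separate locations even when they append to the same Delta table. The following is a minimal sketch under assumptions: the `_checkpoints/<stream_name>` layout, the helper names `checkpoint_dir` and `start_bronze_writer`, and the stream names are all hypothetical and not part of the question.

```python
import posixpath

TABLE_ROOT = "/bronze"  # table path taken from the question


def checkpoint_dir(table_root, stream_name):
    """Return a checkpoint path that is unique per stream (hypothetical layout)."""
    return posixpath.join(table_root, "_checkpoints", stream_name)


def start_bronze_writer(stream_df, stream_name, table_root=TABLE_ROOT):
    """Append a streaming DataFrame to the shared Delta table.

    stream_df is assumed to be a streaming DataFrame already read from one
    Kafka topic; each call passes a different stream_name, so each query
    gets its own checkpointLocation.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_dir(table_root, stream_name))
        .start(table_root)
    )
```

With this layout, calling `start_bronze_writer(df_a, "topic_a")` and `start_bronze_writer(df_b, "topic_b")` would write to the same table while keeping the two queries' progress information apart.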