
Answer-first summary for fast verification
Answer: E — Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.
The scenario describes data loss: a critical field was omitted during ingestion from Kafka to Delta Lake, and because Kafka retains data for only seven days, the original records from three months of production are gone. The remedy is to land everything first: ingest all raw data and metadata from Kafka into a bronze Delta table, creating a permanent, replayable history. Even if a downstream transformation omits a field, the raw records remain in Delta Lake for reprocessing. Options A through D are incorrect because Delta Lake does not record the Kafka producer's history, cannot retroactively compute values for fields that were never ingested, does not verify that every source field is included in the ingestion layer, and does not make data deletion impossible. Option E addresses the root cause by capturing all raw data upfront.
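The bronze-replay idea can be illustrated with a minimal pure-Python sketch. The record contents and field names below are hypothetical, and a real pipeline would use Spark Structured Streaming to write the Kafka key, value, and metadata columns to a Delta table; the point shown is only that a bronze layer keeping the untouched payload lets you re-derive a field that a downstream step forgot.

```python
import json

# Hypothetical raw Kafka records (illustrative, not from the question).
kafka_records = [
    {"topic": "orders", "partition": 0, "offset": 41,
     "value": json.dumps({"order_id": 1, "amount": 9.99, "region": "EU"})},
    {"topic": "orders", "partition": 0, "offset": 42,
     "value": json.dumps({"order_id": 2, "amount": 4.50, "region": "US"})},
]

# Bronze layer: persist the ENTIRE raw payload plus Kafka metadata, unmodified.
bronze_table = [dict(rec) for rec in kafka_records]

# Silver layer, first attempt: the "region" field is inadvertently omitted.
silver_v1 = [
    {k: v for k, v in json.loads(rec["value"]).items()
     if k in {"order_id", "amount"}}
    for rec in bronze_table
]

# Because bronze kept the raw payload, the pipeline can simply be fixed and
# replayed from bronze -- no dependence on Kafka's 7-day retention.
silver_v2 = [json.loads(rec["value"]) for rec in bronze_table]
```

Had only `silver_v1` been persisted, the `region` values would be unrecoverable once Kafka expired the source messages; with a bronze table, the corrected `silver_v2` can be rebuilt at any time.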
Author: LeetQuiz Editorial Team
A data engineer observes that a critical field present in the Kafka source was inadvertently omitted when writing to Delta Lake, so it is also missing from downstream long-term storage. Kafka's retention period is seven days, and the pipeline has been running in production for three months.
How can Delta Lake help prevent this type of data loss in the future?
A
The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.
B
Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.
C
Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.
D
Data can never be permanently dropped or deleted from Delta Lake, so data loss is not possible under any circumstance.
E
Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.