
Answer-first summary for fast verification
Answer: Ingesting all raw data and metadata into a bronze Delta table to create a permanent, replayable history of the data state.
The correct strategy is to ingest all raw data and metadata into a **bronze Delta table**. This creates a permanent, immutable, and replayable history of the data as it arrived from the source. By capturing every record (including every field) in its raw state, you build a full history that allows for 'time travel' and data re-processing even after Kafka's retention window has expired.

**Why the other options are incorrect:**

* **Schema evolution:** It allows the schema to change over time, but it cannot retroactively generate or 'backfill' data that was never actually ingested into the system.
* **Delta log/checkpoints:** These track the state of the Delta table and the progress of the stream (what has been written), not the entire history of the external Kafka broker.
* **Automatic inclusion:** Delta Lake does not automatically capture fields that the pipeline code does not explicitly read and write. The engineer must design the bronze layer to capture the raw payload (e.g., as a JSON string or binary) to ensure no fields are lost.
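The principle can be sketched without Spark: if the bronze layer stores each payload verbatim, a field the pipeline never parsed can still be recovered later by replaying the stored history. The in-memory list and `ingest` helper below are illustrative stand-ins for a bronze Delta table and its ingestion job, not real Delta Lake APIs:

```python
import json

# Illustrative bronze layer: an append-only store of the *raw* Kafka payload
# plus metadata. The field names (key, value, topic, timestamp) mirror
# Kafka's record structure; the list itself is a hypothetical stand-in
# for a bronze Delta table.
bronze = []

def ingest(record: dict) -> None:
    """Append the record verbatim -- no field selection, no parsing."""
    bronze.append({
        "key": record.get("key"),
        "value": record["value"],          # raw JSON string, untouched
        "topic": record.get("topic"),
        "timestamp": record.get("timestamp"),
    })

# Months later, a field ("device_id") that downstream tables never parsed
# can still be recovered by replaying the bronze history:
ingest({"key": "k1",
        "value": json.dumps({"user": "a", "device_id": "d-42"}),
        "topic": "events", "timestamp": 1700000000})

recovered = [json.loads(r["value"]).get("device_id") for r in bronze]
print(recovered)  # ['d-42']
```

Had only selected fields been written at ingestion time, `device_id` would be unrecoverable once Kafka's seven-day retention expired; storing the raw value makes the backfill a simple replay over bronze.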
Author: LeetQuiz Editorial Team
A data engineer discovers that a critical field from a Kafka source was omitted during ingestion into Delta Lake, causing it to be absent in all downstream storage. Although the field existed in the Kafka source, the Kafka service has a retention period of only seven days, while the pipeline has been running for three months.
How can Delta Lake be utilized to prevent this type of data loss in the future?
A
Ingesting all raw data and metadata into a bronze Delta table to create a permanent, replayable history of the data state.
B
Utilizing Delta Lake schema evolution to retroactively compute values for newly added fields from the original source.
C
Relying on the Delta transaction log and Structured Streaming checkpoints to maintain a complete history of the Kafka producer.
D
Enabling a setting in Delta Lake that ensures all fields from the source data are automatically included in the ingestion layer.