
Answer-first summary for fast verification
Answer: The newly added columns will not be backfilled; historical records will contain `NULL` for these fields.
The correct answer is that **Delta Lake schema evolution is not retroactive**. When columns are added to an existing Delta table (via `ALTER TABLE ADD COLUMN` or `mergeSchema` during a write), the change applies only to new data. All records already stored in the table read back `NULL` for the new columns. Because this pipeline has already been running, the team cannot analyze delays for historical data using these new fields unless they reprocess/replay the raw source data.

**Why the other options are incorrect:**

* **Metadata extraction:** Structured Streaming natively exposes Kafka metadata (`topic`, `partition`, `offset`, `timestamp`, etc.) as columns of the source DataFrame.
* **Production support:** Adding columns is a routine operation in Delta Lake and is fully supported via schema evolution; no full table overwrite is required.
* **Transaction log:** Schema updates are recorded as new actions in the transaction log; they do not invalidate or corrupt it.
* **Default values:** While some SQL engines require defaults, Delta Lake treats new columns as nullable and assigns `NULL` to existing rows by default.
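The underlying mechanics can be sketched in plain Python (a simulation, not actual Spark/Delta code; the table, column, and function names are illustrative). Delta stores data in immutable Parquet files, and a schema change only updates table metadata: old files physically lack the new columns, so readers project them as `NULL` rather than rewriting history.

```python
# Simulation of Delta's behavior: schema evolution changes metadata only;
# previously written files are never rewritten, so their rows surface NULL
# (None) for any column added after they were written.

table_schema = ["event_id", "payload"]          # schema at original ingestion time
old_file = [{"event_id": 1, "payload": "a"},    # rows written before the change
            {"event_id": 2, "payload": "b"}]

# Schema evolution: the table metadata gains new columns; old_file is untouched.
table_schema += ["ingest_ts", "kafka_topic", "kafka_partition"]
new_file = [{"event_id": 3, "payload": "c",
             "ingest_ts": "2024-01-01T00:00:00Z",
             "kafka_topic": "events",
             "kafka_partition": 0}]

def read_table(files, schema):
    """Project every stored row onto the current schema; missing fields read as None (NULL)."""
    return [{col: row.get(col) for col in schema}
            for f in files for row in f]

rows = read_table([old_file, new_file], table_schema)
assert rows[0]["kafka_topic"] is None       # historical record: NULL, not backfilled
assert rows[2]["kafka_topic"] == "events"   # new record: fully populated
```

This is why option D is the limitation: the read path fills `NULL` for old rows on the fly, and only reprocessing the raw source data can populate the new columns historically.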
Author: LeetQuiz Editorial Team
A senior data engineer is troubleshooting performance delays in a Structured Streaming pipeline that ingests data from Apache Kafka into a Delta Lake table. To improve observability, the engineer updates the ingestion logic and the Delta table schema to include the ingestion timestamp, the Kafka topic name, and the partition ID.
What specific limitation will the team face when implementing this change on the existing table?
A
The Delta transaction log metadata will be invalidated, necessitating a manual recovery of the table state.
B
A non-null default value must be explicitly provided for every new column during the schema evolution process.
C
Spark's Kafka source connector is natively unable to extract metadata fields like topic and partition into DataFrame columns.
D
The newly added columns will not be backfilled; historical records will contain NULL for these fields.
E
Delta Lake does not support adding new columns to existing production tables without a full table overwrite.