
A data pipeline uses Structured Streaming to ingest data from Apache Kafka into Delta Lake, landing records in a bronze table that stores Kafka's timestamp, key, and value. After three months, the data engineering team observes intermittent latency during peak hours.
A senior data engineer modifies the Delta table's schema and the ingestion logic to also capture the current timestamp (recorded by Spark at processing time), the Kafka topic, and the partition. The team intends to use these additional metadata fields to troubleshoot the transient delays.
What limitation will the team encounter when diagnosing this issue?
A
New fields will not be populated for historical records.
B
Spark cannot capture the topic and partition fields from a Kafka source.
C
Updating the table schema requires a default value to be provided for each added field.
D
Updating the table schema will invalidate the Delta transaction log metadata.