
Answer-first summary for fast verification
Answer: Use Delta Lake’s change data feed to automatically process CDC data from an external system, propagating all changes to all dependent tables in the Lakehouse.
The question requires a full history for auditing and only the latest record per key for analytics. Option D relies on Delta Lake's Change Data Feed (CDF), which records row-level changes made to a Delta table once the feature is enabled on that table (table property delta.enableChangeDataFeed). When the external CDC logs are applied to the Delta table (for example, via MERGE), CDF records each resulting insert, update, and delete, which provides the audit trail, while the current state of the table serves analytics. Option A tracks changes manually instead of using a built-in Delta feature. Options B and C keep only the most recent entry per pk_id, so any value superseded within the hour is lost and the history requirement is not met.
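The mechanism described above can be sketched in plain Python. This is only a model of the idea, not Delta Lake code: the `CdcEvent` class, the `apply_batch` function, and the `ts` ordering field are hypothetical names introduced for illustration. The `_change_type` labels (`insert`, `update_preimage`, `update_postimage`, `delete`) mirror the values a change data feed exposes. The sketch assumes events are applied one at a time in order, so every intermediate value reaches the feed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CdcEvent:
    """One upstream CDC log entry (hypothetical shape)."""
    pk_id: int
    change_type: str       # "insert", "update", or "delete"
    value: Optional[str]   # post-change value; None for deletes
    ts: int                # ordering key within the batch (assumed to exist)


def apply_batch(state, feed, events):
    """Apply events in ts order to `state`; append change rows to `feed`."""
    for ev in sorted(events, key=lambda e: e.ts):
        exists = ev.pk_id in state
        if ev.change_type == "delete":
            if exists:
                feed.append({"pk_id": ev.pk_id, "value": state[ev.pk_id],
                             "_change_type": "delete", "ts": ev.ts})
                del state[ev.pk_id]
        elif not exists:
            feed.append({"pk_id": ev.pk_id, "value": ev.value,
                         "_change_type": "insert", "ts": ev.ts})
            state[ev.pk_id] = ev.value
        else:
            # An update produces a before-image and an after-image row.
            feed.append({"pk_id": ev.pk_id, "value": state[ev.pk_id],
                         "_change_type": "update_preimage", "ts": ev.ts})
            feed.append({"pk_id": ev.pk_id, "value": ev.value,
                         "_change_type": "update_postimage", "ts": ev.ts})
            state[ev.pk_id] = ev.value


# One hourly batch in which pk_id 1 changes several times.
events = [
    CdcEvent(1, "insert", "a", 1),
    CdcEvent(1, "update", "b", 2),
    CdcEvent(2, "insert", "x", 3),
    CdcEvent(1, "update", "c", 4),
    CdcEvent(2, "delete", None, 5),
]
state, feed = {}, []
apply_batch(state, feed, events)

# Values that were ever valid for pk_id 1, recovered from the feed.
history_pk1 = sorted({r["value"] for r in feed
                      if r["pk_id"] == 1 and r["_change_type"] != "delete"})
```

Here `state` plays the role of the analytics table (latest value per key) and `feed` plays the role of the audit trail. In a real Delta table the feed is produced by the engine, not by user code.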
Author: LeetQuiz Editorial Team
An upstream system generates change data capture (CDC) logs that are stored in a cloud object storage directory. Each log entry specifies the change type (insert, update, or delete) along with the post-change field values. The source table has a primary key field named pk_id.
For auditing, the data governance team requires a complete history of all valid values from the source system. For analytics, only the latest value for each record must be retained. The Databricks job ingests these records hourly, but individual records may have undergone multiple changes within that hour.
Which solution fulfills these requirements?
A
Iterate through an ordered set of changes to the table, applying each in turn to create the current state of the table, and record the change type (insert, update, delete), the timestamp of the change, and the values.
B
Use merge into to insert, update, or delete the most recent entry for each pk_id into a table, then propagate all changes throughout the system.
C
Deduplicate records in each batch by pk_id and overwrite the target table.
D
Use Delta Lake’s change data feed to automatically process CDC data from an external system, propagating all changes to all dependent tables in the Lakehouse.
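To see why keeping only the most recent entry per pk_id (as in options B and C) cannot meet the audit requirement on its own, consider this small pure-Python sketch. The `latest_per_key` function and the `(pk_id, ts, value)` tuple layout are hypothetical and chosen only for illustration.

```python
def latest_per_key(events):
    """Keep only the most recent value per pk_id.

    `events` is an iterable of (pk_id, ts, value) tuples.
    """
    latest = {}
    for pk_id, ts, value in events:
        if pk_id not in latest or ts > latest[pk_id][0]:
            latest[pk_id] = (ts, value)
    return {pk_id: value for pk_id, (_, value) in latest.items()}


# One hourly batch: pk_id 1 changed three times, pk_id 2 once.
batch = [(1, 1, "a"), (1, 2, "b"), (1, 3, "c"), (2, 1, "x")]

current = latest_per_key(batch)

# Values that were valid at some point but do not survive deduplication.
lost = {value for _, _, value in batch} - set(current.values())
```

The deduplicated result is enough for analytics, but the values in `lost` never reach the target table, so they cannot be recovered for auditing later.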