
Answer-first summary for fast verification
Answer: Ingest all log information into a bronze table; use MERGE INTO to insert, update, or delete the most recent entry for each pk_id into a silver table to recreate the current table state.
The question requires maintaining a full audit trail of all historical values and the most recent state for analytics. Option E correctly addresses both needs by ingesting all CDC logs into a bronze table (retaining full history for auditing) and using MERGE INTO in a silver table to maintain the current state. This aligns with the medallion architecture best practices. Other options like C and D were considered but found lacking: C's reliance on Delta versioning may not capture individual changes within an hourly batch, and D's CDF approach focuses on propagating changes from Delta tables, not directly handling external CDC logs.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
An upstream system generates change data capture (CDC) logs that are stored in a cloud object storage directory. Each log entry specifies the change type (insert, update, or delete) along with the post-change values for all fields. The source table has a primary key field named pk_id.
The data governance team requires a complete historical record of all valid values from the source system for auditing. For analytical purposes, only the latest value for each record needs to be retained. The Databricks job ingests these records hourly, but individual records may have undergone multiple changes within that hour.
What solution fulfills these requirements?
A
Create a separate history table for each pk_id resolve the current state of the table by running a union all filtering the history tables for the most recent state.
B
Use MERGE INTO to insert, update, or delete the most recent entry for each pk_id into a bronze table, then propagate all changes throughout the system.
C
Iterate through an ordered set of changes to the table, applying each in turn; rely on Delta Lake's versioning ability to create an audit log.
D
Use Delta Lake's change data feed to automatically process CDC data from an external system, propagating all changes to all dependent tables in the Lakehouse.
E
Ingest all log information into a bronze table; use MERGE INTO to insert, update, or delete the most recent entry for each pk_id into a silver table to recreate the current table state.