
Explanation:
To satisfy both the requirement for a historical audit trail and a 'current state' analytical table, the Medallion Architecture pattern is ideal.
MERGE INTO to upsert records from the Bronze table into the Silver table allows you to collapse multiple changes for the same pk_id into a single 'latest' row. This provides an efficient, authoritative view for analytics.Why other options are incorrect:
Ultimate access to all questions.
No comments yet.
An upstream system provides Change Data Capture (CDC) logs representing INSERT, UPDATE, and DELETE operations for a source table with a primary key (pk_id). The data governance team requires a solution that preserves a full historical audit trail while also maintaining a table containing only the most recent records for analytical queries. Data is ingested hourly, and individual records may undergo multiple changes within a single ingestion window.
Which solution effectively meets these requirements while following Databricks Medallion Architecture best practices?
A
Ingest all incoming CDC logs into an append-only bronze table to maintain a full history, then use MERGE INTO to upsert the most recent version of each pk_id into a silver table to represent the current state.
B
Utilize MERGE INTO directly on a bronze table to handle incoming CDC operations for each pk_id, then propagate these changes downstream to minimize storage requirements.
C
Implement Delta Lake Change Data Feed (CDF) to automatically ingest and process external CDC logs directly from the source system into the Lakehouse.
D
Sequence the incoming changes chronologically and apply them to a target table, relying on Delta Lake’s internal table versioning and time travel to serve as the required audit log.
E
Create separate history tables for every unique pk_id and resolve the current state by performing a UNION operation of the most recent entries during query execution.