
Answer-first summary for fast verification
Answer: Leverage Delta Lake's time travel feature to query the table at its current and previous versions, then use a set-based comparison like `EXCEPT` to find differences.
Delta Lake automatically maintains a transaction log that enables **Time Travel**. This allows you to query any historical snapshot of a table by specifying a version number or timestamp. To identify differences between two versions, you can query both snapshots (e.g., `SELECT * FROM table VERSION AS OF N` and `VERSION AS OF N-1`) and perform a set-based operation like `EXCEPT` or a `FULL OUTER JOIN` to pinpoint exactly which rows changed.

* **DESCRIBE HISTORY** provides metadata about the transaction (who, when, what operation), but it does not contain the actual data rows that were changed.
* **Transaction Logs** (the `_delta_log` folder) are intended for the Delta engine to maintain ACID compliance and are not designed for manual row-level diffing.
* **Spark Event Logs** focus on job execution metrics (stages, tasks, and memory) rather than the data values being transformed.
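
The time-travel comparison described above can be sketched as a small helper that assembles the Spark SQL for a two-way `EXCEPT` diff. This is a minimal illustration, not a definitive implementation: `build_version_diff_sql` and the `side` column are hypothetical names, and the version numbers 5 and 4 are made up (real ones would come from `DESCRIBE HISTORY`). The helper only builds SQL text; it does not connect to Spark.

```python
import re


def build_version_diff_sql(table: str, new_version: int, old_version: int) -> str:
    """Build a Spark SQL query listing rows that differ between two Delta versions.

    Hypothetical helper for illustration; it only assembles SQL text.
    """
    # Allow only plain (optionally dotted) identifiers, because the name is
    # interpolated directly into the SQL string.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    for version in (new_version, old_version):
        if not isinstance(version, int) or version < 0:
            raise ValueError(f"version must be a non-negative integer: {version!r}")

    new_snapshot = f"SELECT * FROM {table} VERSION AS OF {new_version}"
    old_snapshot = f"SELECT * FROM {table} VERSION AS OF {old_version}"

    # EXCEPT is directional, so run it both ways and label each side.
    return (
        f"SELECT 'only_in_new' AS side, * FROM (\n"
        f"  {new_snapshot}\n  EXCEPT\n  {old_snapshot}\n) AS added\n"
        f"UNION ALL\n"
        f"SELECT 'only_in_old' AS side, * FROM (\n"
        f"  {old_snapshot}\n  EXCEPT\n  {new_snapshot}\n) AS removed"
    )


# Versions 5 and 4 are made-up values; look up real ones with DESCRIBE HISTORY.
diff_sql = build_version_diff_sql("customer_churn_params", new_version=5, old_version=4)
print(diff_sql)
```

Assuming an active Delta-enabled `SparkSession` named `spark`, the resulting string could be passed to `spark.sql(diff_sql)`. A row whose values changed shows up on both sides, once with its new values and once with its old ones. Because `EXCEPT` uses distinct-set semantics, duplicate rows are collapsed; `EXCEPT ALL` may be a better fit if the table can contain exact duplicates.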
Author: LeetQuiz Editorial Team
The data engineering team performs a nightly full overwrite of the `customer_churn_params` Delta Lake table used for machine learning. To ensure data quality, the team must identify the specific row-level differences between the current table version and the version immediately preceding the update.
Which method should be used to achieve this?
A. Directly parse the JSON files within the `_delta_log` directory to identify and extract row-level changes from the underlying Parquet data files.
B. Execute `DESCRIBE HISTORY customer_churn_params` to retrieve the operation metrics and extract a detailed log of the specific records that were modified.
C. Analyze the Spark event logs in the cluster UI to identify the specific records processed during the overwrite operation.
D. Leverage Delta Lake's time travel feature to query the table at its current and previous versions, then use a set-based comparison like `EXCEPT` to find differences.