
Explanation:
Delta Lake automatically maintains a transaction log that enables time travel. This lets you query a historical snapshot of a table by version number or timestamp, provided that version's data files are still retained. To identify differences between two versions, query both snapshots (e.g., SELECT * FROM table VERSION AS OF N and SELECT * FROM table VERSION AS OF N-1) and apply a set-based operation such as EXCEPT, run in both directions so that rows missing from either side are caught, or a FULL OUTER JOIN on the key columns to pinpoint exactly which rows changed.
The transaction log files (in the _delta_log folder) are intended for the Delta engine to maintain ACID compliance and are not designed for manual row-level diffing.
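The two-way EXCEPT comparison described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it uses Python's built-in sqlite3 module, with two ordinary tables (snap_prev and snap_curr) standing in for the previous and current Delta versions, so that it runs without a Spark cluster. The helper name build_diff_sql, the columns customer_id and churn_score, and the sample rows are all hypothetical.

```python
import sqlite3


def build_diff_sql(current_rel: str, previous_rel: str) -> str:
    """Build a query listing rows that appear in only one of two relations.

    Each argument is any expression valid after FROM. The helper and the
    'side' label column are illustrative, not part of Delta Lake.
    """
    return f"""
        SELECT 'only_in_current' AS side, * FROM (
            SELECT * FROM {current_rel}
            EXCEPT
            SELECT * FROM {previous_rel}
        ) AS c
        UNION ALL
        SELECT 'only_in_previous' AS side, * FROM (
            SELECT * FROM {previous_rel}
            EXCEPT
            SELECT * FROM {current_rel}
        ) AS p
    """


# Stand-in snapshots: hypothetical schema and data, not the real table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snap_prev (customer_id INTEGER, churn_score REAL);
    CREATE TABLE snap_curr (customer_id INTEGER, churn_score REAL);
    INSERT INTO snap_prev VALUES (1, 0.10), (2, 0.50), (3, 0.90);
    INSERT INTO snap_curr VALUES (1, 0.10), (2, 0.75), (4, 0.20);
""")

# Customer 1 is unchanged, so it is absent from the diff. Customer 2 was
# modified, so it shows up once per side with its new and old values.
diff_rows = sorted(conn.execute(build_diff_sql("snap_curr", "snap_prev")).fetchall())
for row in diff_rows:
    print(row)
```

On a Delta table, the same query shape should apply with time-travel relations in place of the stand-in tables, for example build_diff_sql("customer_churn_params VERSION AS OF 5", "customer_churn_params VERSION AS OF 4") passed to spark.sql, where 5 and 4 are placeholder version numbers. Note that EXCEPT uses set semantics, so exact duplicate rows collapse into one and a change in the number of duplicates would not be reported.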
The data engineering team performs a nightly full overwrite of the customer_churn_params Delta Lake table used for machine learning. To ensure data quality, the team must identify the specific row-level differences between the current table version and the version immediately preceding the update.
Which method should be used to achieve this?
A
Directly parse the JSON files within the _delta_log directory to identify and extract row-level changes from the underlying Parquet data files.
B
Execute DESCRIBE HISTORY customer_churn_params to retrieve the operation metrics and extract a detailed log of the specific records that were modified.
C
Analyze the Spark event logs in the cluster UI to identify the specific records processed during the overwrite operation.
D
Leverage Delta Lake's time travel feature to query the table at its current and previous versions, then use a set-based comparison like EXCEPT to find differences.