
Answer-first summary for fast verification
Answer: Use Delta Lake’s versioning and time travel features to execute a query (e.g., using `EXCEPT` or a join) that compares the current version with the previous version.
Delta Lake automatically versions every commit to a table in its transaction log. Using **Time Travel**, you can query any historical snapshot by specifying a version number or timestamp (`VERSION AS OF` or `TIMESTAMP AS OF`). Query the table as of two successive versions and then run a set-based comparison, such as the `EXCEPT` operator or a `FULL OUTER JOIN`, to pinpoint the exact rows that changed between them. Because `EXCEPT` is one-directional, run it both ways: new `EXCEPT` old returns rows that were added, and old `EXCEPT` new returns rows that were removed.

* **Why the others are incorrect:**
  * **DESCRIBE HISTORY**: This command returns commit-level metadata: the version number, who changed the table, when, which operation ran, and aggregate metrics such as `numOutputRows`. It does not return the actual contents of the changed records.
  * **Spark Event Logs**: These logs capture performance metrics and execution details (stages, tasks) for the Spark engine. They do not contain the record-level data stored in the table.
  * **Transaction Log Parsing**: The log records which data files each commit added and removed. However, parsing these low-level JSON files is a complex, brittle, and unsupported way to compare data. Time travel is the intended feature for this use case.
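The time-travel comparison described above can be sketched in Python as a helper that builds the Spark SQL query. This is a minimal sketch under a few assumptions. It assumes Spark SQL with Delta Lake time travel (`VERSION AS OF`). It uses `EXCEPT ALL` in both directions so that duplicate rows are counted correctly. It also assumes the overwrite did not change the table schema, because `EXCEPT` requires compatible columns. The helper `build_version_diff_sql` and the `change_type` column are hypothetical names chosen for illustration. An updated row appears twice in the output: once as `removed` (old values) and once as `added` (new values).

```python
# Minimal sketch: build a Spark SQL query that diffs two Delta table versions.
# Assumptions: Delta Lake time travel (`VERSION AS OF`) and Spark SQL `EXCEPT ALL`.
# `build_version_diff_sql` and the `change_type` column are hypothetical names.


def build_version_diff_sql(table: str, old_version: int, new_version: int) -> str:
    """Return SQL listing rows added in new_version and rows removed from old_version."""
    if old_version < 0 or new_version < 0:
        raise ValueError("Delta table versions are non-negative integers")

    # Each snapshot is the whole table as it existed at that commit.
    old_snap = f"SELECT * FROM {table} VERSION AS OF {old_version}"
    new_snap = f"SELECT * FROM {table} VERSION AS OF {new_version}"

    # new EXCEPT ALL old -> rows present only in the new version (added)
    # old EXCEPT ALL new -> rows present only in the old version (removed)
    return (
        "SELECT 'added' AS change_type, added_rows.* FROM (\n"
        f"  {new_snap}\n"
        "  EXCEPT ALL\n"
        f"  {old_snap}\n"
        ") AS added_rows\n"
        "UNION ALL\n"
        "SELECT 'removed' AS change_type, removed_rows.* FROM (\n"
        f"  {old_snap}\n"
        "  EXCEPT ALL\n"
        f"  {new_snap}\n"
        ") AS removed_rows"
    )


if __name__ == "__main__":
    # Example: compare version 4 (previous) with version 5 (current).
    print(build_version_diff_sql("customer_churn_params", 4, 5))
```

In a Spark session, you would pass the returned string to `spark.sql(...)`. Note that the previous version is not always `current - 1`. Maintenance commits such as `OPTIMIZE` also create versions, so check `DESCRIBE HISTORY customer_churn_params` to find the last overwrite before choosing `old_version`.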
Author: LeetQuiz Editorial Team
A data engineering team updates a Delta Lake table named customer_churn_params nightly by overwriting it with new data. Following each successful update, the team needs to identify the record-level differences between the current version and the previous version. Which method should be used to achieve this?
A
Execute the DESCRIBE HISTORY customer_churn_params command to retrieve the operation metrics and a log of all added or modified records.
B
Analyze the Spark event logs to identify the specific rows that were updated, inserted, or deleted during the job execution.
C
Use Delta Lake’s versioning and time travel features to execute a query (e.g., using EXCEPT or a join) that compares the current version with the previous version.
D
Manually parse the Delta Lake transaction log files (_delta_log) to identify and decode the specific Parquet files containing the new data.