
Answer-first summary for fast verification
Answer: Execute a query to calculate the difference between the new version and the previous version using Delta Lake’s built-in versioning and time travel functionality.
The question asks how to determine the difference between the new and previous versions of a Delta Lake table after each nightly overwrite. Delta Lake retains earlier table versions even after an overwrite, and its time travel functionality (Option A) lets a query read any retained version, so the current and previous versions can be compared directly without changing the existing pipeline. Option E, Delta Lake's change data feed, would work only if the change data feed had been enabled on the table, which the question does not state; in addition, because each refresh overwrites the whole table, the feed would report every existing row as deleted and every new row as inserted rather than only the rows whose values changed. Option A is therefore the most reliable method based on the information provided. Options B, C, and D are incorrect: the transaction log identifies data files rather than row-level differences, Spark event logs do not record which rows were updated, inserted, or deleted, and DESCRIBE HISTORY returns operation metrics for each version, not a log of the records that were added or modified.
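The version comparison described above can be sketched as a pair of SQL queries. This is a minimal sketch, not a definitive implementation: the helper name `build_version_diff_queries` and the version numbers 12 and 11 are hypothetical, and it assumes both versions are still retained for time travel. It only builds the query text, using Delta Lake's `VERSION AS OF` syntax together with the standard `EXCEPT` set operator.

```python
def build_version_diff_queries(table: str, new_version: int, old_version: int) -> dict:
    """Build SQL strings that compare two versions of a Delta table.

    Hypothetical helper for illustration; it only assembles query text.
    """
    if new_version <= old_version:
        raise ValueError("new_version must be greater than old_version")

    # Time travel: read the table as it existed at a specific version.
    new_snapshot = f"SELECT * FROM {table} VERSION AS OF {new_version}"
    old_snapshot = f"SELECT * FROM {table} VERSION AS OF {old_version}"

    return {
        # Rows present in the new version but not in the old one
        # (newly inserted rows, plus the new values of changed rows).
        "added_or_changed": f"{new_snapshot} EXCEPT {old_snapshot}",
        # Rows present in the old version but not in the new one
        # (deleted rows, plus the old values of changed rows).
        "removed_or_changed": f"{old_snapshot} EXCEPT {new_snapshot}",
    }


# Version numbers here are assumed for the example only.
queries = build_version_diff_queries("customer_churn_params", 12, 11)
for name, sql in queries.items():
    print(f"{name}: {sql}")
```

In an environment with Spark and Delta Lake available, each string could be passed to `spark.sql(...)`, and the two most recent version numbers could be read from the output of DESCRIBE HISTORY customer_churn_params. Note that `EXCEPT` compares distinct rows, so exact duplicate rows are not counted separately.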
Author: LeetQuiz Editorial Team
A Delta Lake table named customer_churn_params in the Lakehouse is utilized for churn prediction by the machine learning team. This table contains customer data aggregated from multiple upstream sources. The data engineering team currently refreshes this table nightly by completely overwriting it with the latest valid values from upstream sources.
After each successful update, the team needs to identify the differences between the newly written version and the previous version of the table.
Given this implementation, which approach can be used to accomplish this?
A
Execute a query to calculate the difference between the new version and the previous version using Delta Lake’s built-in versioning and time travel functionality.
B
Parse the Delta Lake transaction log to identify all newly written data files.
C
Parse the Spark event logs to identify those rows that were updated, inserted, or deleted.
D
Execute DESCRIBE HISTORY customer_churn_params to obtain the full operation metrics for the update, including a log of all records that have been added or modified.
E
Use Delta Lake’s change data feed to identify those records that have been updated, inserted, or deleted.