Ultimate access to all questions.
The data engineering team maintains a Delta Lake table named 'daily_activities', which is completely overwritten each night with new data from the source system. For auditing purposes, they aim to use Delta Lake's Time Travel functionality to identify differences between the new and previous versions of the table. After retrieving the current table version with the code: current_version = spark.sql('SELECT max(version) FROM (DESCRIBE HISTORY daily_activities)').collect()[0][0]
, which query should they use to accomplish this task?
Explanation:
Delta Lake's Time Travel functionality allows for auditing operations or querying a table at a specific point in time by using version numbers or timestamps. The EXCEPT
set operator is particularly useful for identifying the differences between two versions of a table. This approach enables the team to compare the current version of the 'daily_activities' table with its previous version effectively. For more details, refer to the Delta Lake documentation on history and SQL reference for set operations.