
A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.
Which of the following explains why the data files are no longer present?
A. The VACUUM command was run on the table
B. The TIME TRAVEL command was run on the table
C. The DELETE HISTORY command was run on the table
D. The OPTIMIZE command was run on the table
E. The HISTORY command was run on the table
Explanation:
The correct answer is A. The VACUUM command was run on the table.
VACUUM command purpose: The VACUUM command in Delta Lake is used to physically delete data files that are no longer referenced by the Delta table and are older than the retention threshold.
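Time travel retention: By default, Delta Lake keeps data files that are no longer referenced by the current table version for 7 days, which is what makes time travel to recent versions possible. VACUUM removes only unreferenced files older than the retention threshold; the threshold defaults to 7 days but can be configured differently.
The scenario: The data engineer is trying to restore a version that is only 3 days old, which falls inside the default 7-day window. The files can therefore be missing only if VACUUM was run with a retention threshold shorter than 3 days, for example through a small RETAIN value or a lowered delta.deletedFileRetentionDuration table property. Once the files backing that version are physically deleted, the version can no longer be queried or restored.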
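As a rough sketch of how this situation could arise, consider the statements below. The table name sales, the 24-hour retention value, and the timestamp literal are hypothetical placeholders, not details taken from the question.

```sql
-- Hypothetical table: sales. Assumes a Databricks / Delta Lake SQL session.

-- Turning off the safety check allows a retention period below the 7-day default.
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

-- Physically deletes unreferenced data files older than 24 hours,
-- including files that only older table versions still depend on.
VACUUM sales RETAIN 24 HOURS;

-- Placeholder timestamp standing in for "3 days ago".
-- This fails if the VACUUM above removed the files backing that version.
RESTORE TABLE sales TO TIMESTAMP AS OF '2024-05-01 00:00:00';
```

The transaction log may still list the older version after such a VACUUM, but the data files it points to are gone, which matches the failure described in the question.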
B. The TIME TRAVEL command was run on the table: There is no TIME TRAVEL command in Delta Lake. Time travel is achieved using SQL syntax like SELECT * FROM table_name VERSION AS OF version_number or RESTORE TABLE table_name TO VERSION AS OF version_number.
C. The DELETE HISTORY command was run on the table: There is no DELETE HISTORY command in Delta Lake. History information is managed automatically by Delta Lake.
D. The OPTIMIZE command was run on the table: The OPTIMIZE command compacts small files into larger ones to improve query performance, but it does not delete data files needed for time travel. It writes new files and records the old ones as removed in the transaction log; the old files stay on storage and remain available for time travel until VACUUM cleans them up.
E. The HISTORY command was run on the table: The DESCRIBE HISTORY command only displays the history of operations on a table; it does not delete any data files.
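For reference, the time travel statements mentioned in the explanations above might look like the following in practice. The table name sales, the version number 12, and the date are hypothetical placeholders.

```sql
-- Read-only: list the table's versions, timestamps, and operations.
DESCRIBE HISTORY sales;

-- Query an older version without modifying the table.
SELECT * FROM sales VERSION AS OF 12;
SELECT * FROM sales TIMESTAMP AS OF '2024-05-01';

-- Roll the table back to that version.
RESTORE TABLE sales TO VERSION AS OF 12;
```

None of these statements delete data files; the time travel reads and the RESTORE work only while the files for the target version still exist.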
To prevent this issue, data engineers should:
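Leave spark.databricks.delta.retentionDurationCheck.enabled set to true (its default), so that VACUUM rejects retention periods shorter than 7 days
Be cautious about running VACUUM on tables where time travel might be needed
Use VACUUM table_name RETAIN 168 HOURS (or longer) to specify custom retention periods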
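A minimal sketch of these precautions, assuming a hypothetical table named sales and an illustrative 30-day retention window (choose values that fit your own recovery needs):

```sql
-- Preview which files VACUUM would delete, without deleting anything.
VACUUM sales RETAIN 168 HOURS DRY RUN;

-- Keep removed data files and transaction log entries for 30 days,
-- so that time travel within that window stays possible.
ALTER TABLE sales SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 30 days',
  'delta.logRetentionDuration' = 'interval 30 days'
);
```

Longer retention keeps more old files on storage, so the trade-off is storage cost against how far back the table can be restored.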