
Answer-first summary for fast verification
Answer: The VACUUM command was run on the table
## Explanation

The correct answer is **A. The VACUUM command was run on the table**.

### Why VACUUM causes this issue:

1. **VACUUM command purpose**: The `VACUUM` command in Delta Lake physically deletes data files that are no longer referenced by the current version of the Delta table and are older than the retention threshold.
2. **Time travel retention**: Time travel to an older version works only while the data files backing that version still exist. By default, `VACUUM` keeps unreferenced files for 7 days. The threshold can be changed with the `delta.deletedFileRetentionDuration` table property or overridden for a single run with a `RETAIN` clause.
3. **The scenario**: The data engineer is trying to restore a version that is 3 days old. With the default 7-day threshold those files would still be present, so the likely cause is a `VACUUM` run with a retention period shorter than 3 days. That run removed the data files backing the 3-day-old version, which makes the restore impossible.

### Why other options are incorrect:

- **B. The TIME TRAVEL command was run on the table**: There is no `TIME TRAVEL` command in Delta Lake. Time travel is achieved using SQL syntax like `SELECT * FROM table_name VERSION AS OF version_number` or `RESTORE TABLE table_name TO VERSION AS OF version_number`.
- **C. The DELETE HISTORY command was run on the table**: There is no `DELETE HISTORY` command in Delta Lake. History information is managed automatically by Delta Lake.
- **D. The OPTIMIZE command was run on the table**: The `OPTIMIZE` command compacts small files into larger ones to improve query performance, but it does not delete data files needed for time travel. It writes new files and marks the old ones as removed in the transaction log, and the old files remain available for time travel until `VACUUM` cleans them up.
- **E. The HISTORY command was run on the table**: The `DESCRIBE HISTORY` command only displays the history of operations on a table; it does not delete any data files.

### Key Takeaway:

To prevent this issue, data engineers should:

1. Configure an appropriate retention period with the `delta.deletedFileRetentionDuration` table property, and leave the `spark.databricks.delta.retentionDurationCheck.enabled` safety check turned on so that `VACUUM` rejects a retention period shorter than the table's threshold (7 days by default)
2. Be cautious when running `VACUUM` on tables where time travel might be needed
3. Consider increasing the retention period for critical tables
4. Use `VACUUM table_name RETAIN 168 HOURS` (or longer) to specify custom retention periods
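The interaction between retention and time travel described above can be illustrated with a small toy model. This is a minimal sketch in plain Python, not Delta Lake's implementation: `ToyDeltaTable`, `build_daily_table`, and the file names are hypothetical, and the model assumes each daily update fully replaces the previous day's file.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ToyDeltaTable:
    """Toy model of a versioned table (hypothetical, not the Delta Lake API)."""
    versions: list = field(default_factory=list)    # one frozenset of file names per version
    removed_at: dict = field(default_factory=dict)  # file -> time it stopped being referenced
    storage: set = field(default_factory=set)       # files physically present on disk

    def commit(self, files, now):
        """Record a new version that references exactly `files`."""
        files = frozenset(files)
        if self.versions:
            for name in self.versions[-1] - files:
                self.removed_at[name] = now  # logically removed, still on disk
        for name in files:
            self.removed_at.pop(name, None)
        self.storage |= files
        self.versions.append(files)

    def vacuum(self, now, retain=timedelta(days=7)):
        """Physically delete unreferenced files removed at least `retain` ago."""
        current = self.versions[-1]
        doomed = {
            name for name, removed in self.removed_at.items()
            if name in self.storage and name not in current and now - removed >= retain
        }
        self.storage -= doomed
        return doomed

    def can_restore(self, version):
        """A version is restorable only if all of its files still exist."""
        return self.versions[version] <= self.storage


def build_daily_table(start):
    """Four daily updates, each replacing the previous day's file (assumption)."""
    table = ToyDeltaTable()
    for day in range(4):
        table.commit({f"day{day}.parquet"}, start + timedelta(days=day))
    return table


start = datetime(2024, 1, 1)
now = start + timedelta(days=3)

safe = build_daily_table(start)
safe.vacuum(now)                       # default 7-day retention: nothing is old enough
print(safe.can_restore(0))             # True

aggressive = build_daily_table(start)
aggressive.vacuum(now, retain=timedelta(hours=0))  # zero retention removes old files
print(aggressive.can_restore(0))       # False
```

In this model the 3-day-old version stays restorable under the default retention and is lost only when the retention window is shorter than the age of the version, which matches the scenario in the question.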
Author: Keng Suppaseth
A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.
Which of the following explains why the data files are no longer present?
A. The VACUUM command was run on the table
B. The TIME TRAVEL command was run on the table
C. The DELETE HISTORY command was run on the table
D. The OPTIMIZE command was run on the table
E. The HISTORY command was run on the table