
Answer-first summary for fast verification
Answer: With the default 7-day data retention threshold, files containing deleted records persist until the `VACUUM` job is performed approximately 8 days after the deletion.
The correct answer is **A**. Delta Lake keeps data files that have been logically removed (e.g., by a `DELETE` statement) until they have been obsolete for longer than the retention threshold, which defaults to **7 days (168 hours)**. In this scenario:

1. The deletion occurs Sunday at 1:00 AM.
2. The `VACUUM` job runs Monday at 3:00 AM, only 26 hours later.
3. Because 26 hours is less than the 7-day threshold, `VACUUM` skips those files.
4. The files remain in storage and reachable via time travel until the *following* Monday's `VACUUM` run, roughly 8 days (194 hours) after the deletion.

**Why other options are incorrect:**

* **B & C:** The default retention period is 7 days, not 24 hours. Furthermore, once `VACUUM` deletes a file, table versions that depend on it can no longer be read via time travel.
* **D:** A `DELETE` operation records a `RemoveFile` action in the Delta log for each affected file (typically rewriting the surviving rows into new files); it does not physically remove the old Parquet files from storage.
* **E:** Time travel depends on the underlying files existing. Once `VACUUM` purges the files and the log retention period passes, the data cannot be recovered using time travel.
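The timeline arithmetic can be checked with a minimal Python sketch. It is a simplification that only compares each scheduled `VACUUM` run against the deletion time plus the 7-day default; the helper name `first_purging_vacuum` and the calendar dates are hypothetical, chosen only so that the deletion falls on a Sunday and the first run on a Monday.

```python
from datetime import datetime, timedelta

# Default retention threshold from the explanation above: 7 days (168 hours).
RETENTION = timedelta(days=7)


def first_purging_vacuum(deleted_at, first_vacuum,
                         interval=timedelta(days=7), retention=RETENTION):
    """Return the first scheduled VACUUM run at which files removed at
    `deleted_at` have been obsolete for at least `retention`."""
    run = first_vacuum
    while run - deleted_at < retention:
        run += interval  # files are still too young; wait for the next run
    return run


# Hypothetical dates: 2024-01-07 is a Sunday, 2024-01-08 is a Monday.
deleted_at = datetime(2024, 1, 7, 1, 0)    # Sunday 1:00 AM delete job
first_vacuum = datetime(2024, 1, 8, 3, 0)  # Monday 3:00 AM VACUUM job

purge_run = first_purging_vacuum(deleted_at, first_vacuum)
print(first_vacuum - deleted_at)  # 1 day, 2:00:00 (26 hours, under the threshold)
print(purge_run)                  # 2024-01-15 03:00:00
print(purge_run - deleted_at)     # 8 days, 2:00:00
```

The first Monday run is skipped because only 26 hours have passed, so the files survive until the run one week later, which matches the "approximately 8 days" in answer A.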
Author: LeetQuiz Editorial Team
A data engineering pipeline executes a batch job to delete customer records every Sunday at 1:00 AM. Subsequently, a VACUUM command is run on all Delta Lake tables every Monday at 3:00 AM. There is a concern that Delta Lake's time travel feature might still allow access to the deleted data.
Assuming the delete logic is correctly implemented, which of the following statements accurately addresses this concern?
* **A.** With the default 7-day data retention threshold, files containing deleted records persist until the `VACUUM` job is performed approximately 8 days after the deletion.
* **B.** Data files containing deleted records are retained only until the following day's `VACUUM` job due to the default 24-hour data retention threshold.
* **C.** The `VACUUM` command permanently removes files containing deleted records, making them accessible via time travel for a maximum of 24 hours after the job finishes.
* **D.** Delta Lake's `DELETE` statements ensure the permanent purging of records across all storage systems immediately upon completion of the deletion job.
* **E.** Delta Lake time travel functionality grants full access to a table's history regardless of `VACUUM` operations, enabling users with admin privileges to recreate deleted records indefinitely.