
Explanation:
The correct answer is A.
Delta Lake keeps data files that have been logically removed (e.g., via a DELETE statement) in storage. VACUUM physically deletes them only once they have been obsolete for longer than the retention threshold, which defaults to 7 days (168 hours).
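The retention arithmetic can be checked with a short Python sketch that uses the schedule from the question (delete on Sunday at 1:00 AM, VACUUM every Monday at 3:00 AM). The calendar dates and the helper name `first_effective_vacuum` are hypothetical, chosen only for illustration. The sketch also assumes a file becomes eligible once it has been removed for at least the threshold, which simplifies what VACUUM actually checks.

```python
from datetime import datetime, timedelta

# Default retention threshold for removed data files: 7 days (168 hours).
RETENTION = timedelta(hours=168)

# Hypothetical dates, picked only so the weekdays match the question.
delete_time = datetime(2024, 1, 7, 1, 0)    # a Sunday, 1:00 AM
first_vacuum = datetime(2024, 1, 8, 3, 0)   # the next Monday, 3:00 AM


def first_effective_vacuum(removed_at, first_run,
                           every=timedelta(days=7), retention=RETENTION):
    """Return the first scheduled VACUUM run at which the removed
    files are old enough to be purged (simplified model)."""
    run = first_run
    while run - removed_at < retention:
        run += every
    return run


purge_time = first_effective_vacuum(delete_time, first_vacuum)
age_at_first_run = first_vacuum - delete_time
age_at_purge = purge_time - delete_time

print(age_at_first_run)  # 1 day, 2:00:00  (26 hours: too young, skipped)
print(purge_time)        # 2024-01-15 03:00:00
print(age_at_purge)      # 8 days, 2:00:00 (about 8 days after the delete)
```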
In this scenario:
- The delete runs on Sunday at 1:00 AM.
- The first VACUUM job runs Monday at 3:00 AM (only 26 hours later).
- Because the removed files are less than 7 days old, VACUUM will skip those files.
- The files are physically removed by the following Monday's VACUUM run, roughly 8 days after the deletion.

Why other options are incorrect:
- B: The default retention threshold is 7 days, not 24 hours.
- C: Once VACUUM deletes a file, it can no longer be accessed via time travel.
- D: The DELETE operation only adds a RemoveFile action to the Delta log; it does not physically remove the Parquet files from storage immediately.
- E: Once VACUUM purges the files and the log retention period passes, the data cannot be recovered using time travel.
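The interplay between DELETE, VACUUM, and time travel described above can be illustrated with a toy in-memory model. This is a sketch only: `ToyDeltaTable` and its methods are hypothetical and are not the Delta Lake API, the dates are arbitrary, and the model ignores log retention and every other detail of the real transaction log.

```python
from datetime import datetime, timedelta


class ToyDeltaTable:
    """Toy in-memory model of a Delta table; NOT the Delta Lake API."""

    def __init__(self):
        self.storage = {}     # file name -> rows physically present in storage
        self.versions = []    # versions[i] = set of file names live at version i
        self.removed_at = {}  # file name -> time its remove action was logged

    def write(self, name, rows):
        """Add a data file and commit a new table version."""
        self.storage[name] = list(rows)
        current = self.versions[-1] if self.versions else set()
        self.versions.append(current | {name})

    def delete_where(self, predicate, now, new_name):
        """Logical delete: rewrite affected files, log a remove for the old ones."""
        current = self.versions[-1]
        affected = {f for f in current
                    if any(predicate(r) for r in self.storage[f])}
        self.storage[new_name] = [r for f in sorted(affected)
                                  for r in self.storage[f] if not predicate(r)]
        for f in affected:
            self.removed_at[f] = now  # the old file stays in storage
        self.versions.append((current - affected) | {new_name})

    def vacuum(self, now, retention=timedelta(hours=168)):
        """Physically delete files removed at least `retention` ago."""
        for f, ts in list(self.removed_at.items()):
            if now - ts >= retention:
                self.storage.pop(f, None)
                del self.removed_at[f]

    def read(self, version=-1):
        """Read a version (time travel); fails if a needed file was vacuumed."""
        rows = []
        for f in sorted(self.versions[version]):
            if f not in self.storage:
                raise FileNotFoundError(f)
            rows.extend(self.storage[f])
        return rows


table = ToyDeltaTable()
table.write("part-0", ["alice", "bob"])                                        # version 0
table.delete_where(lambda r: r == "bob", datetime(2024, 1, 7, 1, 0), "part-1") # version 1

table.vacuum(now=datetime(2024, 1, 8, 3, 0))     # 26 hours later: nothing purged
rows_after_first_vacuum = table.read(version=0)  # time travel still sees "bob"

table.vacuum(now=datetime(2024, 1, 15, 3, 0))    # about 8 days later: old file purged
try:
    table.read(version=0)
    old_version_readable = True
except FileNotFoundError:
    old_version_readable = False
```

In this model the deleted record stays readable through the old version until the second VACUUM run, after which reading version 0 fails while the current version still returns the remaining rows.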
A data engineering pipeline executes a batch job to delete customer records every Sunday at 1:00 AM. Subsequently, a VACUUM command is run on all Delta Lake tables every Monday at 3:00 AM. There is a concern that Delta Lake's time travel feature might still allow access to the deleted data.
Assuming the delete logic is correctly implemented, which of the following statements accurately addresses this concern?
A
With the default 7-day data retention threshold, files containing deleted records persist until the VACUUM job is performed approximately 8 days after the deletion.
B
Data files containing deleted records are retained only until the following day's VACUUM job due to the default 24-hour data retention threshold.
C
The VACUUM command permanently removes files containing deleted records, making them accessible via time travel for a maximum of 24 hours after the job finishes.
D
Delta Lake's DELETE statements ensure the permanent purging of records across all storage systems immediately upon completion of the deletion job.
E
Delta Lake time travel functionality grants full access to a table's history regardless of VACUUM operations, enabling users with admin privileges to recreate deleted records indefinitely.