
Explanation:
Executing a DELETE statement in Delta Lake removes the specified rows from the current version of the table. However, it does not physically delete the underlying Parquet files from cloud storage immediately. This architecture is what enables Delta Lake's Time Travel feature, allowing users to query older versions of the table (e.g., SELECT * FROM users VERSION AS OF <ID>).
To ensure the data is permanently and physically removed from storage (for example, to comply with GDPR/CCPR 'Right to be Forgotten' requests), a VACUUM command must be executed. VACUUM removes data files that are no longer referenced by the current snapshot of the table and are older than the retention period (default 168 hours or 7 days).
Ultimate access to all questions.
No comments yet.
A data privacy team is auditing a process for deleting user records from a Delta Lake table named users. The process uses a secondary table, delete_requests_table, to identify the records to be purged.
Consider the following SQL command:
DELETE FROM users
WHERE user_id IN (SELECT user_id FROM delete_requests_table)
DELETE FROM users
WHERE user_id IN (SELECT user_id FROM delete_requests_table)
Assuming user_id is a unique identifier, does successfully executing this code guarantee that the data for these users is permanently inaccessible from the storage layer? Why or why not?
A
Yes, because Delta Lake’s ACID transactions ensure that once a deletion is committed, the data is immediately and completely purged from the underlying storage.
B
No, because the Delta cache may continue to serve the deleted data to users for a period of time before the cache is automatically refreshed.
C
No, because Delta Lake supports time travel. The deleted data remains in the underlying Parquet files and can still be accessed via historical version queries until a VACUUM command is run.
D
No, because the DELETE operation in Delta Lake is only transactionally safe if it is executed as part of a MERGE INTO statement.
E
No, because Delta Lake only marks the rows as deleted in the transaction log; the data remains accessible to anyone with read permissions on the storage container regardless of time travel settings.