
Ultimate access to all questions.
A data privacy team is auditing a process for deleting user records from a Delta Lake table named users. The process uses a secondary table, delete_requests_table, to identify the records to be purged.
Consider the following SQL command:
DELETE FROM users
WHERE user_id IN (SELECT user_id FROM delete_requests_table)
DELETE FROM users
WHERE user_id IN (SELECT user_id FROM delete_requests_table)
Assuming user_id is a unique identifier, does successfully executing this code guarantee that the data for these users is permanently inaccessible from the storage layer? Why or why not?_
A
Yes, because Delta Lake’s ACID transactions ensure that once a deletion is committed, the data is immediately and completely purged from the underlying storage.
B
No, because the Delta cache may continue to serve the deleted data to users for a period of time before the cache is automatically refreshed.
C
No, because Delta Lake supports time travel. The deleted data remains in the underlying Parquet files and can still be accessed via historical version queries until a VACUUM command is run.
D
No, because the DELETE operation in Delta Lake is only transactionally safe if it is executed as part of a MERGE INTO statement.
E
No, because Delta Lake only marks the rows as deleted in the transaction log; the data remains accessible to anyone with read permissions on the storage container regardless of time travel settings.