
Explanation:
In Delta Lake, a DELETE operation is an atomic transaction that updates the table's state by marking specific data files as "removed" (tombstoned) in the transaction log and creating new files if necessary. However, the original data files containing the deleted records remain in underlying storage to support Time Travel.
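One way to see this behavior is to query an earlier version of the table after the DELETE has committed. The sketch below is illustrative only: the helper names, the `spark` argument, the version number, and the assumption that `user_id` is an integer are all hypothetical and not part of the original question. It only builds and issues standard Delta SQL (`DESCRIBE HISTORY` and `VERSION AS OF`); on a real cluster `spark` would be the active SparkSession.

```python
def history_statement(table_name):
    """SQL that lists the table's commits, including the DELETE."""
    return f"DESCRIBE HISTORY {table_name}"


def time_travel_count_statement(table_name, version, user_id):
    """SQL that counts one user's rows as of an earlier table version.

    Assumes user_id is an integer column (hypothetical; adjust quoting otherwise).
    """
    return (
        f"SELECT COUNT(*) AS n FROM {table_name} "
        f"VERSION AS OF {int(version)} "
        f"WHERE user_id = {int(user_id)}"
    )


def count_rows_as_of(spark, table_name, version, user_id):
    """Run the time-travel query; `spark` is any object exposing .sql()."""
    return spark.sql(time_travel_count_statement(table_name, version, user_id))
```

If the chosen version predates the DELETE and its data files have not yet been vacuumed, a query like this would be expected to still return the "deleted" rows.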
To comply with GDPR's "right to be forgotten" and ensure the data is physically and permanently removed, a VACUUM command must be executed. VACUUM deletes files that are no longer referenced by the current table version and are older than the retention threshold (168 hours, i.e. 7 days, by default).
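A minimal sketch of issuing that VACUUM from PySpark follows. The helper names and the `spark` argument are assumptions made for illustration; the SQL uses the `VACUUM ... RETAIN n HOURS [DRY RUN]` form, with the retention defaulting to the 168 hours mentioned above.

```python
DEFAULT_RETENTION_HOURS = 168  # 7 days, the default retention threshold


def vacuum_statement(table_name, retain_hours=DEFAULT_RETENTION_HOURS, dry_run=False):
    """Build a VACUUM statement; DRY RUN only lists the files that would be deleted."""
    stmt = f"VACUUM {table_name} RETAIN {int(retain_hours)} HOURS"
    if dry_run:
        stmt += " DRY RUN"
    return stmt


def run_vacuum(spark, table_name, retain_hours=DEFAULT_RETENTION_HOURS, dry_run=False):
    """Issue the VACUUM; `spark` is any object exposing .sql(), e.g. a SparkSession."""
    return spark.sql(vacuum_statement(table_name, retain_hours, dry_run))
```

For example, `run_vacuum(spark, "aggregates_table", dry_run=True)` would preview the files eligible for removal before an actual purge. Retention periods shorter than the default are normally rejected by a safety check unless that check is explicitly disabled, so a GDPR process typically has to account for the retention window.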
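Why the other options are incorrect:
B and D: A committed DELETE removes the records only from the current table version; the underlying data files remain in storage until VACUUM runs.
A: DELETE is fully ACID-compliant on its own; MERGE is not a requirement for atomic deletions.
E: The Change Data Feed records all four change types: insert, update_preimage, update_postimage, and delete.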
A data governance team is reviewing a process for removing user data from a Databricks Lakehouse to comply with GDPR requirements. The following code snippet is used to propagate deletions from a lookup_table to an aggregates_table using the Delta Lake Change Data Feed (CDF):
# Read the Change Data Feed for lookup_table over a one-week window
(spark.read
    .format("delta")
    .option("readChangeData", True)
    .option("startingTimestamp", '2024-08-22 00:00:00')
    .option("endingTimestamp", '2024-08-29 00:00:00')
    .table("lookup_table")
    .createOrReplaceTempView("changes"))
# Delete aggregates for every user whose lookup_table row was deleted
spark.sql("""
    DELETE FROM aggregates_table
    WHERE user_id IN (
        SELECT user_id
        FROM changes
        WHERE _change_type = 'delete'
    )
""")
After this code executes successfully, will the deleted records in aggregates_table be permanently inaccessible? Why or why not?
A
No; Delta Lake’s DELETE operation requires a combined MERGE INTO command to ensure full ACID guarantees and physical removal.
B
Yes; Delta Lake’s ACID transactions guarantee that once a DELETE command is committed, the records are immediately and permanently removed from storage.
C
No; Deleted data files may still be accessible through Delta Lake’s time travel feature until a VACUUM operation is performed to purge them.
D
Yes; Utilizing the Change Data Feed with foreign key logic ensures that deletions are consistently and irrevocably applied across the Lakehouse.
E
No; The Change Data Feed implementation shown only captures inserts and updates, failing to identify the deletion events from the source table.