
Answer-first summary for fast verification
Answer: No; Deleted data files may still be accessible through Delta Lake’s time travel feature until a `VACUUM` operation is performed to purge them.
In Delta Lake, a `DELETE` operation is an atomic transaction that updates the table's state by marking the affected data files as "removed" (tombstoned) in the transaction log and writing new files where necessary. However, the original data files containing the deleted records remain in underlying storage to support **Time Travel**. To comply with GDPR's "right to be forgotten" and ensure the data is physically and permanently removed, a `VACUUM` command must be executed. `VACUUM` deletes files that are no longer referenced by the current table version and are older than the retention threshold (the default is 7 days, i.e. 168 hours).

* **Why C is correct**: It correctly identifies that while rows are removed from the current snapshot, the physical files remain accessible via time travel until `VACUUM` runs.
* **Why A is incorrect**: `DELETE` is fully ACID-compliant on its own; `MERGE` is not a requirement for atomic deletions.
* **Why B is incorrect**: ACID guarantees atomicity and consistency, but Delta's architecture deliberately preserves files for versioning unless they are purged.
* **Why D is incorrect**: CDF identifies changes but does not manage the physical storage lifecycle of those changes.
* **Why E is incorrect**: Delta CDF captures four types of changes: `insert`, `update_preimage`, `update_postimage`, and `delete`.
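The lifecycle described above can be sketched in Delta SQL. This is a minimal illustration, not part of the question's code; the table name reuses `aggregates_table`, the `user_id` value and the version number are hypothetical:

```sql
-- Logical delete: rows vanish from the current snapshot, but the
-- underlying Parquet files are only tombstoned in the transaction log.
DELETE FROM aggregates_table WHERE user_id = 42;

-- The deleted rows are still reachable through time travel
-- by querying an earlier table version (0 is illustrative).
SELECT * FROM aggregates_table VERSION AS OF 0;

-- Physical purge: removes unreferenced files older than the retention
-- threshold. 168 HOURS matches the 7-day default; shorter retention
-- additionally requires disabling Delta's safety check
-- (spark.databricks.delta.retentionDurationCheck.enabled).
VACUUM aggregates_table RETAIN 168 HOURS;
```

After the `VACUUM` completes, time-travel queries against versions whose files were purged will fail, which is exactly the behavior option C describes.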
Author: LeetQuiz Editorial Team
A data governance team is reviewing a process for removing user data from a Databricks Lakehouse to comply with GDPR requirements. The following code snippet is used to propagate deletions from a `lookup_table` to an `aggregates_table` using the Delta Lake Change Data Feed (CDF):
(spark.read
    .format("delta")
    .option("readChangeFeed", True)
    .option("startingTimestamp", '2024-08-22 00:00:00')
    .option("endingTimestamp", '2024-08-29 00:00:00')
    .table("lookup_table")
    .createOrReplaceTempView("changes"))

spark.sql("""
    DELETE FROM aggregates_table
    WHERE user_id IN (
        SELECT user_id
        FROM changes
        WHERE _change_type = 'delete'
    )
""")
After this code executes successfully, will the deleted records in aggregates_table be permanently inaccessible? Why or why not?
A
No; Delta Lake’s DELETE operation requires a combined MERGE INTO command to ensure full ACID guarantees and physical removal.
B
Yes; Delta Lake’s ACID transactions guarantee that once a DELETE command is committed, the records are immediately and permanently removed from storage.
C
No; Deleted data files may still be accessible through Delta Lake’s time travel feature until a VACUUM operation is performed to purge them.
D
Yes; Utilizing the Change Data Feed with foreign key logic ensures that deletions are consistently and irrevocably applied across the Lakehouse.
E
No; The Change Data Feed implementation shown only captures inserts and updates, failing to identify the deletion events from the source table.