
A data governance team is reviewing a process for removing user data from a Databricks Lakehouse to comply with GDPR requirements. The following code snippet is used to propagate deletions from a lookup_table to an aggregates_table using the Delta Lake Change Data Feed (CDF):
(spark.read
    .format("delta")
    .option("readChangeFeed", True)                    # enable Change Data Feed reads
    .option("startingTimestamp", "2024-08-22 00:00:00")
    .option("endingTimestamp", "2024-08-29 00:00:00")
    .table("lookup_table")
    .createOrReplaceTempView("changes"))

spark.sql("""
    DELETE FROM aggregates_table
    WHERE user_id IN (
        SELECT user_id
        FROM changes
        WHERE _change_type = 'delete'  -- CDF exposes changes via the _change_type metadata column
    )
""")
After this code executes successfully, will the deleted records in aggregates_table be permanently inaccessible? Why or why not?
A
No; Delta Lake’s DELETE operation requires a combined MERGE INTO command to ensure full ACID guarantees and physical removal.
B
Yes; Delta Lake’s ACID transactions guarantee that once a DELETE command is committed, the records are immediately and permanently removed from storage.
C
No; Deleted data files may still be accessible through Delta Lake’s time travel feature until a VACUUM operation is performed to purge them.
D
Yes; Utilizing the Change Data Feed with foreign key logic ensures that deletions are consistently and irrevocably applied across the Lakehouse.
E
No; The Change Data Feed implementation shown only captures inserts and updates, failing to identify the deletion events from the source table.
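Note on the scenario: a Delta Lake DELETE only removes rows from the current table version; the underlying data files remain referenced by earlier versions and stay reachable through time travel until they are vacuumed. As a minimal, hypothetical follow-up sketch (not part of the snippet above; the explicit 168-hour retention argument simply restates the 7-day default):

# Hypothetical cleanup step: VACUUM physically purges data files that are no
# longer referenced by the current table version once they age past the
# retention threshold, after which time travel can no longer reach them.
spark.sql("VACUUM aggregates_table RETAIN 168 HOURS")  # 168 hours = default 7-day retention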