
The data governance team is evaluating code for GDPR compliance regarding record deletion. The following logic is used to propagate delete requests from the user_lookup table to the user_aggregates table:
(spark.read
  .format("delta")
  .option("readChangeData", True)
  .option("startingTimestamp", "2021-08-22 00:00:00")
  .option("endingTimestamp", "2021-08-29 00:00:00")
  .table("user_lookup")
  .createOrReplaceTempView("changes"))

spark.sql("""
  DELETE FROM user_aggregates
  WHERE user_id IN (
    SELECT user_id
    FROM changes
    WHERE _change_type = 'delete'
  )
""")
Assuming user_id is a unique key and all users requesting deletion have been removed from user_lookup, does successfully executing this logic ensure that the records deleted from user_aggregates are no longer accessible? Explain why.
A. No; the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.
B. No; files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.
C. Yes; the change data feed uses foreign keys to ensure delete consistency throughout the Lakehouse.
D. Yes; Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.
E. No; the change data feed only tracks inserts and updates, not deleted records.
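For context: a Delta Lake DELETE is a logical operation. It rewrites the affected data files and marks the old ones as removed in the transaction log, but the physical files stay on storage and remain reachable through time travel until a VACUUM purges them. A minimal sketch (the table name follows the question; the 168-hour retention window is only illustrative, not from the source):

```sql
-- DELETE logically removes rows; the superseded data files stay on
-- storage and are still readable via time travel
-- (e.g. SELECT ... FROM user_aggregates TIMESTAMP AS OF ...).
-- VACUUM physically deletes files that were invalidated longer ago
-- than the retention threshold (here, 7 days).
VACUUM user_aggregates RETAIN 168 HOURS;
```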