
Ultimate access to all questions.
The data governance team is reviewing code used for deleting records for compliance with GDPR. The following logic has been implemented to propagate delete requests from the user_lookup table to the user_aggregates table.
(spark.read
.format("delta")
.option("readChangeData", True)
.option("startingTimestamp", '2021-08-22 00:00:00')
.option("endingTimestamp", '2021-08-29 00:00:00')
.table("user_lookup")
.createOrReplaceTempView("changes"))
(spark.read
.format("delta")
.option("readChangeData", True)
.option("startingTimestamp", '2021-08-22 00:00:00')
.option("endingTimestamp", '2021-08-29 00:00:00')
.table("user_lookup")
.createOrReplaceTempView("changes"))
spark.sql("""
DELETE FROM user_aggregates
WHERE user_id IN (
SELECT user_id
FROM changes
WHERE _change_type='delete'
)
""")
spark.sql("""
DELETE FROM user_aggregates
WHERE user_id IN (
SELECT user_id
FROM changes
WHERE _change_type='delete'
)
""")
Assuming that user_id is a unique identifying key and that all users that have requested deletion have been removed from the user_lookup table, which statement describes whether successfully executing the above logic guarantees that the records to be deleted from the user_aggregates table are no longer accessible and why?
A
No; the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.
B
No; files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.
C
Yes; the change data feed uses foreign keys to ensure delete consistency throughout the Lakehouse.
D
Yes; Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.
E
No; the change data feed only tracks inserts and updates, not deleted records.
Explanation:
Correct Answer: B
Why B is correct:
Delta Lake's Time Travel Feature: Delta Lake maintains a transaction log that enables time travel functionality. When records are deleted using the DELETE command, they are marked as deleted in the transaction log but the actual data files containing those records are not immediately removed from storage.
Data Accessibility via Time Travel: Deleted records remain accessible through time travel queries until the data files are physically removed by running the VACUUM command. This means that even after a successful DELETE operation, the data can still be accessed by:
SELECT * FROM table VERSION AS OF <version> or SELECT * FROM table TIMESTAMP AS OF <timestamp>GDPR Compliance Implications: For GDPR compliance, organizations need to ensure that personal data is truly inaccessible, not just logically deleted. The VACUUM command must be executed with appropriate retention settings to physically remove the data files containing deleted records.
Change Data Feed (CDF) Context: The code uses CDF to read delete operations from the user_lookup table, which correctly identifies records marked for deletion. However, the DELETE operation on user_aggregates only logically removes records - the physical data files remain until VACUUM.
Why other options are incorrect:
_change_type='delete'), so this statement is factually wrong.Key Takeaway: For GDPR compliance, organizations must not only execute DELETE operations but also manage data retention policies and run VACUUM to ensure deleted data is physically inaccessible.