
The data governance team is evaluating code for GDPR compliance regarding record deletion. The following logic is used to propagate delete requests from the user_lookup table to the user_aggregates table:
(spark.read
  .format("delta")
  .option("readChangeData", True)
  .option("startingTimestamp", "2021-08-22 00:00:00")
  .option("endingTimestamp", "2021-08-29 00:00:00")
  .table("user_lookup")
  .createOrReplaceTempView("changes"))

spark.sql("""
  DELETE FROM user_aggregates
  WHERE user_id IN (
    SELECT user_id
    FROM changes
    WHERE _change_type = 'delete'
  )
""")
Assuming user_id is a unique key and all users requesting deletion have been removed from user_lookup, does successfully executing this logic ensure that the records deleted from user_aggregates are no longer accessible? Explain why.
A. No; the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.
B. No; files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.
C. Yes; the change data feed uses foreign keys to ensure delete consistency throughout the Lakehouse.
D. Yes; Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.
E. No; the change data feed only tracks inserts and updates, not deleted records.
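For context: a Delta Lake DELETE is a logical operation. It rewrites the affected data files and marks the old ones as removed in the transaction log, but the physical files stay on storage and remain reachable through time travel until a VACUUM purges them. A minimal sketch (the table name follows the question; the 168-hour retention window is only illustrative, not from the source):

```sql
-- DELETE logically removes rows; the superseded data files stay on
-- storage and are still readable via time travel
-- (e.g. SELECT ... FROM user_aggregates TIMESTAMP AS OF ...).
-- VACUUM physically deletes files that were invalidated longer ago
-- than the retention threshold (here, 7 days).
VACUUM user_aggregates RETAIN 168 HOURS;
```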