Databricks Certified Data Engineer - Professional


In a real-time data processing scenario using Delta Lake, you are tasked with efficiently handling updates and deletes by leveraging the Change Data Feed (CDF). The solution must minimize operational overhead, ensure data integrity, and support scalability. Which of the following approaches BEST utilizes the CDF to achieve these goals? (Choose one option)

A. Periodically scan the entire dataset to identify and apply changed records.

B. Use the CDF to upsert changed records, then run a separate delete operation for removed records.

C. Use the merge method to perform an upsert with a deletion flag within a single transaction, driven by the CDF.

D. Apply updates and deletes directly to the target table, bypassing the CDF.

Explanation:

The BEST approach (Option C) is to use the merge method to perform an upsert with a deletion flag within a single transaction. Driven by the CDF, a single merge applies inserts, updates, and deletes atomically, which preserves data integrity while keeping operational overhead low and scaling with the volume of changes rather than the size of the table. Option A is incorrect because periodically scanning the entire dataset introduces unnecessary operational overhead. Option B is incorrect because performing a separate delete operation after the upsert is redundant, doubles the write work, and sacrifices the atomicity of a single transaction. Option D is incorrect because writing directly to the target table bypasses the CDF, forfeiting Delta Lake's efficient change data capture and risking data integrity.
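To make the recommended pattern concrete, the sketch below simulates applying one CDF micro-batch to a target table in a single pass: rows flagged `delete` are removed, while `insert` and `update_postimage` rows are upserted. This is a plain-Python model of the merge logic, not Spark code; the row shapes and the `apply_cdf_batch` helper are illustrative, though the `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`) match the column Delta Lake's CDF actually emits. In Databricks you would express the same thing with `DeltaTable.merge` over a `readChangeFeed` read, using `whenMatchedDelete`, `whenMatchedUpdate`, and `whenNotMatchedInsert` clauses in one transaction.

```python
# Plain-Python model of a single-transaction MERGE driven by the Change Data
# Feed: deletes and upserts are applied together in one pass over the batch.
# Row shapes and names are illustrative, not a Spark/Delta API.

def apply_cdf_batch(target, changes):
    """Apply CDF change rows to `target` (a dict keyed by id) in one pass.

    Each change row carries Delta's `_change_type` column with one of:
    'insert', 'update_preimage', 'update_postimage', 'delete'.
    """
    for row in changes:
        change_type = row["_change_type"]
        key = row["id"]
        if change_type == "delete":
            # Mirrors a whenMatchedDelete clause.
            target.pop(key, None)
        elif change_type in ("insert", "update_postimage"):
            # Mirrors whenNotMatchedInsert / whenMatchedUpdate: upsert the
            # row, stripping CDF metadata columns (those starting with '_').
            target[key] = {k: v for k, v in row.items()
                           if not k.startswith("_")}
        # 'update_preimage' rows are skipped: only the postimage matters.
    return target

if __name__ == "__main__":
    target = {1: {"id": 1, "value": "a"}, 2: {"id": 2, "value": "b"}}
    batch = [
        {"id": 2, "value": "b2", "_change_type": "update_postimage"},
        {"id": 1, "_change_type": "delete"},
        {"id": 3, "value": "c", "_change_type": "insert"},
    ]
    apply_cdf_batch(target, batch)
    print(sorted(target))  # ids remaining after the merge
```

Because every change type is handled in the same pass, the target never exposes a state where updates have landed but deletes have not, which is the integrity property the single-transaction merge provides.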