
Databricks Certified Data Engineer - Professional
In a real-time data processing scenario using Delta Lake, you are tasked with efficiently handling updates and deletes by leveraging the Change Data Feed (CDF). The solution must comply with the following constraints: minimize operational overhead, ensure data integrity, and support scalability. Considering these constraints, which of the following approaches BEST utilizes the CDF to achieve these goals? (Choose one option)
Explanation:
The BEST approach is to leverage the MERGE operation to perform an upsert with a deletion flag within a single transaction. This approach efficiently handles updates and deletes by consuming the changes exposed through the CDF, ensuring data integrity and minimizing operational overhead. Option A is incorrect because it introduces unnecessary operational overhead by periodically scanning the entire dataset. Option B is incorrect because performing a separate delete operation after the upsert is redundant and increases operational overhead. Option D is incorrect because it bypasses the CDF, risking data integrity and failing to leverage Delta Lake's capabilities for efficient change data capture.
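For illustration only (the exact tables and columns are not part of the question), here is a minimal PySpark sketch of the pattern the correct option describes: read the change feed from a source table and apply deletes and upserts to a target table in one atomic MERGE, with the CDF's _change_type column acting as the deletion flag. The table names source_tbl and target_tbl, the key column id, and the value column are hypothetical.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Read the change feed from the source table. CDF must already be enabled, e.g.:
#   ALTER TABLE source_tbl SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)   # replay changes committed since version 5
    .table("source_tbl")
    # Drop pre-images; keep inserts, post-update images, and deletes.
    .filter(F.col("_change_type").isin("insert", "update_postimage", "delete"))
)

# Keep only the most recent change per key so MERGE sees one source row per id.
latest = (
    changes.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("id").orderBy(F.col("_commit_version").desc())
        ),
    )
    .filter("rn = 1")
    .drop("rn")
)

target = DeltaTable.forName(spark, "target_tbl")

# One atomic MERGE: _change_type acts as the deletion flag, so deletes,
# updates, and inserts are all applied within a single transaction.
(
    target.alias("t")
    .merge(latest.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s._change_type = 'delete'")
    .whenMatchedUpdate(
        condition="s._change_type = 'update_postimage'",
        set={"value": "s.value"},
    )
    .whenNotMatchedInsert(
        condition="s._change_type != 'delete'",
        values={"id": "s.id", "value": "s.value"},
    )
    .execute()
)

Because the delete and upsert actions execute in the same MERGE, the target never exposes a partially applied batch, which is what keeps data integrity intact while avoiding the separate delete pass criticized in Option B. The same logic can run incrementally inside a streaming foreachBatch sink for a real-time pipeline.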