
Answer-first summary for fast verification
Answer: B — Use CDF to automatically track deletes and propagate them to downstream systems.
Leveraging CDF for deletes involves enabling Change Data Feed (CDF) on the Delta Lake table (the table property `delta.enableChangeDataFeed = true`) and then reading the change feed to identify delete operations: CDF tags each changed row with a `_change_type` column, and deleted rows carry the value `delete`. A Spark job filters the feed down to these delete records and applies them to downstream systems, for example with a `MERGE ... WHEN MATCHED THEN DELETE`, keeping downstream data consistent. This approach is efficient because it uses Delta Lake's native change tracking instead of recomputing the dataset or maintaining a separate delete log.
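The question also asks for a code snippet. Below is a minimal PySpark sketch of the approach, assuming a Delta source table with CDF enabled, a downstream Delta table, a starting version to read from, and a join key column named `id` — the table names, version, and key column are all hypothetical placeholders, not values from the question:

```python
def propagate_deletes(spark, source_table, target_table, starting_version, key_col="id"):
    """Read CDF delete events from source_table and remove the matching
    rows from target_table.

    Assumes source_table has delta.enableChangeDataFeed = true and that
    both tables share the key column key_col. All names are illustrative.
    """
    # Imports are kept inside the function so the sketch can be defined
    # without pyspark / delta-spark installed.
    from pyspark.sql import functions as F
    from delta.tables import DeltaTable

    # Batch-read the change feed starting at a given table version.
    changes = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(source_table)
    )

    # CDF tags each row with _change_type: insert, update_preimage,
    # update_postimage, or delete. Keep only the delete events.
    deletes = changes.filter(F.col("_change_type") == "delete")

    # Propagate the deletes downstream with a MERGE that removes
    # matching keys from the target Delta table.
    (
        DeltaTable.forName(spark, target_table).alias("t")
        .merge(deletes.alias("d"), f"t.{key_col} = d.{key_col}")
        .whenMatchedDelete()
        .execute()
    )
```

In practice the same pattern is often run incrementally: persist the last processed table version (or use a streaming read with `readChangeFeed`) so each run only picks up new delete events.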
Author: LeetQuiz Editorial Team
In a data processing environment, you are tasked with leveraging CDF to propagate deletes effectively. Describe how you would implement this in a Spark job, including the steps to ensure that deletes are correctly propagated across downstream systems. Provide a code snippet illustrating the key parts of this implementation.
A
Manually track and update delete operations in a separate log file.
B
Use CDF to automatically track deletes and propagate them in downstream systems.
C
Ignore deletes; focus only on inserts and updates.
D
Periodically recompute the entire dataset to reflect deletes.