
In a data processing environment, you are tasked with using Change Data Feed (CDF) to propagate deletes to downstream systems. Describe how you would implement this in a Spark job, including the steps that ensure deletes are applied correctly in every downstream system, and provide a code snippet illustrating the key parts of the implementation.
A
Manually track and update delete operations in a separate log file.
B
Use CDF to automatically track deletes and propagate them to downstream systems.
C
Ignore deletes; focus only on inserts and updates.
D
Periodically recompute the entire dataset to reflect deletes.
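
The approach in option B could be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes that "CDF" means Delta Lake's Change Data Feed, that the feed is already enabled on the source table (table property delta.enableChangeDataFeed = true), and that the downstream system is itself a Delta table with the same columns. The function names, table names, key columns, and the starting_version argument are hypothetical; tracking which commit version was processed last is left to the caller.

```python
from typing import Sequence


def merge_condition(keys: Sequence[str]) -> str:
    """Build the MERGE join predicate, e.g. ["id"] -> "t.id = s.id"."""
    return " AND ".join(f"t.{k} = s.{k}" for k in keys)


def propagate_changes(spark, source: str, target: str,
                      keys: Sequence[str], starting_version: int) -> None:
    """Apply source-table changes, including deletes, to a downstream Delta table.

    Hypothetical helper: `source` and `target` are table names, `keys` are the
    primary-key columns, `starting_version` is the first unprocessed commit.
    """
    # Imported here so the module loads on machines without Spark/Delta installed.
    from delta.tables import DeltaTable
    from pyspark.sql import Window, functions as F

    # 1. Read the change feed starting at the first commit not yet processed.
    changes = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(source)
    )

    # 2. Keep only the newest change per key. Pre-images hold the old values
    #    of an update, so they are dropped before ranking.
    newest_first = Window.partitionBy(*keys).orderBy(F.col("_commit_version").desc())
    latest = (
        changes.filter(F.col("_change_type") != "update_preimage")
        .withColumn("_rn", F.row_number().over(newest_first))
        .filter(F.col("_rn") == 1)
        .drop("_rn", "_commit_version", "_commit_timestamp")
    )

    # 3. MERGE into the target: delete rows whose latest change is a delete,
    #    and upsert everything else.
    (
        DeltaTable.forName(spark, target).alias("t")
        .merge(latest.alias("s"), merge_condition(keys))
        .whenMatchedDelete(condition="s._change_type = 'delete'")
        .whenMatchedUpdateAll(condition="s._change_type != 'delete'")
        .whenNotMatchedInsertAll(condition="s._change_type != 'delete'")
        .execute()
    )
```

A call might look like propagate_changes(spark, "bronze.customers", "silver.customers", ["id"], starting_version=42), where both table names and the version are placeholders. Reducing the feed to the newest change per key matters for deletes in particular: a key that was deleted and later re-inserted within the same version range should end up present downstream, not removed. The sketch assumes automatic schema evolution is disabled for the merge, so the extra _change_type column in the source is ignored by the update and insert clauses, and it does not handle the rarer case of several changes to one key within a single commit.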