
In a real-time data processing scenario using Delta Lake, you are tasked with efficiently handling updates and deletes by leveraging the Change Data Feed (CDF). The solution must minimize operational overhead, ensure data integrity, and support scalability. Given these constraints, which of the following approaches BEST utilizes the CDF to achieve these goals? (Choose one option)
A. Implement a batch process that periodically scans the entire dataset to identify changes, marks records for deletion, and then performs a delete operation, ensuring data integrity at the cost of increased operational overhead.
B. Use the merge method to perform an upsert operation with a deletion flag, followed by a separate delete operation to remove the flagged records, balancing operational overhead and data integrity.
C. Leverage the merge method to perform an upsert operation with a deletion flag within a single transaction, efficiently handling updates and deletes while minimizing operational overhead and ensuring data integrity.
D. Directly delete records from the Delta table without using the CDF, relying on external logging for change tracking, which minimizes operational overhead but risks data integrity.
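
For context, below is a minimal PySpark sketch of the single-transaction, merge-based approach described in option C. It is not part of the question itself, and it assumes the delta-spark package, a hypothetical CDF-enabled source table named `source_events`, a target Delta table at `/delta/target` keyed on an `id` column, and an illustrative starting version; all of these names are assumptions chosen for illustration.

```python
# Minimal sketch: apply CDF changes to a target Delta table in one MERGE.
# Assumptions: delta-spark is installed, the source table "source_events" has
# CDF enabled, and the target table is keyed on "id". Names/paths are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = (
    SparkSession.builder
    .appName("cdf-merge-sketch")
    # Standard Delta Lake session settings
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read the change feed from the source table starting at a known version.
# Each row carries a _change_type column: insert, update_preimage,
# update_postimage, or delete (preimages are not needed for applying changes).
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)  # illustrative starting version
    .table("source_events")
    .filter(F.col("_change_type").isin("insert", "update_postimage", "delete"))
)

# Keep only the latest change per key so MERGE sees at most one source row
# per target row (avoids multiple-match errors).
latest = Window.partitionBy("id").orderBy(F.col("_commit_version").desc())
changes = (
    changes
    .withColumn("_rn", F.row_number().over(latest))
    .filter("_rn = 1")
    .drop("_rn")
)

target = DeltaTable.forPath(spark, "/delta/target")

# Single MERGE: deletes, updates, and inserts are applied atomically in one
# Delta transaction, so no separate delete pass is needed.
(
    target.alias("t")
    .merge(changes.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s._change_type = 'delete'")
    .whenMatchedUpdateAll(condition="s._change_type = 'update_postimage'")
    .whenNotMatchedInsertAll(condition="s._change_type != 'delete'")
    .execute()
)
```

Because the deletes, updates, and inserts are expressed in a single MERGE, they commit as one Delta transaction, which is what gives the option C approach its data-integrity guarantee without the extra delete pass that option B requires.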