
The data engineering team is building a pipeline that processes customer data from a Change Data Capture (CDC) feed produced by a source system. The feed contains the data records together with metadata indicating whether each change is an insert, update, or delete, plus a timestamp column (update_time) that orders the changes. Each record is uniquely identified by customer_id. Because a single batch may contain multiple changes for the same customer with different update_time values, the team wants to store only the most recent information per customer in a target Delta Lake table. Which solution best fulfills these requirements?
A
Enable Delta Lake's Change Data Feed (CDF) on the target table to automatically merge the received CDC feed
B
Use the dropDuplicates function to remove duplicates by customer_id, then merge the duplicate records into the table
C
Use MERGE INTO with a SEQUENCE BY clause on update_time to order how operations should be applied
D
Use MERGE INTO to upsert the most recent entry for each customer_id into the table
E
Use the option mergeSchema when writing the CDC data into the table to automatically merge the changed data with its most recent schema
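
The requirement in the scenario (keep only the change with the latest update_time per customer_id, then upsert or delete accordingly) can be illustrated without Spark. The following is a minimal plain-Python sketch of that logic only; it is not Delta Lake or Spark code, and the helper names (`latest_per_customer`, `apply_changes`), the `op` and `name` fields, and the integer timestamps are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical record shape: customer_id, name, op, update_time.
Change = dict[str, Any]


def latest_per_customer(batch: list[Change]) -> dict[int, Change]:
    """Reduce a batch to the single most recent change per customer_id."""
    latest: dict[int, Change] = {}
    for change in batch:
        key = change["customer_id"]
        # Keep the change only if it is newer than the one seen so far.
        if key not in latest or change["update_time"] > latest[key]["update_time"]:
            latest[key] = change
    return latest


def apply_changes(target: dict[int, Change], batch: list[Change]) -> None:
    """Upsert or delete rows in target using the latest change per customer."""
    for key, change in latest_per_customer(batch).items():
        if change["op"] == "delete":
            target.pop(key, None)  # a delete of a missing key is a no-op
        else:
            # Insert and update are treated the same way: an upsert.
            target[key] = {
                "customer_id": key,
                "name": change["name"],
                "update_time": change["update_time"],
            }


# Stand-in for the target table, keyed by customer_id.
target: dict[int, Change] = {
    1: {"customer_id": 1, "name": "Ann", "update_time": 1},
    2: {"customer_id": 2, "name": "Bob", "update_time": 1},
}

# One batch with several out-of-order changes for the same customers.
batch: list[Change] = [
    {"customer_id": 1, "name": "Ann B.", "op": "update", "update_time": 3},
    {"customer_id": 1, "name": "Anne", "op": "update", "update_time": 2},
    {"customer_id": 2, "name": "Bob", "op": "delete", "update_time": 4},
    {"customer_id": 3, "name": "Cy", "op": "insert", "update_time": 2},
    {"customer_id": 3, "name": "Cyrus", "op": "update", "update_time": 5},
]

apply_changes(target, batch)
```

The sketch reduces the batch before touching the target, so each customer_id is written at most once per batch regardless of the order in which the changes arrive.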