Databricks Certified Data Engineer - Professional


The data engineering team aims to build a pipeline that ingests customer data from a source system's Change Data Capture (CDC) feed. The feed contains both the data records and metadata indicating the operation (insert, update, or delete), along with a timestamp column (`update_time`) that orders the changes. Each customer is uniquely identified by `customer_id`. Because a single batch may contain several changes for the same customer with different `update_time` values, the team wants the target Delta Lake table to hold only the most recent information for each customer. Which solution best meets these requirements?


Enable Delta Lake's Change Data Feed (CDF) on the target table to automatically merge the received CDC feed

13.6%

Use the `dropDuplicates` function to remove duplicates by `customer_id`, then merge the duplicate records into the table


Use `MERGE INTO` with a `SEQUENCE BY` clause on `update_time` to order how the operations are applied

17.8%

Use `MERGE INTO` to upsert the most recent entry for each `customer_id` into the table

53.9%
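The most-voted option (keep only the latest change per `customer_id`, then merge) can be illustrated in plain Python. This is a minimal sketch, not Databricks code: the `ChangeRecord` type, its `operation` values ("INSERT", "UPDATE", "DELETE") and the dict that stands in for the Delta table are all hypothetical. Deduplicating before the merge matters because a Delta `MERGE INTO` fails when multiple source rows match the same target row; in Spark the deduplication is commonly done with a window function (`row_number()` partitioned by `customer_id`, ordered by `update_time` descending).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ChangeRecord:
    # Hypothetical shape of one CDC row.
    customer_id: int
    operation: str        # assumed values: "INSERT", "UPDATE" or "DELETE"
    update_time: int      # ordering column; an int stands in for a timestamp
    data: dict = field(default_factory=dict)


def latest_per_customer(batch: Iterable[ChangeRecord]) -> List[ChangeRecord]:
    """Keep only the change with the greatest update_time for each customer_id."""
    latest: Dict[int, ChangeRecord] = {}
    for rec in batch:
        current = latest.get(rec.customer_id)
        if current is None or rec.update_time > current.update_time:
            latest[rec.customer_id] = rec
    return list(latest.values())


def merge_into(target: Dict[int, dict], changes: Iterable[ChangeRecord]) -> None:
    """Mimic MERGE INTO ... ON customer_id: delete on DELETE, otherwise upsert."""
    for rec in changes:
        if rec.operation == "DELETE":
            # WHEN MATCHED AND operation = 'DELETE' THEN DELETE (no-op if unmatched)
            target.pop(rec.customer_id, None)
        else:
            # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
            target[rec.customer_id] = dict(rec.data)


if __name__ == "__main__":
    table = {1: {"name": "Ann", "city": "Oslo"}, 2: {"name": "Bob", "city": "Rome"}}
    batch = [
        ChangeRecord(1, "UPDATE", 10, {"name": "Ann", "city": "Bergen"}),
        ChangeRecord(1, "UPDATE", 20, {"name": "Ann", "city": "Paris"}),
        ChangeRecord(2, "DELETE", 15),
        ChangeRecord(3, "INSERT", 5, {"name": "Cy", "city": "Lima"}),
    ]
    merge_into(table, latest_per_customer(batch))
    print(table)  # {1: {'name': 'Ann', 'city': 'Paris'}, 3: {'name': 'Cy', 'city': 'Lima'}}
```

The sketch compares `update_time` explicitly because a plain `dropDuplicates` on `customer_id` gives no guarantee about which of a customer's rows survives, so it may keep an older change.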