
You have a Delta Lake table named customer_purchases with duplicate records based on customer_id. Your goal is to deduplicate these records by retaining only the latest purchase record for each customer, determined by purchase_date. How would you use the MERGE statement in Databricks SQL to accomplish this task?
A
MERGE INTO customer_purchases AS target
USING customer_purchases AS source
ON target.customer_id = source.customer_id
WHEN MATCHED AND target.purchase_date < source.purchase_date THEN DELETE;
B
MERGE INTO customer_purchases AS target
USING (
  SELECT customer_id, purchase_date
  FROM customer_purchases
  WHERE purchase_date = (SELECT MAX(purchase_date) FROM customer_purchases GROUP BY customer_id)
) AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET target.* = source.*
WHEN NOT MATCHED THEN INSERT ;
C
MERGE INTO customer_purchases AS target
USING (
  SELECT customer_id, MAX(purchase_date) AS latest_purchase_date
  FROM customer_purchases
  GROUP BY customer_id
) AS source
ON target.customer_id = source.customer_id
WHEN MATCHED AND target.purchase_date < source.latest_purchase_date THEN DELETE
WHEN NOT MATCHED THEN INSERT ;
D
MERGE INTO customer_purchases AS target
USING (
  SELECT customer_id, MAX(purchase_date) AS latest_purchase_date
  FROM customer_purchases
  GROUP BY customer_id
) AS source
ON target.customer_id = source.customer_id AND target.purchase_date < source.latest_purchase_date
WHEN MATCHED THEN DELETE;
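
To reason about what a join-and-delete MERGE does to the table, it can help to emulate the row logic outside Databricks. The following is a minimal sketch in plain Python, not Delta Lake code: it assumes the source is one row per customer_id carrying MAX(purchase_date), and that every target row whose purchase_date is strictly earlier than that value is matched and deleted. The function name dedupe_latest, the amount column, and the sample rows are hypothetical and exist only for illustration.

```python
from datetime import date


def dedupe_latest(rows):
    """Emulate a MERGE that deletes rows older than each customer's latest purchase.

    This is an in-memory illustration only (an assumption of this sketch);
    it does not call Delta Lake or Databricks.
    """
    # "Source" side: one entry per customer holding the latest purchase_date,
    # analogous to SELECT customer_id, MAX(purchase_date) ... GROUP BY customer_id.
    latest = {}
    for row in rows:
        cid = row["customer_id"]
        if cid not in latest or row["purchase_date"] > latest[cid]:
            latest[cid] = row["purchase_date"]

    # "Target" side: a row is matched (and deleted) when its purchase_date is
    # strictly earlier than the customer's latest date; all other rows are kept.
    return [r for r in rows if not r["purchase_date"] < latest[r["customer_id"]]]


# Hypothetical sample data: customer 1 has two purchases, customer 2 has one.
rows = [
    {"customer_id": 1, "purchase_date": date(2024, 1, 5), "amount": 10.0},
    {"customer_id": 1, "purchase_date": date(2024, 3, 1), "amount": 25.0},
    {"customer_id": 2, "purchase_date": date(2024, 2, 10), "amount": 7.5},
]

deduped = dedupe_latest(rows)
```

Because the comparison is strict, two rows for the same customer that share the latest purchase_date would both survive under this logic; a tie-breaking column would be needed to reduce them to one.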