
Answer-first summary for fast verification
Answer: Employ the MERGE INTO statement to efficiently update existing records and insert new ones from 'staging_customer_updates' into 'customer_data' based on 'customer_id'.
The MERGE INTO statement is the recommended approach in Databricks Delta Lake for performing upserts (combined update and insert operations). It applies updates to existing records and inserts new records in a single atomic operation, making it ideal when deletions in the source do not need to be propagated. The other options fall short: deleting and reloading the entire table is wasteful and briefly leaves the table empty or inconsistent; manual review with hand-written SQL scripts is slow and error-prone; and APPLY CHANGES INTO, while a real statement, is only valid inside Delta Live Tables pipelines for change data capture, not for synchronizing ordinary Delta tables in standard Databricks SQL.
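As a sketch, the upsert described above could look like the following Databricks SQL, assuming 'customer_data' and 'staging_customer_updates' share the same schema (the `UPDATE SET *` / `INSERT *` shorthand requires matching columns):

```sql
-- Upsert the latest customer profiles into the target table.
MERGE INTO customer_data AS target
USING staging_customer_updates AS source
ON target.customer_id = source.customer_id
-- Existing customer: overwrite the row with the staged version.
WHEN MATCHED THEN
  UPDATE SET *
-- New customer: insert the staged row as-is.
WHEN NOT MATCHED THEN
  INSERT *;
```

Because no WHEN NOT MATCHED BY SOURCE clause is used, rows present only in 'customer_data' are left untouched, which matches the requirement that deletions in the source be ignored.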
Author: LeetQuiz Editorial Team
As a data engineer, you're responsible for keeping the Delta Lake table 'customer_data' synchronized with 'staging_customer_updates', which continuously receives new and updated customer profiles. Both tables use 'customer_id' as a unique identifier. Your goal is to ensure 'customer_data' reflects the latest updates without considering deletions from the source. What is the most efficient method to achieve this synchronization in Databricks?
A
Perform a complete deletion of all records in 'customer_data' followed by inserting all data from 'staging_customer_updates' to ensure the latest data is reflected.
B
Employ the MERGE INTO statement to efficiently update existing records and insert new ones from 'staging_customer_updates' into 'customer_data' based on 'customer_id'.
C
Conduct a daily manual review of 'staging_customer_updates' and write custom SQL scripts to update or insert records into 'customer_data' as needed.
D
Use the APPLY CHANGES INTO statement with 'customer_id' as the key to automatically update and insert records from 'staging_customer_updates' into 'customer_data'.