
Answer-first summary for fast verification
Answer: Employ the MERGE INTO statement to efficiently update existing records and insert new ones from 'staging_customer_updates' into 'customer_data' based on 'customer_id'.
The MERGE INTO statement is the recommended approach in Databricks Delta Lake for performing upserts (combined update and insert operations). It applies updates to existing records and inserts new records in a single atomic operation, making it ideal when deletions in the source do not need to be propagated. The other options fall short: deleting and reloading the entire table is wasteful and briefly leaves the table empty or inconsistent; manual review with hand-written SQL scripts is slow and error-prone; and APPLY CHANGES INTO, while a real statement, is only valid inside Delta Live Tables pipelines for change data capture, not for synchronizing ordinary Delta tables in standard Databricks SQL.
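As a sketch, the upsert described above could look like the following Databricks SQL, assuming 'customer_data' and 'staging_customer_updates' share the same schema (the `UPDATE SET *` / `INSERT *` shorthand requires matching columns):

```sql
-- Upsert the latest customer profiles into the target table.
MERGE INTO customer_data AS target
USING staging_customer_updates AS source
ON target.customer_id = source.customer_id
-- Existing customer: overwrite the row with the staged version.
WHEN MATCHED THEN
  UPDATE SET *
-- New customer: insert the staged row as-is.
WHEN NOT MATCHED THEN
  INSERT *;
```

Because no WHEN NOT MATCHED BY SOURCE clause is used, rows present only in 'customer_data' are left untouched, which matches the requirement that deletions in the source be ignored.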
Author: LeetQuiz Editorial Team
As a data engineer, you're responsible for keeping the Delta Lake table 'customer_data' synchronized with 'staging_customer_updates', which continuously receives new and updated customer profiles. Both tables use 'customer_id' as a unique identifier. Your goal is to ensure 'customer_data' reflects the latest updates without considering deletions from the source. What is the most efficient method to achieve this synchronization in Databricks?
A
Perform a complete deletion of all records in 'customer_data' followed by inserting all data from 'staging_customer_updates' to ensure the latest data is reflected.
B
Employ the MERGE INTO statement to efficiently update existing records and insert new ones from 'staging_customer_updates' into 'customer_data' based on 'customer_id'.
C
Conduct a daily manual review of 'staging_customer_updates' and write custom SQL scripts to update or insert records into 'customer_data' as needed.
D
Use the APPLY CHANGES INTO statement with 'customer_id' as the key to automatically update and insert records from 'staging_customer_updates' into 'customer_data'.