
Databricks Certified Data Engineer - Associate
You need to load data from an external source into a Delta Lake table, but the data might contain duplicates that should not be written to the table. Which SQL command would you use to ensure that only new, non-duplicate records are added to the table?
Explanation:
MERGE INTO in Delta Lake allows you to conditionally insert, update, or delete records based on a matching condition between the source and target tables. For example:
MERGE INTO target_table AS t
USING source_table AS s
ON t.id = s.id
WHEN NOT MATCHED THEN
INSERT *
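Running the MERGE above requires a Spark cluster with Delta Lake, but the "insert only unmatched rows" semantics of WHEN NOT MATCHED THEN INSERT can be sketched locally. The snippet below is an illustrative stand-in using Python's built-in sqlite3 and standard SQL's INSERT ... ON CONFLICT DO NOTHING (a swapped-in technique, not Delta Lake's MERGE itself); the table and column names are hypothetical.

```python
# Stand-in sketch: demonstrates "insert only non-duplicate rows" semantics,
# analogous to MERGE INTO ... WHEN NOT MATCHED THEN INSERT, using sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO target_table VALUES (1, 'existing')")

# Source batch contains a duplicate id (1) and a genuinely new id (2).
source_rows = [(1, "duplicate"), (2, "new")]
conn.executemany(
    "INSERT INTO target_table VALUES (?, ?) ON CONFLICT(id) DO NOTHING",
    source_rows,
)

rows = conn.execute("SELECT id, value FROM target_table ORDER BY id").fetchall()
# Row id 1 keeps its original value; only the unmatched row (id 2) is inserted.
```

The key point mirrored here is that matching is done on a key (`id`), and rows whose key already exists in the target are silently skipped rather than duplicated.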
Reference: Databricks documentation on MERGE INTO (data deduplication when writing into Delta tables): https://docs.databricks.com/aws/en/delta/merge#data-deduplication-when-writing-into-delta-tables. The MERGE INTO statement merges a set of updates, insertions, and deletions into a Delta table based on a source table, which makes it well suited to deduplication and upserts.