
Ultimate access to all questions.
In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?
A
When the location of the data needs to be changed
B
When the target table is an external table
C
When the source table can be deleted
D
When the target table cannot contain duplicate records
E
When the source is not a Delta table
Explanation:
The MERGE INTO command should be used when the target table cannot contain duplicate records. This is because MERGE allows for deduplication by matching records between source and target tables and performing conditional operations (INSERT, UPDATE, DELETE). Unlike INSERT INTO which simply appends data, MERGE can check for existing records and avoid inserting duplicates. The explanation provided in the text confirms this: "With merge, you can avoid inserting the duplicate records. The dataset containing the new logs needs to be deduplicated within itself. By the SQL semantics of merge, it matches and deduplicates the new data with the existing data in the table, but if there is duplicate data within the new data set, it is inserted." This makes MERGE particularly useful for maintaining data integrity when dealing with incremental data loads where duplicates must be prevented.