
Answer-first summary for fast verification
Answer: Use MERGE to conditionally insert or update records.
Answer: C — Use MERGE to conditionally insert or update records. Explanations for each option A: CREATE OR REPLACE TABLE to replace the entire table with new data. Incorrect. CREATE OR REPLACE TABLE drops and recreates the table (or replaces its contents). That will remove existing rows and replace them with the supplied data — it does not perform a selective upsert and can destroy historical data. It does not prevent duplicates; it simply replaces the entire table. B: INSERT OVERWRITE to selectively overwrite partitions. Incorrect for the stated goal. INSERT OVERWRITE replaces the data in the specified partitions (or entire table if no partition is specified). That is destructive for those partitions and not an upsert: it will not merge new rows with existing rows outside the overwritten partitions and thus is not the right mechanism when you want to add only new records while preserving and updating existing ones. C: MERGE to conditionally insert or update records. Correct. MERGE (Delta Lake MERGE INTO) lets you provide a match condition (typically on a business key) and then specify WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT actions. That enables upserts — updating existing rows and inserting only new rows — which prevents duplicate records when the match key is chosen correctly. Example pattern: MERGE INTO target t USING source s ON t.id = s.id WHEN MATCHED THEN UPDATE SET ... WHEN NOT MATCHED THEN INSERT ... Using MERGE you can also handle deletes or multiple conditional branches, making it the standard way in Delta Lake to avoid creating duplicate records during loads. D: COPY INTO to load data from external sources without duplication. Incorrect (common misconception). COPY INTO is a fast, file-based bulk load command to ingest data files into a table, but it does not itself perform conditional upserts against existing table rows. COPY INTO will load the data provided; it doesn’t inherently deduplicate against existing rows in the target table. To avoid duplicates when using COPY INTO you would need separate logic (e.g., staging + MERGE, or deduping after load). COPY INTO is useful for performance and file ingestion, but not sufficient by itself to prevent duplicates. Reference answer C — Use MERGE to conditionally insert or update records, because MERGE performs upserts based on a match condition and prevents inserting duplicate rows when you match on the correct business key.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.

You are designing a data pipeline where data from an external source needs to be loaded into a Delta Lake table without duplicating existing records. Which SQL command would you use for this purpose, and why is it effective in preventing data duplication?
A
Use CREATE OR REPLACE TABLE to replace the entire table with new data.
B
Use INSERT OVERWRITE to selectively overwrite partitions.
C
Use MERGE to conditionally insert or update records.
D
Use COPY INTO to load data from external sources without duplication.