
Answer-first summary for fast verification
Answer: Perform an insert-only merge with a matching condition on a unique key.
The correct answer is **C**. An insert-only merge (a `MERGE` statement whose only action is a `WHEN NOT MATCHED` insert) with a matching condition on a unique key inserts an incoming record only if no record with the same key already exists in the Delta table. This deduplicates incoming data against previously processed records.
- **A** is incorrect because schema enforcement validates column names and data types, not duplicate rows.
- **B** is incorrect because `VACUUM` removes data files that the table no longer references; it does not deduplicate records.
- **D** is incorrect because a full outer join followed by an overwrite rewrites the whole table, which is inefficient and riskier than an atomic `MERGE`.
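To make the insert-only semantics concrete, here is a minimal sketch. The SQL string follows Delta Lake's `MERGE INTO` syntax, but the table names `events` and `new_events` and the key `event_id` are hypothetical, chosen only for illustration. The Python functions are a plain in-memory model of the behaviour, not Delta Lake's implementation: the batch is deduplicated on the key first, then only rows whose key is absent from the target are appended.

```python
from typing import Dict, List

# Hypothetical table and column names; shown for reference only, not executed here.
INSERT_ONLY_MERGE_SQL = """
MERGE INTO events AS t
USING new_events AS s
ON t.event_id = s.event_id
WHEN NOT MATCHED THEN INSERT *
"""

Row = Dict[str, object]


def dedupe_batch(batch: List[Row], key: str) -> List[Row]:
    """Intra-batch deduplication: keep the first row seen for each key."""
    first_by_key: Dict[object, Row] = {}
    for row in batch:
        first_by_key.setdefault(row[key], row)
    return list(first_by_key.values())


def insert_only_merge(target: List[Row], batch: List[Row], key: str) -> List[Row]:
    """Append rows whose key is not already in target; never update or delete."""
    existing_keys = {row[key] for row in target}
    inserted = [
        row for row in dedupe_batch(batch, key) if row[key] not in existing_keys
    ]
    target.extend(inserted)
    return inserted


# In-memory stand-ins for the Delta table and an incoming batch.
target = [{"event_id": 1, "value": "a"}]
batch = [
    {"event_id": 1, "value": "late duplicate"},  # already processed, skipped
    {"event_id": 2, "value": "b"},               # new, inserted
    {"event_id": 2, "value": "b again"},         # intra-batch duplicate, dropped
]
inserted = insert_only_merge(target, batch, "event_id")
```

The model deduplicates the batch before merging because the merge condition only compares incoming rows against the target; two unmatched source rows with the same key would otherwise both be inserted.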
Author: LeetQuiz Editorial Team
Which approach enables a data engineer to deduplicate incoming records against previously processed data when inserting into a Delta table, in addition to handling intra-batch deduplication for late-arriving records?
A. Rely on Delta Lake schema enforcement to prevent duplicate records.
B. `VACUUM` the Delta table after each batch completes.
C. Perform an insert-only merge with a matching condition on a unique key.
D. Perform a full outer join on a unique key and overwrite existing data.