
Explanation:
Correct Choice: Use a MERGE INTO operation with a WHEN NOT MATCHED clause based on a unique key.
The MERGE INTO statement applies conditional insert, update, and delete logic within a single atomic transaction. When the merge condition compares a primary or unique key and the statement contains only a WHEN NOT MATCHED THEN INSERT clause, each incoming record is checked against the rows already in the target table. If a row with that key already exists, the record is skipped; if not, it is inserted. This ensures that late-arriving duplicates or re-run batches do not create duplicate entries in the target table.
Why other options are incorrect:
The VACUUM command is used for data retention and storage optimization: it deletes data files that are no longer referenced by the current table state and are older than the retention threshold. It has no functionality related to data deduplication.
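The insert-only MERGE pattern described above can be modeled in plain Python to show why re-running a batch adds nothing. This is a minimal sketch, not the Delta Lake API: the target table is a list of dicts, and the names `merge_insert_only`, `dedupe_batch`, and the `event_id` key are hypothetical. In Delta SQL the modeled statement has roughly the shape `MERGE INTO target USING updates ON target.event_id = updates.event_id WHEN NOT MATCHED THEN INSERT *`. The batch is deduplicated first because an insert-only merge only compares incoming rows against the target, so two copies of a new key arriving in the same batch would otherwise both be inserted.

```python
from typing import Dict, Iterable, List

# Hypothetical row type: a record keyed by column name.
Row = Dict[str, object]


def dedupe_batch(batch: Iterable[Row], key: str) -> List[Row]:
    """Keep the first row seen for each key within one micro-batch."""
    seen = set()
    unique: List[Row] = []
    for row in batch:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def merge_insert_only(target: List[Row], batch: Iterable[Row], key: str) -> int:
    """Model of MERGE ... WHEN NOT MATCHED THEN INSERT (assumed semantics):
    rows whose key is absent from the target are appended, rows whose key
    already exists are ignored. Returns the number of rows inserted."""
    existing = {row[key] for row in target}
    inserted = 0
    for row in dedupe_batch(batch, key):
        if row[key] not in existing:  # NOT MATCHED -> insert
            target.append(row)
            existing.add(row[key])
            inserted += 1
        # MATCHED with no WHEN MATCHED clause -> row is skipped
    return inserted


if __name__ == "__main__":
    table: List[Row] = [{"event_id": 1, "value": "a"}]
    batch = [
        {"event_id": 1, "value": "a"},  # already in the target
        {"event_id": 2, "value": "b"},  # new key
        {"event_id": 2, "value": "b"},  # duplicate within the batch
    ]
    print(merge_insert_only(table, batch, "event_id"))  # 1
    print(merge_insert_only(table, batch, "event_id"))  # 0 (re-run batch)
    print(sorted(r["event_id"] for r in table))  # [1, 2]
```

Running the same batch twice leaves the table unchanged the second time, which is the idempotence property the explanation relies on.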
A data engineer is designing a pipeline to handle late-arriving and duplicate records. Beyond de-duplicating data within the current micro-batch, which technique effectively prevents duplicate records from being inserted into an existing Delta table by checking against previously processed data?
A. Perform a full outer join on a unique key and overwrite the entire target table with the result.
B. Enable Delta Lake schema enforcement to block duplicate records during the write operation.
C. Use a MERGE INTO operation with a WHEN NOT MATCHED clause based on a unique key.
D. Execute the VACUUM command on the Delta table after each batch to remove stale duplicates.