
A data engineer is designing a pipeline to handle late-arriving and duplicate records. Beyond de-duplicating data within the current micro-batch, which technique effectively prevents duplicate records from being inserted into an existing Delta table by checking against previously processed data?
A. Perform a full outer join on a unique key and overwrite the entire target table with the result.
B. Enable Delta Lake schema enforcement to block duplicate records during the write operation.
C. Use a MERGE INTO operation with a WHEN NOT MATCHED clause based on a unique key.
D. Execute the VACUUM command on the Delta table after each batch to remove stale duplicates.
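The MERGE INTO pattern named in option C is the standard Delta Lake way to make inserts idempotent against already-processed data. Below is a minimal PySpark sketch of that pattern; the table name `events`, the key column `event_id`, and the streaming wiring are illustrative assumptions, not details from the question.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable  # requires the delta-spark package

spark = SparkSession.builder.getOrCreate()

def upsert_batch(micro_batch_df, batch_id):
    # 1. De-duplicate within the current micro-batch (hypothetical key: event_id).
    deduped = micro_batch_df.dropDuplicates(["event_id"])

    # 2. MERGE against the existing Delta table: only keys not already present
    #    are inserted, so late-arriving or replayed records are skipped.
    (DeltaTable.forName(spark, "events").alias("t")
        .merge(deduped.alias("s"), "t.event_id = s.event_id")
        .whenNotMatchedInsertAll()
        .execute())

# Illustrative wiring into a Structured Streaming sink:
# events_stream.writeStream.foreachBatch(upsert_batch).start()
```

Because the merge condition compares each incoming key against rows already in the target table, re-running the same batch adds no new rows. Schema enforcement (B) validates column structure rather than data values, and VACUUM (D) only removes unreferenced data files, so neither prevents duplicate inserts.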