
A data engineer is architecting a pipeline that must handle late-arriving records containing potential duplicates. Beyond deduplicating within each incoming batch, which strategy should be used to ensure that records already stored in a target Delta table are not duplicated during ingestion?
A. Perform a full outer join between the incoming batch and the target table on a unique key, followed by a full table overwrite.
B. Enable Delta Lake schema enforcement to automatically identify and block records with duplicate keys.
C. Utilize a MERGE INTO operation with a WHEN NOT MATCHED THEN INSERT clause based on a unique identifier.
D. Execute the VACUUM command on the Delta table after every batch to remove redundant data entries from the transaction log.
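
For reference, option C describes an insert-only MERGE pattern against the target Delta table. The following PySpark sketch illustrates that pattern under stated assumptions: the table name target_table, the key column record_id, and the landing path are hypothetical placeholders, not details given in the question.

from pyspark.sql import SparkSession

# Assumed Spark session with Delta Lake enabled; configuration may differ per environment.
spark = (
    SparkSession.builder
    .appName("dedup-merge-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
    .getOrCreate()
)

# Hypothetical incoming batch; deduplicate within the batch on the unique key first.
incoming = spark.read.format("json").load("/path/to/landing/")   # hypothetical path
deduped_batch = incoming.dropDuplicates(["record_id"])           # hypothetical unique key

deduped_batch.createOrReplaceTempView("updates")

# MERGE INTO with WHEN NOT MATCHED THEN INSERT: rows whose record_id already exists
# in the target are skipped, so records ingested by earlier batches are not duplicated.
spark.sql("""
    MERGE INTO target_table AS t
    USING updates AS u
    ON t.record_id = u.record_id
    WHEN NOT MATCHED THEN INSERT *
""")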