
**Answer:** Use a `MERGE INTO` operation with a `WHEN NOT MATCHED THEN INSERT` clause keyed on a unique identifier.
### Explanation

**Correct Answer: C**

The most effective and efficient method for ensuring idempotency when loading data into a Delta table is the **`MERGE INTO`** operation. By specifying a match condition on a unique key and using only the **`WHEN NOT MATCHED THEN INSERT`** clause, the engine appends only those records whose keys do not already exist in the target table. Even if late-arriving or duplicate records are processed, the target table remains free of duplicates.

**Why the other options are incorrect:**

* **Schema enforcement:** This feature ensures data type and column name consistency. It does not enforce data uniqueness or primary key constraints.
* **VACUUM:** This is a maintenance command that removes old data files no longer referenced by the Delta table's current version in order to free up storage. It does not perform data deduplication.
* **Full outer join + overwrite:** While this could theoretically remove duplicates, it is highly inefficient for large datasets: it requires a full table scan and a complete rewrite of the target table for every incoming batch, whereas `MERGE` is optimized for incremental updates.
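As a minimal sketch, an insert-only `MERGE` of this kind could look like the following. The table and column names (`target`, `updates`, `event_id`) are illustrative assumptions, not part of the question:

```sql
-- Idempotent append: insert only rows whose unique key is not
-- already present in the target Delta table.
-- Table and column names here are illustrative placeholders.
MERGE INTO target AS t
USING (
  -- Deduplicate within the incoming batch first
  SELECT DISTINCT * FROM updates
) AS s
ON t.event_id = s.event_id
WHEN NOT MATCHED THEN
  INSERT *
```

Because no `WHEN MATCHED` clause is present, rows whose keys already exist in `target` are simply skipped, which is what makes re-running the same batch safe.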
Author: LeetQuiz Editorial Team
### Question

A data engineer is architecting a pipeline that must handle late-arriving records containing potential duplicates. Beyond deduplicating within each incoming batch, which strategy should be used to ensure that records already stored in a target Delta table are not duplicated during ingestion?

**A.** Perform a full outer join between the incoming batch and the target table on a unique key, followed by a full table overwrite.

**B.** Enable Delta Lake schema enforcement to automatically identify and block records with duplicate keys.

**C.** Utilize a `MERGE INTO` operation with a `WHEN NOT MATCHED THEN INSERT` clause based on a unique identifier.

**D.** Execute the `VACUUM` command on the Delta table after every batch to remove redundant data entries from the transaction log.