
A data engineer is designing an ETL workflow that must handle late-arriving and potentially duplicate records from a single data source. Batch-level deduplication is feasible, but the engineer also needs to deduplicate incoming data against records already present in the target Delta table. Which approach allows the engineer to deduplicate data against previously processed records during insertion?
A. Configure the table property delta.deduplicate to true.
B. Execute a VACUUM operation on the Delta table after each batch completes.
C. Utilize Delta Lake schema enforcement to prevent the insertion of duplicate records.
D. Perform a full outer join on a unique key and overwrite existing data.
E. Implement an insert-only MERGE operation with a matching condition on a unique key.
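
The correct answer is E. An insert-only MERGE matches incoming rows against the target table on a unique key and inserts only the rows that do not already exist, so duplicates of previously processed records are silently skipped. A minimal sketch in Delta Lake SQL (the table and column names `events`, `new_events`, and `event_id` are illustrative, not from the question):

```sql
-- Deduplicate incoming rows against records already in the target table.
-- Rows whose event_id already exists in `events` are ignored; only
-- genuinely new rows are inserted.
MERGE INTO events AS target
USING new_events AS source
ON target.event_id = source.event_id
WHEN NOT MATCHED THEN
  INSERT *
```

Because the statement has no WHEN MATCHED clause, existing rows are never updated or deleted, which is what makes this an "insert-only" MERGE. The other options fail: `delta.deduplicate` is not a real table property, VACUUM only removes unreferenced data files, schema enforcement validates column types rather than row uniqueness, and a full outer join with overwrite rewrites the entire table rather than deduplicating on insert.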