
Explanation:
The correct answer is C. An insert-only merge (a MERGE statement with only a WHEN NOT MATCHED THEN INSERT clause) that matches on a unique key inserts a record only if no row with that key already exists in the Delta table. This deduplicates incoming records against previously processed data.
Option B (VACUUM) manages file storage and does not address deduplication.
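The insert-only merge described in the explanation can be sketched in plain Python to make its semantics concrete. This is a minimal model, not Delta Lake code: the target table is a dictionary keyed by the unique key, and the names `insert_only_merge`, `events`, `new_events_deduped`, and `event_id` are assumptions chosen for illustration. The SQL string shows what the equivalent statement might look like. The batch is deduplicated first because the merge condition only compares source rows against the target, not against each other.

```python
from typing import Dict, Iterable, List

# Hypothetical Delta Lake SQL expressing the same idea. The table and column
# names are assumptions for illustration and do not come from the question.
MERGE_SQL = """
MERGE INTO events AS t
USING new_events_deduped AS s
ON t.event_id = s.event_id
WHEN NOT MATCHED THEN INSERT *
"""

Record = Dict[str, object]


def insert_only_merge(
    target: Dict[object, Record],
    batch: Iterable[Record],
    key: str = "event_id",
) -> List[Record]:
    """Model an insert-only merge: add only records whose key is not in target."""
    # Step 1: intra-batch deduplication, keeping the first record seen per key.
    deduped: Dict[object, Record] = {}
    for record in batch:
        deduped.setdefault(record[key], record)

    # Step 2: insert only the keys with no match in the target
    # (the WHEN NOT MATCHED THEN INSERT branch). Matched rows are left untouched.
    inserted: List[Record] = []
    for k, record in deduped.items():
        if k not in target:
            target[k] = record
            inserted.append(record)
    return inserted


# Usage: key 1 was processed earlier; key 2 arrives twice in the same batch.
table: Dict[object, Record] = {1: {"event_id": 1, "value": "a"}}
batch = [
    {"event_id": 1, "value": "a-late"},  # already in the table, skipped
    {"event_id": 2, "value": "b"},       # new, inserted
    {"event_id": 2, "value": "b-dup"},   # duplicate within the batch, dropped
]
inserted = insert_only_merge(table, batch)
```

After the call, `inserted` holds only the first record for key 2, and the existing row for key 1 is unchanged. Rerunning the same batch inserts nothing, which is the property that makes the approach safe for late-arriving or replayed records.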
Which approach enables a data engineer to deduplicate incoming records against previously processed data when inserting into a Delta table, in addition to handling intra-batch deduplication for late-arriving records?
A
Rely on Delta Lake schema enforcement to prevent duplicate records.
B
VACUUM the Delta table after each batch completes.
C
Perform an insert-only merge with a matching condition on a unique key.
D
Perform a full outer join on a unique key and overwrite existing data.