Which approach enables a data engineer to deduplicate incoming records against previously processed data when inserting into a Delta table, in addition to handling intra-batch deduplication for late-arriving records?
Explanation:
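The correct approach is C. An insert-only merge (a MERGE command with only a WHEN NOT MATCHED THEN INSERT clause) with a matching condition on a unique key inserts a new record only if no record with that key already exists in the Delta table. This deduplicates incoming records against previously processed data. Duplicates within the incoming batch itself are not removed by the merge and still have to be handled separately, before the merge runs.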
VACUUM manages file storage and does not address deduplication, which is the job of the MERGE operation.
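The insert-only merge described in the explanation can be sketched in plain Python, without Spark or Delta, to make its semantics concrete. This is a simplified model and not the Delta Lake implementation. The names `target`, `batch`, `event_id`, `dedupe_batch`, and `insert_only_merge` are hypothetical and chosen only for illustration.

```python
# Plain-Python model of an insert-only merge; no Spark or Delta required.
# It mirrors the semantics of a Delta SQL statement such as
#   MERGE INTO target t USING batch s ON t.event_id = s.event_id
#   WHEN NOT MATCHED THEN INSERT *
# Table and column names (target, batch, event_id) are hypothetical.


def dedupe_batch(batch, key):
    """Intra-batch deduplication: keep the first record seen for each key."""
    seen = {}
    for record in batch:
        seen.setdefault(record[key], record)
    return list(seen.values())


def insert_only_merge(target, batch, key):
    """Append batch records whose key is absent from target.

    Returns the number of records inserted. Existing records are never
    updated or deleted, which is what makes the merge insert-only.
    """
    existing = {record[key] for record in target}
    new_records = [r for r in dedupe_batch(batch, key) if r[key] not in existing]
    target.extend(new_records)
    return len(new_records)


# Previously processed data already in the table.
target = [{"event_id": 1, "value": "a"}, {"event_id": 2, "value": "b"}]

# Incoming batch with a late-arriving repeat and an intra-batch duplicate.
batch = [
    {"event_id": 2, "value": "b-late"},  # key already in target: skipped
    {"event_id": 3, "value": "c"},       # new key: inserted
    {"event_id": 3, "value": "c-dup"},   # duplicate within the batch: dropped
]

inserted = insert_only_merge(target, batch, "event_id")
```

The batch is deduplicated before the merge because a merge with only a not-matched clause compares each source row against the table, not against other source rows, so duplicate source rows with a new key would all be inserted.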