When using the COPY INTO statement for incremental data loading into Delta Lake, how can you ensure that only new files are processed and duplicates are avoided?
A. Delta Lake automatically ignores files that have been previously loaded, requiring no additional configuration.
B. Manually track processed files in a separate Delta table and filter them out in the COPY INTO command.
C. Utilize the IGNORE_EXISTING option to skip over files that have already been processed.
D. Implement a custom Spark function to compare the contents of incoming files with existing data before loading.
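
For reference, Databricks' COPY INTO command keeps track of the files it has already ingested into the target table and skips them on subsequent runs by default, which is what makes it idempotent for incremental loads. The sketch below illustrates this pattern through `spark.sql` in a Databricks environment; the table name `sales_bronze`, the source path, and the column schema are hypothetical placeholders, not part of the original question.

```python
# Minimal sketch of an idempotent incremental load with COPY INTO.
# Assumes a Databricks runtime; table name, path, and schema are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create the target Delta table if it does not already exist.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_bronze (
        order_id BIGINT,
        amount   DOUBLE,
        order_ts TIMESTAMP
    ) USING DELTA
""")

# COPY INTO records which source files it has already loaded into the target
# table, so re-running this statement (e.g. on a schedule) only ingests files
# that have newly arrived under the source path.
spark.sql("""
    COPY INTO sales_bronze
    FROM '/mnt/raw/sales/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")

# To deliberately reload files that were already ingested, the documented
# override is COPY_OPTIONS ('force' = 'true'); the default behavior skips them.
```

Because the already-loaded file list is maintained by the command itself, no separate tracking table or custom comparison logic is required for routine incremental ingestion.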