
Answer-first summary for fast verification
Answer: C. MERGE, because it applies the latest inserts, updates, and deletes from the source table to the target table while keeping existing records and avoiding duplicates, which makes it efficient for incremental processing.
Option C is the correct answer because MERGE is designed for incremental processing: it matches source rows to target rows on a key condition and, in a single statement, inserts new rows, updates changed rows, and deletes removed rows, so the target stays current without duplicates. Because it applies only the changes instead of reloading the whole table, it remains cost-effective as data volume grows. Option A is incorrect because CREATE OR REPLACE TABLE rebuilds the entire table on every run, which is neither efficient nor scalable. Option B is incorrect because INSERT OVERWRITE rewrites the target's data rather than applying incremental updates or deletes. Option D is incorrect because COPY INTO only loads new data and does not natively update or delete existing rows, so it cannot keep the target in line with every change in the source.
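To make the merge behavior concrete, here is a minimal sketch in plain Python. It is not Databricks code and not how Delta Lake implements MERGE; it only imitates the matched/not-matched logic on a dictionary keyed by id. The `merge` function, the `op` change flag, and the sample rows are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, TypedDict


class Change(TypedDict):
    id: int
    value: str
    op: str  # hypothetical change flag: "upsert" or "delete"


def merge(target: Dict[int, str], changes: List[Change]) -> Dict[int, str]:
    """Apply a batch of keyed changes and return the new table state."""
    result = dict(target)  # copy, so the caller's table is left untouched
    for change in changes:
        key = change["id"]
        if change["op"] == "delete":
            # Comparable to: WHEN MATCHED AND s.op = 'delete' THEN DELETE
            result.pop(key, None)
        else:
            # Comparable to: WHEN MATCHED THEN UPDATE
            #            and WHEN NOT MATCHED THEN INSERT
            result[key] = change["value"]
    return result


# Hypothetical sample data: one update, one delete, one insert.
target = {1: "alice", 2: "bob", 3: "carol"}
changes: List[Change] = [
    {"id": 2, "value": "bobby", "op": "upsert"},
    {"id": 3, "value": "carol", "op": "delete"},
    {"id": 4, "value": "dave", "op": "upsert"},
]

merged = merge(target, changes)
rerun = merge(merged, changes)  # replaying the same batch changes nothing
print(merged)
print(rerun == merged)
```

Because rows are matched on a key, replaying the same batch leaves the result unchanged, which is the "no duplication" property the question asks for. An append-only load would instead add the replayed rows a second time and would never remove row 3.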
Author: LeetQuiz Editorial Team
You are designing a data pipeline in Azure Databricks to incrementally process data from a source table to a target table. The pipeline must ensure that the target table always reflects the latest changes from the source table, including updates, inserts, and deletes, without duplicating data. Additionally, the solution must be cost-effective and scalable to handle large volumes of data. Given these requirements, which command should you use and why? (Choose one option.)
A
CREATE OR REPLACE TABLE, because it allows you to create a new table or replace an existing table with the latest data, ensuring no duplication but at the cost of recreating the entire table each time.
B
INSERT OVERWRITE, because it allows you to overwrite the target table with the latest data from the source table in a single operation, which is simple but does not efficiently handle incremental updates.
C
MERGE, because it allows you to update the target table with the latest changes from the source table, including inserts, updates, and deletes, while maintaining existing records and avoiding duplication, making it efficient for incremental processing.
D
COPY INTO, because it efficiently loads data into the target table without duplication and can be used for incremental updates, but it does not natively support deletes or updates in the target table.