Databricks Certified Data Engineer - Associate

You are designing a data pipeline in Azure Databricks to incrementally process data from a source table to a target table. The pipeline must ensure that the target table always reflects the latest changes from the source table, including updates, inserts, and deletes, without duplicating data. Additionally, the solution must be cost-effective and scalable to handle large volumes of data. Given these requirements, which command should you use and why? (Choose one option.)

Simulated

CREATE OR REPLACE TABLE, because it allows you to create a new table or replace an existing table with the latest data, ensuring no duplication but at the cost of recreating the entire table each time.

5.0%

INSERT OVERWRITE, because it allows you to overwrite the target table with the latest data from the source table in a single operation, which is simple but does not efficiently handle incremental updates.

MERGE, because it allows you to update the target table with the latest changes from the source table, including inserts, updates, and deletes, while maintaining existing records and avoiding duplication, making it efficient for incremental processing.

76.7%

COPY INTO, because it efficiently loads data into the target table without duplication and can be used for incremental updates, but it does not natively support deletes or updates in the target table.

10.3%
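
To make the MERGE option concrete, here is a minimal in-memory sketch of the semantics it describes: changes are matched to target rows by key, then deleted, updated, or inserted. This is an illustration only, not Databricks code. The names `merge_changes`, `id`, `op`, and `value` are hypothetical, and the sketch assumes the change set carries at most one row per key. The comments point at the Delta Lake MERGE INTO clauses that would roughly correspond to each branch.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_changes(target: Dict[int, Row], changes: List[Row]) -> Dict[int, Row]:
    """Apply a change set to `target` in place, matching rows on `id`.

    Hypothetical helper: assumes each change has `id`, `op`, and `value`,
    and that no key appears more than once in `changes`.
    """
    for change in changes:
        key = change["id"]
        if change["op"] == "delete":
            # Roughly: WHEN MATCHED AND source.op = 'delete' THEN DELETE
            # (a delete for a key that is absent is simply ignored)
            target.pop(key, None)
        else:
            # Roughly: WHEN MATCHED THEN UPDATE SET ...
            #          WHEN NOT MATCHED THEN INSERT ...
            target[key] = {"id": key, "value": change["value"]}
    return target


# Target table before the merge: two existing rows.
target = {
    1: {"id": 1, "value": "a"},
    2: {"id": 2, "value": "b"},
}

# Incremental changes from the source: one update, one insert, one delete.
changes = [
    {"id": 2, "op": "upsert", "value": "b2"},
    {"id": 3, "op": "upsert", "value": "c"},
    {"id": 1, "op": "delete", "value": None},
]

merge_changes(target, changes)
print(target)
```

Because rows are matched by key, applying the same change set a second time leaves the target unchanged, which is the "no duplication" property the question asks for. Only the changed rows are touched, whereas CREATE OR REPLACE TABLE and INSERT OVERWRITE rewrite the whole table on every run.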