Databricks Certified Data Engineer - Associate

You are designing a data pipeline in Azure Databricks to incrementally process data from a source table to a target table. The pipeline must insert new records into the target table and update records that have changed, while preserving all existing records that have not changed. The solution must also be cost-effective and scale to large volumes of data. Which of the following commands should you use to meet these requirements, and why? (Choose one option.)

CREATE OR REPLACE TABLE, because it allows you to create a new table or replace an existing table with new data, ensuring a fresh start for each pipeline run.

INSERT OVERWRITE, because it allows you to overwrite the target table with new data from the source, which is efficient for full refreshes but does not support incremental updates.

MERGE, because in a single operation it inserts new records and updates changed records in the target table while keeping unchanged records intact, making it ideal for incremental data processing.

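To make the difference between the options concrete, here is a minimal pure-Python model of the semantics. It is not Databricks or Delta Lake code: tables are modeled as dicts keyed by a hypothetical primary key `id`, and the function names `merge_upsert` and `insert_overwrite` are illustrative only. On Databricks the real operation would be a `MERGE INTO target USING source ON <match condition>` statement with `WHEN MATCHED THEN UPDATE SET *` and `WHEN NOT MATCHED THEN INSERT *` clauses against a Delta table.

```python
"""Toy model of MERGE (upsert) vs. INSERT OVERWRITE semantics.

Plain Python, not Databricks/Delta Lake code. Tables are dicts mapping a
hypothetical primary key to a row dict.
"""
from typing import Dict

Row = Dict[str, object]
Table = Dict[int, Row]  # primary key -> row


def merge_upsert(target: Table, source: Table) -> Dict[str, int]:
    """Apply MERGE semantics in place: update matched rows that changed,
    insert unmatched rows, and leave target rows absent from source untouched.

    Returns counts of rows updated and inserted.
    """
    stats = {"updated": 0, "inserted": 0}
    for key, row in source.items():
        if key in target:
            # WHEN MATCHED (and the row actually changed) THEN UPDATE.
            # The change check mirrors an optional extra match condition.
            if target[key] != row:
                target[key] = dict(row)
                stats["updated"] += 1
        else:
            # WHEN NOT MATCHED THEN INSERT
            target[key] = dict(row)
            stats["inserted"] += 1
    return stats


def insert_overwrite(target: Table, source: Table) -> None:
    """Apply INSERT OVERWRITE semantics in place: replace all target rows with source rows."""
    target.clear()
    target.update({k: dict(v) for k, v in source.items()})


if __name__ == "__main__":
    existing = {
        1: {"id": 1, "name": "alice", "city": "Oslo"},
        2: {"id": 2, "name": "bob", "city": "Rome"},
        3: {"id": 3, "name": "carol", "city": "Lima"},
    }
    # Incremental batch: one changed row (id 2) and one new row (id 4).
    batch = {
        2: {"id": 2, "name": "bob", "city": "Paris"},
        4: {"id": 4, "name": "dave", "city": "Kyiv"},
    }

    merged = {k: dict(v) for k, v in existing.items()}
    print(merge_upsert(merged, batch))  # {'updated': 1, 'inserted': 1}
    print(sorted(merged))               # [1, 2, 3, 4]

    overwritten = {k: dict(v) for k, v in existing.items()}
    insert_overwrite(overwritten, batch)
    print(sorted(overwritten))          # [2, 4]  (rows 1 and 3 are lost)
```

The model shows why MERGE fits the incremental requirement: rows 1 and 3, which are absent from the batch, survive the merge, whereas INSERT OVERWRITE keeps only the rows in the batch.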