Databricks Certified Data Engineer - Associate

You are working with a large source table that contains duplicates, and you need to ensure that the data written to a target table is deduplicated to meet compliance and scalability requirements. Given the need for efficiency and the ability to handle large volumes of data, which command should you use, and why? Choose the best option from the following:

CREATE OR REPLACE TABLE, because it allows you to create a new table with the deduplicated data, but it requires additional steps to ensure data integrity and does not inherently deduplicate data.

5.9%

INSERT OVERWRITE, because it allows you to overwrite the target table with the deduplicated data, but it lacks the capability to deduplicate data during the insertion process.

MERGE, because it is specifically designed to handle deduplication by combining data from two tables and writing the result to a target table, efficiently removing duplicates in the process.

71.4%
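
To make the MERGE option concrete, here is a minimal sketch of the insert-only merge pattern it describes. It is a stand-in under stated assumptions, not Databricks code: it runs on Python's built-in sqlite3, and because SQLite has no MERGE statement, the behavior of `WHEN NOT MATCHED THEN INSERT` is emulated with `INSERT ... SELECT ... WHERE NOT EXISTS`. The table names `source` and `target`, the columns `id` and `name`, and the helper `insert_only_merge` are all hypothetical. The sketch also collapses duplicates inside the source first, because a merge only matches source rows against the target and would otherwise insert repeated new rows.

```python
import sqlite3


def insert_only_merge(conn: sqlite3.Connection) -> None:
    """Emulate an insert-only merge keyed on id (hypothetical schema).

    Source rows whose id already exists in target are skipped, and
    duplicate ids inside source are collapsed to one row via GROUP BY.
    """
    conn.execute(
        """
        INSERT INTO target (id, name)
        SELECT s.id, MIN(s.name)
        FROM source AS s
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
        GROUP BY s.id
        """
    )
    conn.commit()


# Illustrative data: id 1 is duplicated in source, id 3 is already in target.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (id INTEGER, name TEXT);
    CREATE TABLE target (id INTEGER, name TEXT);
    INSERT INTO source VALUES (1, 'a'), (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO target VALUES (3, 'c');
    """
)

insert_only_merge(conn)
rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(rows)
```

Running the helper a second time inserts nothing, since every source id is then matched in the target. That idempotence is the property that makes a merge-style write attractive for repeated loads.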