Databricks Certified Data Engineer - Associate

Which of the following commands will return records from an existing Delta table my_table where duplicates have been removed?

KKeng

Last updated: January 13, 2026 at 09:02

A. DROP DUPLICATES FROM my_table;

B. SELECT * FROM my_table WHERE duplicate = False;

C. SELECT DISTINCT * FROM my_table;

D. MERGE INTO my_table a USING new_records b ON a.id = b.id WHEN NOT MATCHED THEN INSERT *;

E. MERGE INTO my_table a USING new_records b;

Explanation:

Let's analyze each option:

A. DROP DUPLICATES FROM my_table;

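This is not a valid SQL command in Databricks/Spark SQL. There is no DROP DUPLICATES statement; deduplication under that name exists only as the DataFrame method dropDuplicates() in the PySpark and Scala APIs.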

B. SELECT * FROM my_table WHERE duplicate = False;

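This assumes my_table has a column named duplicate, which is not guaranteed: Delta tables have no such built-in column, so the query fails if the schema lacks it. Even if the column exists, the query only filters rows on that column's value; it never compares rows with each other, so it does not remove duplicates.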
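Both failure modes of option B can be seen by running it against a throwaway table. This sketch assumes SQLite (through Python's sqlite3 module) as a stand-in for Spark SQL, and the table contents are hypothetical; Spark would report an unresolved-column error rather than SQLite's message.

```python
import sqlite3

# Assumption: SQLite stands in for Spark SQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")  # no 'duplicate' column
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (1, "a")])

# Case 1: the column does not exist, so the query fails outright.
try:
    conn.execute("SELECT * FROM my_table WHERE duplicate = False").fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Case 2: the column exists, but the query only filters on its value.
# SQLite has no separate boolean type, so 0 plays the role of False here.
conn.execute("ALTER TABLE my_table ADD COLUMN duplicate BOOLEAN DEFAULT 0")
rows = conn.execute("SELECT id, name FROM my_table WHERE duplicate = 0").fetchall()
print(rows)  # [(1, 'a'), (1, 'a')]: both identical rows are still returned
```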

C. SELECT DISTINCT * FROM my_table; ✓

This is the correct answer. SELECT DISTINCT * compares rows across all columns and returns each distinct row once, so rows whose column values are all identical appear only once in the result. The query only reads my_table; it does not modify it.
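The behavior of option C can be checked on a small table. This is a minimal sketch that assumes SQLite (via Python's sqlite3 module) as a stand-in for Spark SQL, since DISTINCT over all columns is standard SQL; the table and its rows are hypothetical. It also shows that NULLs count as equal for DISTINCT and that the underlying table is left unchanged.

```python
import sqlite3

# Assumption: SQLite stands in for Spark SQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (1, "a"),
        (1, "a"),   # exact duplicate of the row above: collapsed by DISTINCT
        (1, "b"),   # same id, different name: kept, because not all columns match
        (2, None),
        (2, None),  # NULLs compare as equal for DISTINCT, so this row collapses too
    ],
)

rows = conn.execute("SELECT DISTINCT * FROM my_table ORDER BY id, name").fetchall()
print(rows)  # [(1, 'a'), (1, 'b'), (2, None)]

# The query only reads the table; all five stored rows are still there.
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(total)  # 5
```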

D. MERGE INTO my_table a USING new_records b ON a.id = b.id WHEN NOT MATCHED THEN INSERT *;

This is an insert-only MERGE (it has no WHEN MATCHED clause, so it is not a full upsert): it inserts rows from new_records whose id has no match in my_table. It modifies the table instead of returning records, leaves any duplicates already in my_table untouched, and only avoids inserting source rows whose id already exists in the target.
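SQLite has no MERGE statement, so this sketch assumes an equivalent INSERT ... SELECT ... WHERE NOT EXISTS as a stand-in for option D's insert-only MERGE. It runs on hypothetical data in which my_table already holds a duplicate row, to show that the duplicate survives the operation.

```python
import sqlite3

# Assumption: SQLite stands in for Delta Lake; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE new_records (id INTEGER, val TEXT)")
# my_table already contains a duplicate row for id 1.
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "x"), (1, "x"), (2, "y")])
conn.executemany("INSERT INTO new_records VALUES (?, ?)", [(2, "y"), (3, "z")])

# Stand-in for:
#   MERGE INTO my_table a USING new_records b ON a.id = b.id
#   WHEN NOT MATCHED THEN INSERT *
conn.execute(
    """
    INSERT INTO my_table
    SELECT b.* FROM new_records b
    WHERE NOT EXISTS (SELECT 1 FROM my_table a WHERE a.id = b.id)
    """
)

rows = conn.execute("SELECT id, val FROM my_table ORDER BY id").fetchall()
print(rows)  # [(1, 'x'), (1, 'x'), (2, 'y'), (3, 'z')]: id 3 added, duplicate kept
```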

E. MERGE INTO my_table a USING new_records b;

This is an incomplete MERGE statement missing the ON clause and WHEN conditions. It's syntactically invalid.

Key Points:

SELECT DISTINCT * is the standard SQL way to return unique rows from a table.
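There is no DROP DUPLICATES statement in Databricks SQL; the DataFrame API offers dropDuplicates() (usable, for example, in a Delta Live Tables Python pipeline). To remove duplicates from the Delta table itself, you would rewrite it, for example with INSERT OVERWRITE my_table SELECT DISTINCT * FROM my_table, or use a ROW_NUMBER() window function when duplicates are defined by a key rather than by whole rows. For simply returning records without duplicates, SELECT DISTINCT * is the correct choice.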
The question specifically asks about "returning records" (querying), not modifying the table, so options that modify the table (like MERGE) are incorrect.
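If the goal were to remove duplicates from my_table permanently rather than just return distinct records, one option is to rewrite the table with its distinct rows; on Delta Lake this can be written as INSERT OVERWRITE my_table SELECT DISTINCT * FROM my_table. SQLite has no INSERT OVERWRITE, so this sketch assumes a stand-in: copy the distinct rows into a temporary table, empty my_table, and reload it. Table and column names are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Delta Lake; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (1, "a"), (2, "b")])

# Stand-in for: INSERT OVERWRITE my_table SELECT DISTINCT * FROM my_table
conn.execute("CREATE TEMP TABLE dedup AS SELECT DISTINCT * FROM my_table")
conn.execute("DELETE FROM my_table")                     # empty the original table
conn.execute("INSERT INTO my_table SELECT * FROM dedup")  # reload only distinct rows
conn.execute("DROP TABLE dedup")
conn.commit()

rows = conn.execute("SELECT * FROM my_table ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
```

Unlike SELECT DISTINCT on its own, this changes the stored data, which is why it does not answer the question as asked.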

Powered by GPT-5.2
