Databricks Certified Data Engineer - Professional


A data engineer is examining the `distinct()` and `dropDuplicates()` methods in Spark for de-duplicating a DataFrame. Which statement accurately describes the use of these methods for de-duplication?


The distinct() method can be used to remove duplicates based on specific columns by passing column names as arguments.

8.9%

In Databricks, the distinct() method is deprecated, leaving dropDuplicates() as the only supported method for de-duplication.

10.5%


The methods dropDuplicates() and drop_duplicates() are interchangeable, as per the official Spark documentation.

Both distinct() and dropDuplicates() methods allow for the removal of duplicates based on specific columns.

21.6%

The dropDuplicates() method is restricted to RDDs, while the distinct() method is exclusively for DataFrames.

14.2%