Databricks Certified Associate Developer for Apache Spark



Which of the following operations does not return a DataFrame with distinct (non-duplicate) rows?





Explanation:

The question asks which operation does not return a DataFrame free of duplicate rows. DataFrame.distinct(), DataFrame.dropDuplicates(), DataFrame.drop_duplicates(), and DataFrame.drop_duplicates(subset=None) are all valid. drop_duplicates is an alias for dropDuplicates, and when subset is omitted or set to None, every column is compared, so each of these calls returns only distinct rows. DataFrame.drop_duplicates(subset='all') is the odd one out. The subset parameter expects a list of column names, and 'all' is neither a list nor a keyword meaning "all columns". In PySpark 3.x, passing a bare string as subset raises a TypeError, so the call fails instead of returning a deduplicated DataFrame.