
Databricks Certified Associate Developer for Apache Spark
Which of the following operations does not return a DataFrame with distinct (non-duplicate) rows?
Explanation:
The question asks which operation does not return a DataFrame free of duplicate rows. DataFrame.dropDuplicates(), DataFrame.distinct(), DataFrame.drop_duplicates(), and DataFrame.drop_duplicates(subset = None) all return a DataFrame with duplicate rows removed, comparing rows across all columns. drop_duplicates() is an alias for dropDuplicates(), and passing subset = None is the same as passing no subset at all. DataFrame.drop_duplicates(subset = 'all') is the exception: subset expects a list of column names, and the string 'all' is not a keyword meaning "every column". The call typically raises an error instead of returning a deduplicated DataFrame (the exact exception depends on the PySpark version), so it is the operation that does not return a DataFrame with distinct rows.
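The behavior described above can be checked with a short PySpark sketch. The helper name try_dedup_calls, the run_demo wrapper, and the three-row sample DataFrame are hypothetical and exist only for illustration. The sketch assumes a local Spark installation, and it catches a broad Exception because the error raised by subset = 'all' is not the same in every PySpark version.

```python
def try_dedup_calls(df):
    """Run each deduplication call from the question and record the outcome.

    Returns a dict mapping the call's text to either the resulting row
    count or the name of the exception the call raised.
    """
    calls = {
        "dropDuplicates()": lambda d: d.dropDuplicates(),
        "distinct()": lambda d: d.distinct(),
        "drop_duplicates()": lambda d: d.drop_duplicates(),
        "drop_duplicates(subset=None)": lambda d: d.drop_duplicates(subset=None),
        "drop_duplicates(subset='all')": lambda d: d.drop_duplicates(subset="all"),
    }
    outcomes = {}
    for label, call in calls.items():
        try:
            outcomes[label] = call(df).count()
        except Exception as exc:  # exception type varies by PySpark version
            outcomes[label] = f"raised {type(exc).__name__}"
    return outcomes


def run_demo():
    # Imported lazily so the helper above can be defined without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("dedup-demo").getOrCreate()
    )
    # Hypothetical sample data: the first two rows are exact duplicates.
    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "label"])
    for label, outcome in try_dedup_calls(df).items():
        print(f"{label}: {outcome}")
    spark.stop()
```

Calling run_demo() in an environment with PySpark installed should report two rows for each of the first four calls, since the sample data contains one exact duplicate. The last call is expected to report an exception rather than a row count, assuming the DataFrame has no column that the string 'all' could be resolved against.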