
Answer-first summary for fast verification
Answer: pyspark.sql.DataFrame.dropDuplicates
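The `pyspark.sql.DataFrame.dropDuplicates` method returns a new DataFrame with duplicate rows removed. By default it compares all columns. If you pass a list of column names (the `subset` parameter), rows count as duplicates when they match on those columns alone. The other options do not fit: `distinct` also removes duplicate rows but accepts no column list, `drop` removes columns rather than rows, and `dropna` / `na.drop` remove rows that contain null values. For more details, refer to the [official documentation](https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.dropDuplicates.html).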
Author: LeetQuiz Editorial Team
In the context of PySpark, which function is designed to generate a new DataFrame by eliminating duplicate rows, with the option to consider only specific columns for identifying duplicates?
A. pyspark.sql.DataFrame.drop
B. pyspark.sql.DataFrame.distinct
C. pyspark.sql.DataFrame.dropDuplicates
D. pyspark.sql.DataFrame.na.drop
E. pyspark.sql.DataFrame.dropna