
Answer-first summary for fast verification
Answer: storesDF.dropDuplicates()
`dropDuplicates()` is the DataFrame method that returns a new DataFrame with duplicate rows removed, so option E is correct. Called with no arguments, it compares all columns, which makes it equivalent to `distinct()`. The other options fail: `removeDuplicates()` (A) and `getDistinct()` (B) are not DataFrame methods, and `duplicates.drop()` (C) and `duplicates()` (D) rely on a `duplicates` member that the DataFrame API does not define.
Author: LeetQuiz Editorial Team
Which of the following code blocks returns a new DataFrame from DataFrame storesDF with duplicate rows removed?
A
storesDF.removeDuplicates()
B
storesDF.getDistinct()
C
storesDF.duplicates.drop()
D
storesDF.duplicates()
E
storesDF.dropDuplicates()