
Explanation:
The correct method for removing duplicate rows from a DataFrame in Apache Spark is dropDuplicates(), which returns a new DataFrame, so option E is correct. The other options all call methods that do not exist in the DataFrame API: removeDuplicates() (A), getDistinct() (B), duplicates.drop() (C), and duplicates() (D).
Which of the following code blocks returns a new DataFrame from DataFrame storesDF with duplicate rows removed?
A
storesDF.removeDuplicates()
B
storesDF.getDistinct()
C
storesDF.duplicates.drop()
D
storesDF.duplicates()
E
storesDF.dropDuplicates()