
Answer-first summary for fast verification
Answer: [assessPerformance(row) for row in storesDF.collect()]
The correct answer is D. collect() returns every row of the DataFrame to the driver as a list of Row objects, and the list comprehension then calls assessPerformance() on each one. It is the only option that passes every row to the function. Because collect() loads the whole DataFrame into driver memory, this pattern only suits data small enough to fit there. Option A is incorrect because take(3) returns only the first three rows. Option B is incorrect because it calls assessPerformance() without passing the row, and it iterates over the DataFrame directly. Option C is incorrect because collect() returns a Python list, which has no apply() method, and the lambda never calls assessPerformance() on a row. Option E is incorrect because iterating over a DataFrame directly does not yield its rows; they must first be brought to the driver, for example with collect().
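The pattern in option D can be sketched as follows. The question does not show the body of assessPerformance() or the schema of storesDF, so the scoring rule, the column names (storeId, sales, sqft), the sample values, and the helper names assess_all_rows and run_demo are all assumptions made for illustration.

```python
def assessPerformance(row):
    # Hypothetical rule (not from the question): label a store by its
    # sales per square foot. A PySpark Row supports row["column"] lookup.
    return "high" if row["sales"] / row["sqft"] >= 10.0 else "low"


def assess_all_rows(df):
    # collect() returns every row to the driver as a list of Row objects,
    # so the comprehension visits each row exactly once.
    return [assessPerformance(row) for row in df.collect()]


def run_demo():
    # Imported here so the helpers above do not require Spark to be loaded.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("assess-performance-demo")
        .getOrCreate()
    )
    try:
        # Assumed sample data standing in for storesDF.
        storesDF = spark.createDataFrame(
            [(1, 25000.0, 2000.0), (2, 9000.0, 1500.0)],
            ["storeId", "sales", "sqft"],
        )
        return assess_all_rows(storesDF)
    finally:
        spark.stop()
```

Calling run_demo() in an environment where PySpark is installed builds the small sample DataFrame and returns one label per store. Replacing collect() with take(3) in assess_all_rows would reproduce the flaw in option A: only the first three rows would be assessed.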
Author: LeetQuiz Editorial Team
Which of the following code blocks applies the assessPerformance() function to each row of the DataFrame storesDF?
A
[assessPerformance(row) for row in storesDF.take(3)]
B
[assessPerformance() for row in storesDF]
C
storesDF.collect().apply(lambda: assessPerformance)
D
[assessPerformance(row) for row in storesDF.collect()]
E
[assessPerformance(row) for row in storesDF]