
Explanation:
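The correct answer is C. In the Scala API, Dataset.groupBy has two varargs overloads: groupBy(col1: String, cols: String*), which takes column names as separate String arguments, and groupBy(cols: Column*), which takes Column objects. Option C passes "division" and "storeCategory" as individual strings, so it matches the first overload. The following count() then returns one row per unique (division, storeCategory) pair, together with the number of rows in that group. Option A wraps the Column objects in a single Seq, which matches neither overload and does not compile unless the Seq is expanded with : _*. Option B refers to division and storeCategory as bare identifiers, which are undefined variables rather than column references. Option D calls groupBy on the result of groupBy, but that result is a RelationalGroupedDataset, which has no groupBy method. Option E passes a single Seq of strings, which also matches neither overload.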
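The sketch below is a minimal illustration of the valid groupBy forms described above, not an official solution. It assumes a local SparkSession and a small hypothetical storesDF; only the column names come from the question, and the object and method names are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; names are illustrative only.
object GroupByCountSketch {

  // Form 1 (option C): column names as separate String arguments.
  def countsByNames(df: DataFrame): DataFrame =
    df.groupBy("division", "storeCategory").count()

  // Form 2: Column objects as separate arguments (Column* overload).
  def countsByColumns(df: DataFrame): DataFrame =
    df.groupBy(col("division"), col("storeCategory")).count()

  // Form 3: a Seq[Column] must be expanded with `: _*` to match the varargs overload.
  def countsBySeq(df: DataFrame): DataFrame = {
    val groupCols = Seq(col("division"), col("storeCategory"))
    df.groupBy(groupCols: _*).count()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupBy-count-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val storesDF: DataFrame = Seq(
      ("East", "Grocery"),
      ("East", "Grocery"),
      ("East", "Electronics"),
      ("West", "Grocery")
    ).toDF("division", "storeCategory")

    // All three forms produce the same grouping and a "count" column.
    countsByNames(storesDF).orderBy("division", "storeCategory").show()
    countsByColumns(storesDF).orderBy("division", "storeCategory").show()
    countsBySeq(storesDF).orderBy("division", "storeCategory").show()

    spark.stop()
  }
}
```

The String form and the Column form are interchangeable here; the Column form is mainly useful when the grouping keys are expressions rather than plain column names.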
Which of the following code blocks correctly counts the number of rows in DataFrame storesDF for each unique combination of values in the columns division and storeCategory?
A
storesDF.groupBy(Seq(col("division"), col("storeCategory"))).count()
B
storesDF.groupBy(division, storeCategory).count()
C
storesDF.groupBy("division", "storeCategory").count()
D
storesDF.groupBy("division").groupBy("StoreCategory").count()
E
storesDF.groupBy(Seq("division", "storeCategory")).count()