
Answer-first summary for fast verification
Answer (Option A):
from pyspark.sql.functions import mode
result = df.select(mode('category'))
result.show()
The correct approach is to use the 'mode' function from the 'pyspark.sql.functions' module (available in Spark 3.4+), as Option A does: it selects the aggregate mode('category'), which returns the most frequently occurring value of the column. Option B does compute the most frequent value by grouping, counting, and sorting, but it does not use the 'mode' function as the question requires. Option C is invalid: 'groupBy' is a DataFrame method and cannot be called on a single Column such as df.category. Option D is invalid for a similar reason: 'agg' is defined on DataFrames (and grouped data), not on a Column.
Author: LeetQuiz Editorial Team
You are given a Spark DataFrame 'df' with a categorical column 'category'. Write a code snippet that computes the mode (most frequently occurring value) of the 'category' column using the 'mode' function, and explain the steps involved.
A
from pyspark.sql.functions import mode
result = df.select(mode('category'))
result.show()
B
result = df.groupBy('category').count().orderBy('count', ascending=False).first()[0]
print(result)
C
result = df.category.groupBy().count().orderBy(count(), ascending=False).first()
print(result)
D
result = df.category.agg({'category': 'mode'})
print(result)