
Answer-first summary for fast verification
Answer: B

result = df.selectExpr('VARIANCE(score) as variance', 'SKEWNESS(score) as skewness')
result.show()
The approach the question asks for is to compute variance and skewness with Spark SQL expressions, using the 'selectExpr' method and the built-in SQL aggregate functions VARIANCE and SKEWNESS. Option B does this correctly. Option A also computes the right statistics, but through the DataFrame API functions 'variance' and 'skewness' from pyspark.sql.functions rather than through SQL expressions, so it does not match what the question asks for. Option C is incorrect because the 'summary' method supports only count, mean, stddev, min, max, and percentiles; passing 'variance' or 'skewness' raises an error. Option D is incorrect because the 'describe' method likewise reports only count, mean, stddev, min, and max, and its result is a DataFrame that cannot be indexed like a dictionary.
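For intuition about what those two aggregates return, the statistics can be reproduced in plain Python. This is a minimal sketch, assuming Spark's usual definitions (VARIANCE is the sample variance with an n-1 denominator, SKEWNESS is the population skewness m3 / m2^1.5); the sample scores are made up for illustration.

```python
# Recompute VARIANCE and SKEWNESS by hand (assumed definitions:
# sample variance with n-1 denominator, population skewness m3 / m2**1.5).
scores = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical 'score' column values

n = len(scores)
mean = sum(scores) / n

# Central moments: m_k = sum((x - mean)**k) / n
m2 = sum((x - mean) ** 2 for x in scores) / n
m3 = sum((x - mean) ** 3 for x in scores) / n

variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # unbiased estimator
skewness = m3 / m2 ** 1.5

print(variance)              # 12.5
print(round(skewness, 4))    # 1.1384
```

The positive skewness reflects the long right tail introduced by the outlier value 10.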
Author: LeetQuiz Editorial Team
You are given a Spark DataFrame 'df' with a numerical column 'score'. Write a code snippet that computes the variance and skewness of the 'score' column using Spark SQL functions, and explain the steps involved.
A
from pyspark.sql.functions import variance, skewness

result = df.select(variance('score'), skewness('score'))
result.show()
B
result = df.selectExpr('VARIANCE(score) as variance', 'SKEWNESS(score) as skewness')
result.show()
C
result = df.select('score').summary('variance', 'skewness')
result.show()
D
result = df.describe()
print(result['score']['variance'], result['score']['skewness'])