You are given a Spark DataFrame 'df' with a numerical column 'age'. Write a code snippet that computes the mean, median, and standard deviation of the 'age' column using dbutils data summaries, and explain the steps involved.
A
from pyspark.sql.functions import mean, median, stddev
result = df.select(mean('age'), median('age'), stddev('age'))
result.show()
B
result = dbutils.data.Summary(df, 'age')
print(result['mean'], result['median'], result['stddev'])
C
result = df.describe()
print(result['age']['mean'], result['age']['50%'], result['age']['stddev'])
D
result = df.agg(mean('age'), median('age'), stddev('age'))
result.show()
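For reference, here is a minimal sketch of how the three statistics could be computed with standard PySpark aggregate functions. It is one possible approach under stated assumptions, not the quiz's official answer. The helper names `summarize_age` and `summarize_local` are hypothetical. The sketch uses `percentile_approx` for the median, which assumes Spark 3.1 or later, and `stddev`, which returns the sample standard deviation.

```python
import statistics


def summarize_age(df, column="age"):
    """Return (mean, median, stddev) for a numeric column of a Spark DataFrame.

    Hypothetical helper, assuming Spark 3.1+ for percentile_approx.
    """
    # Imported lazily so this module still loads where Spark is not installed.
    from pyspark.sql import functions as F

    row = df.agg(
        F.mean(column).alias("mean"),
        # Approximate median: returns an actual value from the column.
        F.percentile_approx(column, 0.5).alias("median"),
        # stddev is the sample standard deviation (same as stddev_samp).
        F.stddev(column).alias("stddev"),
    ).first()
    return row["mean"], row["median"], row["stddev"]


def summarize_local(values):
    """Plain-Python reference for the same statistics, useful for spot checks."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),  # sample standard deviation, like Spark's stddev
    )
```

In a notebook, `summarize_age(df)` would run a single aggregation job and return three Python numbers. Because `percentile_approx` picks an existing value rather than interpolating, its result can differ from `statistics.median` when the column has an even number of rows.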