You are given a Spark DataFrame 'df' with a numerical column 'price'. Write a code snippet that computes the covariance between the 'price' column and another numerical column 'quantity', and explain the steps involved.
A
from pyspark.sql.functions import covar_samp
result = df.select(covar_samp('price', 'quantity'))
result.show()
B
result = df.stat.corr('price', 'quantity')
print(result)
C
from pyspark.sql.functions import covar_samp
result = df.withColumn('price_quantity', df.price * df.quantity)
result = result.select(covar_samp('price_quantity', 'price'))
result.show()
D
from pyspark.sql.functions import covar_samp
result = df.groupBy().agg(covar_samp('price', 'quantity'))
result.show()
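For reference when weighing the options, here is a minimal plain-Python sketch of the sample covariance, the quantity that Spark's `covar_samp` aggregate and `df.stat.cov` report. The helper name `sample_covariance` and the sample values are hypothetical, chosen only for illustration; the sketch runs without a Spark session.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same number of rows")
    n = len(xs)
    if n < 2:
        raise ValueError("sample covariance needs at least two rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations, then divide by n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical values standing in for the 'price' and 'quantity' columns.
prices = [10.0, 20.0, 30.0]
quantities = [3.0, 2.0, 1.0]

cov_value = sample_covariance(prices, quantities)
print(cov_value)  # -10.0: quantity falls as price rises
```

On a real DataFrame, `df.stat.cov('price', 'quantity')` returns this value as a Python float, while `df.select(covar_samp('price', 'quantity'))` returns it as a one-row DataFrame that must be shown or collected.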