
You are given a Spark DataFrame df with two numerical columns, 'price' and 'quantity'. Which code snippet computes the covariance between 'price' and 'quantity'? Explain the steps involved.
A
from pyspark.sql.functions import covar_samp
result = df.select(covar_samp('price', 'quantity'))
result.show()
B
result = df.stat.corr('price', 'quantity')
print(result)
C
from pyspark.sql.functions import covar_samp
result = df.withColumn('price_quantity', df.price * df.quantity)
result = result.select(covar_samp('price_quantity', 'price'))
result.show()
D
from pyspark.sql.functions import covar_samp
result = df.groupBy().agg(covar_samp('price', 'quantity'))
result.show()
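The options differ mainly in which statistic each call computes. As a rough illustration, assuming small in-memory lists stand in for the DataFrame columns and that there are no missing values, the pure-Python sketch below spells out the steps: take each column's mean, multiply the paired deviations from those means, sum the products, and divide by n - 1 for the sample covariance (what Spark's covar_samp and df.stat.cov return) or by n for the population covariance (covar_pop). Correlation, which df.stat.corr returns, additionally divides by both standard deviations, so it is a different, scale-free number. The helper names and the sample data are hypothetical and are not part of any Spark API.

```python
import math

# Hypothetical helpers: plain-Python versions of the statistics Spark computes.
# They assume two equal-length lists of numbers with no missing values.


def sample_covariance(xs, ys):
    """Sample covariance (n - 1 denominator), as returned by covar_samp / df.stat.cov."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    # Step 1: compute the mean of each column
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 2: multiply paired deviations from the means and sum the products
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Step 3: divide by n - 1 (unbiased sample estimate)
    return cross / (n - 1)


def population_covariance(xs, ys):
    """Population covariance (n denominator), as returned by covar_pop."""
    n = len(xs)
    return sample_covariance(xs, ys) * (n - 1) / n


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )


# Hypothetical sample data standing in for the 'price' and 'quantity' columns
price = [10.0, 12.0, 15.0, 20.0]
quantity = [100.0, 90.0, 70.0, 50.0]

print(sample_covariance(price, quantity))
print(population_covariance(price, quantity))
print(correlation(price, quantity))
```

On a real DataFrame, the same scalar could be read with df.stat.cov('price', 'quantity') or df.select(covar_samp('price', 'quantity')).first()[0]; on identical data both should agree with sample_covariance up to floating-point rounding, while df.stat.corr gives the rescaled value that correlation computes.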