You are given a Spark DataFrame 'df' with a numerical column 'height'. Write a code snippet that removes outliers from the 'height' column based on the interquartile range (IQR) method, and explain the steps involved.
A
Q1 = df.height.quantile(0.25)
Q3 = df.height.quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df = df.filter((df.height >= lower_bound) & (df.height <= upper_bound))
B
Q1 = df.height.approxQuantile(0.25, 0.01)
Q3 = df.height.approxQuantile(0.75, 0.01)
df = df.filter((df.height > lower_bound) & (df.height < upper_bound))
C
Q1 = df.height.percentile(0.25)
Q3 = df.height.percentile(0.75)
D
IQR = Q3 + Q1
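For reference, here is a minimal sketch of the IQR method against the PySpark DataFrame API, offered as one possible approach rather than the official answer key. It assumes `df` is a `pyspark.sql.DataFrame`. In PySpark, `approxQuantile` is a DataFrame method that takes a column name, a list of probabilities, and a relative error, so it is not called on the column as the options above do. The helper names `iqr_bounds` and `remove_height_outliers` are hypothetical and chosen only for illustration.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the IQR rule."""
    iqr = q3 - q1  # interquartile range: Q3 minus Q1, never their sum
    return q1 - k * iqr, q3 + k * iqr


def remove_height_outliers(df, rel_error=0.01):
    """Keep rows whose 'height' lies within the IQR fences.

    Hypothetical helper; `df` is assumed to be a pyspark.sql.DataFrame
    with a numeric 'height' column that has at least one non-null value.
    """
    # approxQuantile(column name, probabilities, relative error) returns
    # one float per requested probability, here Q1 and Q3.
    q1, q3 = df.approxQuantile("height", [0.25, 0.75], rel_error)
    lower, upper = iqr_bounds(q1, q3)
    # Inclusive comparisons keep values that sit exactly on a fence.
    # Rows with a null height do not satisfy the condition and are dropped.
    return df.filter((df["height"] >= lower) & (df["height"] <= upper))
```

Usage would look like `df = remove_height_outliers(df)`. The steps are: estimate Q1 and Q3, compute IQR = Q3 - Q1, derive the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, then filter to rows inside them. Setting `rel_error` to 0 requests exact quantiles, at a higher computational cost on large data.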