You are given a Spark DataFrame 'df' with a numerical column 'salary'. Write a code snippet that removes outliers from the 'salary' column that are less than the 10th percentile or greater than the 90th percentile, and explain the steps involved.
A
lower_bound = df.salary.percentile(0.1)
upper_bound = df.salary.percentile(0.9)
df = df.filter((df.salary >= lower_bound) & (df.salary <= upper_bound))
B
lower_bound = df.salary.min()
upper_bound = df.salary.max()
df = df.filter((df.salary > lower_bound) & (df.salary < upper_bound))
C
lower_bound = df.salary.quantile(0.1)
upper_bound = df.salary.quantile(0.9)
D
lower_bound = df.salary.approxQuantile(0.1, 0.01)
upper_bound = df.salary.approxQuantile(0.9, 0.01)
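For reference, here is a minimal sketch of how the task in the question could be written in PySpark. It is one possible approach, not the page's official answer. In PySpark, `approxQuantile` is a method of the DataFrame (it takes the column name, a list of probabilities, and a relative error), so both bounds can be computed in a single call. The function name `remove_salary_outliers`, its parameter names, and the `demo` helper are hypothetical names chosen for illustration.

```python
def remove_salary_outliers(df, lower=0.10, upper=0.90, rel_err=0.01):
    """Keep rows whose 'salary' lies between the lower and upper percentiles.

    Hypothetical helper for illustration; `df` is assumed to be a
    pyspark.sql.DataFrame with a numeric 'salary' column and at least one row.
    """
    # Step 1: compute both percentile bounds in one pass.
    # rel_err trades accuracy for speed; 0.0 requests exact quantiles.
    lower_bound, upper_bound = df.approxQuantile("salary", [lower, upper], rel_err)

    # Step 2: keep rows inside the bounds (between() is inclusive on both ends).
    # Rows with a null salary do not satisfy the condition and are dropped.
    return df.filter(df["salary"].between(lower_bound, upper_bound))


def demo():
    """Small local run; assumes pyspark is installed (imported lazily here)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("outliers").getOrCreate()
    df = spark.createDataFrame([(float(s),) for s in range(1, 101)], ["salary"])
    remove_salary_outliers(df, rel_err=0.0).show()
    spark.stop()
```

Calling `demo()` runs the filter on a small in-memory DataFrame. The steps are: compute the 10th and 90th percentiles of `salary`, then filter the DataFrame to the rows that fall within those bounds. On large data, a small nonzero relative error such as 0.01 is usually much cheaper than an exact computation.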