
You are given a Spark DataFrame 'df' with a numerical column 'salary'. Write a code snippet that removes outliers from the 'salary' column that are less than the 10th percentile or greater than the 90th percentile, and explain the steps involved.
A
lower_bound = df.salary.percentile(0.1)
upper_bound = df.salary.percentile(0.9)
df = df.filter((df.salary >= lower_bound) & (df.salary <= upper_bound))
print(A)
B
lower_bound = df.salary.min()
upper_bound = df.salary.max()
df = df.filter((df.salary > lower_bound) & (df.salary < upper_bound))
print(B)
C
lower_bound = df.salary.quantile(0.1)
upper_bound = df.salary.quantile(0.9)
df = df.filter((df.salary > lower_bound) & (df.salary < upper_bound))
print(C)
D
lower_bound = df.salary.approxQuantile(0.1, 0.01)
upper_bound = df.salary.approxQuantile(0.9, 0.01)
df = df.filter((df.salary >= lower_bound) & (df.salary <= upper_bound))
print(D)