
You are given a Spark DataFrame 'df' with a numerical column 'temperature'. Write a code snippet that removes outliers from the 'temperature' column that are beyond 3 standard deviations from the mean, and explain the steps involved.
A
mean = df.temperature.mean()
std_dev = df.temperature.stddev()
threshold = mean + 3 * std_dev
df = df.filter(df.temperature < threshold)
B
mean = df.temperature.mean()
std_dev = df.temperature.stddev()
lower_bound = mean - 3 * std_dev
upper_bound = mean + 3 * std_dev
df = df.filter((df.temperature > lower_bound) & (df.temperature < upper_bound))
C
mean = df.temperature.agg('mean')
std_dev = df.temperature.agg('stddev')
df = df.filter((df.temperature > (mean - 3 * std_dev)) & (df.temperature < (mean + 3 * std_dev)))
D
mean = df.temperature.summary('mean')
std_dev = df.temperature.summary('stddev')
df = df.filter((df.temperature > (mean - 3 * std_dev)) & (df.temperature < (mean + 3 * std_dev)))