
A data scientist is attempting to use Spark ML to impute missing values in their PySpark DataFrame 'features_df'. The goal is to replace missing values in all numeric columns with the median of each column. However, the provided code snippet does not achieve this. What is the primary reason the code fails to perform the intended imputation?
```python
my_imputer = Imputer(strategy='median', inputCols=input_columns, outputCols=output_columns)
imputed_df = my_imputer.transform(features_df)
```
A. Imputing using a median value is not supported in Spark ML.
B. The code does not handle imputation for both training and test datasets at the same time.
C. The 'inputCols' and 'outputCols' parameters must have identical column names.
D. The imputer must first be fitted to the data to create an 'ImputerModel' before transforming.
E. The 'transform' method should be replaced with the 'fit' method.