
Answer: The 'fit' method should be used before 'transform' to create an 'ImputerModel'.
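In Spark ML, 'Imputer' is an Estimator. Calling 'fit' on the DataFrame computes the (approximate) median of each input column after filtering out missing values, and returns an 'ImputerModel', which is a Transformer. Only that model's 'transform' method fills in the missing values. 'Imputer' itself has no 'transform' method, so code that skips 'fit' cannot work. The other options are incorrect: 'inputCols' and 'outputCols' only need to have the same length, not identical names (A); 'median' is a supported strategy alongside 'mean' (C and E); and the imputer does not have to be applied to the training and test datasets at the same time, because one fitted model can transform each dataset separately (D).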
Author: LeetQuiz Editorial Team
A data scientist is attempting to use Spark ML for imputing missing values in a PySpark DataFrame named 'features_df'. The goal is to replace missing values in all numeric columns with their respective median values. However, the provided code snippet fails to achieve this. What is the underlying issue with the code?
A. The 'inputCols' and 'outputCols' parameters must be identical.
B. The 'fit' method should be used before 'transform' to create an 'ImputerModel'.
C. Median value imputation is not supported in Spark ML.
D. The code does not apply the imputer to both training and test datasets simultaneously.
E. The imputer needs to be configured with a different strategy other than 'median'.