
Explanation:
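An Estimator is an algorithm that can be fit on a DataFrame to produce a Transformer. For example, a learning algorithm is an Estimator that trains on a DataFrame and produces a model. Imputer is an Estimator: it has a .fit() method because it learns (or 'fits') parameters from your DataFrame, here the median of each input column. The fitted model is a Transformer, and its .transform() method fills in the missing values. The snippet is therefore missing the line imputer_model = imputer.fit(doubles_df), which is option B. The complete snippet is: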
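from pyspark.ml.feature import Imputer
# Configure the Estimator: replace missing values with each column's median
imputer = Imputer(strategy='median', inputCols=impute_cols, outputCols=impute_cols)
# fit() learns the medians from doubles_df and returns a fitted model (a Transformer)
imputer_model = imputer.fit(doubles_df)
# transform() returns a new DataFrame with the missing values filled in
imputed_df = imputer_model.transform(doubles_df)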
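To make the Estimator/Transformer split concrete, here is a minimal pure-Python sketch of the same fit-then-transform pattern. It is a toy illustration, not Spark's implementation: the classes MedianImputer and MedianImputerModel, the list-of-dicts "data frame", and the sample values are all hypothetical. It also uses the exact statistical median, which may differ from the value Spark computes.

```python
from statistics import median


class MedianImputer:
    """Toy Estimator (hypothetical): fit() learns one median per column."""

    def __init__(self, input_cols):
        self.input_cols = input_cols

    def fit(self, rows):
        medians = {}
        for col in self.input_cols:
            # Learn the parameter from the non-missing values only
            present = [row[col] for row in rows if row[col] is not None]
            medians[col] = median(present)
        # Fitting returns a separate object that holds the learned state
        return MedianImputerModel(medians)


class MedianImputerModel:
    """Toy Transformer (hypothetical): applies the learned medians."""

    def __init__(self, medians):
        self.medians = medians

    def transform(self, rows):
        # Build new rows; the input rows are left unchanged
        return [
            {
                col: self.medians[col] if value is None and col in self.medians else value
                for col, value in row.items()
            }
            for row in rows
        ]


# Made-up sample data with one missing value per column
rows = [
    {"x": 1.0, "y": 10.0},
    {"x": None, "y": 20.0},
    {"x": 3.0, "y": None},
]

imputer = MedianImputer(input_cols=["x", "y"])
imputer_model = imputer.fit(rows)            # step 1: learn the medians
imputed_rows = imputer_model.transform(rows)  # step 2: fill the gaps
```

Without the fit() step there is no model object and no learned medians to apply, which is why the question's snippet cannot call transform() until the .fit() line is added.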
What is the correct line of code to complete the snippet for imputing missing values using the median strategy in PySpark?
A
No need to add anything
B
imputer_model = imputer.fit(doubles_df)
C
imputer_model = doubles_df.fit()
D
Change the imputer constructor to most_frequent
E
Imputer(strategy="most_frequent", inputCols=impute_cols, outputCols=impute_cols)