
Databricks Certified Machine Learning - Associate
What is the correct line of code to complete the snippet for imputing missing values using the median strategy in PySpark?
Explanation:
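An Estimator is an algorithm that can be fit on a DataFrame to produce a Transformer. For example, a learning algorithm is an Estimator that trains on a DataFrame and produces a model. It has a .fit()
method because it learns (or "fits") parameters from the DataFrame. Imputer is an Estimator: calling .fit() computes the median of each input column and returns a model, and that model's .transform() replaces the missing values. The correct code snippet is: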
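from pyspark.ml.feature import Imputer
# impute_cols and doubles_df are assumed to be defined earlier in the snippet
imputer = Imputer(strategy='median', inputCols=impute_cols, outputCols=impute_cols)
# fit() learns the median of each input column and returns an ImputerModel (a Transformer)
imputer_model = imputer.fit(doubles_df)
# transform() fills the missing values; outputCols equals inputCols, so the columns are overwritten
imputed_df = imputer_model.transform(doubles_df)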
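
To make the Estimator/Transformer split concrete, here is a minimal plain-Python sketch of the same fit-then-transform pattern. It does not use Spark: MedianImputer, MedianImputerModel, and the sample rows are hypothetical names and data made up for illustration, and missing values are represented as None. It also uses the exact median from the standard library, whereas Spark's Imputer is understood to compute an approximate median, so results on real data may differ slightly.

```python
from statistics import median


class MedianImputerModel:
    """Transformer-like object (hypothetical): holds learned medians and applies them."""

    def __init__(self, medians):
        self.medians = medians  # column name -> median learned during fit()

    def transform(self, rows):
        # Return new rows in which None is replaced by the learned median.
        # The input rows are left unmodified.
        return [
            {
                col: (self.medians[col] if val is None and col in self.medians else val)
                for col, val in row.items()
            }
            for row in rows
        ]


class MedianImputer:
    """Estimator-like object (hypothetical): fit() learns parameters and returns a model."""

    def __init__(self, input_cols):
        self.input_cols = input_cols

    def fit(self, rows):
        medians = {}
        for col in self.input_cols:
            # Only observed (non-missing) values contribute to the median.
            observed = [row[col] for row in rows if row[col] is not None]
            medians[col] = median(observed)
        return MedianImputerModel(medians)


# Made-up sample data: each column has one missing value.
rows = [
    {"age": 20.0, "income": 10.0},
    {"age": None, "income": 30.0},
    {"age": 40.0, "income": None},
    {"age": 30.0, "income": 50.0},
]

model = MedianImputer(["age", "income"]).fit(rows)  # learning step
imputed = model.transform(rows)                     # applying step
```

The point of the split is that the medians are learned once in fit() and stored on the model, so the same model can later be applied to other data (for example a test set) without recomputing them.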