Databricks Certified Machine Learning - Associate

Databricks Certified Machine Learning - Associate

Get started today

Ultimate access to all questions.


What is the correct line of code to complete the snippet for imputing missing values using the median strategy in PySpark?





Explanation:

An algorithm which can be fit on a data frame to produce a Transformer. For example, a learning algorithm is an Estimator which trains on a data frame and produces a model. It has a .fit() method because it learns (or 'fits') parameters from your data frame. The correct code snippet is:

from pyspark.ml.feature import Imputer
imputer = Imputer(strategy='median', inputCols=impute_cols, outputCols=impute_cols)
imputer_model = imputer.fit(doubles_df)
imputed_df = imputer_model.transform(doubles_df)