
Answer-first summary for fast verification
Answer: Use the `Imputer` class from the `pyspark.ml.feature` module to fill in missing values with the mean, median, or mode of the column.
The correct approach to handling missing data with Spark ML is to use the `Imputer` class from the `pyspark.ml.feature` module, which fills missing values with the mean, median, or (since Spark 3.1) mode of each column, preserving the rest of the row's information. Option B is incorrect because Spark's `fillna` only replaces nulls with a specified constant value; unlike pandas, it does not support forward-fill or backward-fill strategies, nor does it compute column statistics. Option C is incorrect because removing rows with missing values can discard a significant share of the data, which may hurt model performance. Option D is incorrect because `StringIndexer` converts categorical features to numerical indices; it does not handle missing values.
Author: LeetQuiz Editorial Team
In the context of handling missing data using Spark ML, explain the process of imputing or removing missing values in a dataset. Provide a code snippet demonstrating the use of Spark ML's Imputer or DataFrameNaFunctions for handling missing data and explain the key considerations to keep in mind during this process.
A
Use the Imputer class from the pyspark.ml.feature module to fill in missing values with the mean, median, or mode of the column.
B
Use the fillna method of the Spark DataFrame API to fill in missing values with a specified value or using various strategies like forward fill or backward fill.
C
Use the dropna method of the Spark DataFrame API to remove rows with missing values from the dataset.
D
Use the StringIndexer class from the pyspark.ml.feature module to handle missing values by converting categorical features with missing values to a separate category.