
In the context of handling missing data using Spark ML, explain the process of imputing or removing missing values in a dataset. Provide a code snippet demonstrating the use of Spark ML's Imputer or DataFrameNaFunctions for handling missing data and explain the key considerations to keep in mind during this process.
A
Use the Imputer class from the pyspark.ml.feature module to fill in missing values with the mean, median, or mode of the column.
B
Use the fillna method of the Spark DataFrame API (an alias for na.fill) to fill in missing values with a specified constant, given either as a single value or as a per-column dictionary.
C
Use the dropna method of the Spark DataFrame API to remove rows with missing values from the dataset.
D
Use the StringIndexer class from the pyspark.ml.feature module to handle missing values by converting categorical features with missing values to a separate category.