Databricks Certified Machine Learning - Associate

In the context of feature selection using Spark ML, explain the process of selecting the most relevant features for a machine learning model. Provide a code snippet demonstrating the use of Spark ML's feature selection techniques, such as `ChiSqSelector` or `RFE` (Recursive Feature Elimination), and explain the key considerations to keep in mind during this process.

Use the ChiSqSelector class from the pyspark.ml.feature module to select the top k features with the highest chi-squared statistics for categorical features.

66.7%

Use the RFE class from the pyspark.ml.feature module to perform recursive feature elimination based on the importance of features learned by a machine learning model.

Use the VectorSlicer class from the pyspark.ml.feature module to select a subset of features from a vector column.

7.4%

Use the MinMaxScaler class from the pyspark.ml.feature module to scale the features to a specific range, without performing feature selection.

3.7%