Databricks Certified Machine Learning - Associate


A data scientist is working on one-hot encoding categorical attributes in a PySpark DataFrame named 'features_df' using Spark ML. The string column names are stored in the variable 'input_columns'. The provided code snippet is causing an error. What change is necessary to correctly perform one-hot encoding?


Explanation:

In Spark ML, categorical string attributes must be converted to numeric indices with StringIndexer before one-hot encoding can be applied, because OneHotEncoder accepts only numeric category indices and cannot read string columns directly. The correct approach therefore has two steps: 1) use StringIndexer to map each string column to a column of indices, and 2) apply OneHotEncoder to those index columns to produce one-hot encoded vectors. Passing the string columns straight to OneHotEncoder is what causes the error.

Option A is incorrect because OneHotEncoder has no 'method' parameter. Option B is incorrect because OneHotEncoder is an estimator: the 'fit' call is needed so it can learn the set of categories before transforming. Option D is incorrect because OneHotEncoder needs distinct output column names in which to store the encoded vectors.
