
Answer-first summary for fast verification
Answer: StringIndexer assigns numerical indices to string labels, facilitating the OHE process.
StringIndexer is a pivotal preprocessing step in Spark for One-Hot Encoding (OHE), though it does not perform OHE directly. It converts categorical variables with string labels into numerical indices, making the data suitable for machine learning algorithms that require numerical input. This conversion is a prerequisite for OHE, which then creates binary (dummy) variables for each category. The correct answer highlights StringIndexer's role in mapping string labels to numerical indices, a foundational step before OHE can be applied. Other options either misrepresent StringIndexer's functionality or confuse it with the actual OHE process.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
What role does StringIndexer play in the One Hot Encoding (OHE) process within Spark?
A
StringIndexer is utilized to generate dummy variables directly from categorical data.
B
StringIndexer assigns numerical indices to string labels, facilitating the OHE process.
C
StringIndexer replaces categorical variables with their numerical counterparts.
D
StringIndexer executes OHE on categorical variables without any preprocessing.
No comments yet.