
Answer-first summary for fast verification
Answer: Use a Pipeline object with each stage as an individual step.
Using a Pipeline object guarantees that the stages run in the declared order, so the output columns of one stage feed correctly into the input columns of the next. In Spark ML, Transformers (e.g., Tokenizer, StopWordsRemover) implement transform(), while Estimators (e.g., IDF, LogisticRegression) implement fit() and produce a fitted Transformer; calling Pipeline.fit() on training data handles both kinds automatically and returns a PipelineModel. Combining all stages into a single Transformer or applying them in an arbitrary order leads to incorrect data processing and a poorly trained model. Common pitfalls include mismatched input/output column names between stages and fitting Estimators (such as IDF) on test data, which leaks information; keep column names consistent across stages and fit the pipeline only on the training set.
Author: LeetQuiz Editorial Team
You are developing a Spark ML pipeline for a text classification task. The pipeline includes several stages such as tokenization, removing stop words, TF-IDF transformation, and a logistic regression model. Describe how you would implement this pipeline using Spark ML, including the use of Estimators and Transformers. Additionally, discuss any potential pitfalls in developing such a pipeline and how you would mitigate them.
A
Use only Transformers and skip Estimators.
B
Combine all stages into a single Transformer.
C
Use a Pipeline object with each stage as an individual step.
D
Ignore the order of stages and apply them randomly.