
You are developing a Spark ML pipeline for a natural language processing (NLP) task that involves text classification. The pipeline includes text preprocessing, feature extraction, and model training. Describe the specific stages you would include in this pipeline, the components you would use for each stage, and how they interact. Additionally, discuss any challenges specific to NLP tasks and how you would address them.
A. Use only bag-of-words for feature extraction and a simple logistic regression model.
B. Include tokenization, stop word removal, TF-IDF transformation, and a Naive Bayes model.
C. Ignore text preprocessing and directly apply a deep learning model.
D. Use only the most frequent words for feature extraction.
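
To make the data flow between stages concrete, here is a minimal plain-Python sketch of what the stages named in option B compute. It is not Spark code. In Spark ML the corresponding components would typically be Tokenizer, StopWordsRemover, HashingTF followed by IDF, and NaiveBayes, chained in a Pipeline. The stop word list, the training sentences, and every function name below are hypothetical and chosen only for illustration. The IDF formula log((n + 1) / (df + 1)) and the add-one smoothing are assumptions meant to follow Spark's documented defaults.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop word list (an assumption, not Spark's built-in list).
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to", "was"}


def tokenize(text):
    """Stage 1: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Stage 2: drop very common words that carry little class signal."""
    return [t for t in tokens if t not in STOP_WORDS]


def fit_idf(docs):
    """Stage 3a: learn inverse document frequencies from tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((n + 1) / (count + 1)) for term, count in df.items()}


def tfidf(doc, idf):
    """Stage 3b: weight raw term counts by IDF; unseen terms are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def fit_naive_bayes(vectors, labels, vocab, smoothing=1.0):
    """Stage 4: multinomial Naive Bayes over TF-IDF weights, additive smoothing."""
    term_sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            term_sums[label][term] += weight
    model = {}
    for label, n_docs in Counter(labels).items():
        total = sum(term_sums[label].values()) + smoothing * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_like = {
            t: math.log((term_sums[label][t] + smoothing) / total) for t in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, vec):
    """Return the label with the highest log posterior score."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=score)


# Hypothetical training data, for illustration only.
TRAIN = [
    ("I loved this movie, it was great", "pos"),
    ("great acting and a wonderful story", "pos"),
    ("terrible plot and awful acting", "neg"),
    ("I hated this boring movie", "neg"),
]


def preprocess(text):
    """Run the two preprocessing stages in order."""
    return remove_stop_words(tokenize(text))


# "Fit" the pipeline: each stage consumes the output of the previous one.
train_docs = [preprocess(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
train_vectors = [tfidf(doc, idf) for doc in train_docs]
model = fit_naive_bayes(train_vectors, train_labels, vocab=set(idf))


def classify(text):
    """Apply the fitted stages to new text and return the predicted label."""
    return predict(model, tfidf(preprocess(text), idf))
```

Calling classify("a great and wonderful story") runs the new text through the same fitted stages that were used in training. Words that never appeared in the training data are silently dropped by the TF-IDF step, which is one way the out-of-vocabulary problem common to NLP tasks shows up in such a pipeline.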