
Answer-first summary for fast verification
Answer: Include tokenization, stop word removal, TF-IDF transformation, and a Naive Bayes model.
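Option B is the only choice that covers all three stages named in the question: preprocessing, feature extraction, and model training. Tokenization splits raw text into terms, stop word removal drops very common words that carry little sentiment, and the TF-IDF transformation turns the remaining terms into numeric features that down-weight terms appearing in most documents. Naive Bayes is a reasonable baseline classifier for this task because it supports multiple classes (positive, negative, neutral) and works with the non-negative feature values that TF-IDF produces. The other options each leave a gap: A relies on raw bag-of-words counts with no preprocessing, C skips preprocessing entirely, and D discards all but the most frequent words, any of which can lead to suboptimal model performance.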
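To make the stage order in option B concrete, here is a minimal plain-Python sketch of what each stage computes. It is not Spark code: in Spark ML the corresponding components would typically be `Tokenizer`, `StopWordsRemover`, `HashingTF` followed by `IDF`, and `NaiveBayes`, chained in a `Pipeline`. The stop word list, the training sentences, and all function names below are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop word list (an assumption, not Spark's default list).
STOP_WORDS = {"i", "this", "it", "is", "the", "a", "and", "on", "with"}

def tokenize(text):
    """Stage 1: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Stage 2: drop common words that carry little sentiment."""
    return [t for t in tokens if t not in STOP_WORDS]

def fit_idf(docs):
    """Stage 3a: learn a smoothed inverse document frequency per term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (d + 1)) for t, d in df.items()}

def tfidf(doc, idf):
    """Stage 3b: weight term counts by IDF; unseen terms are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def train_nb(vectors, labels, alpha=1.0):
    """Stage 4: multinomial Naive Bayes with additive smoothing."""
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weights = defaultdict(Counter)  # label -> term -> summed TF-IDF weight
    for v, y in zip(vectors, labels):
        for t, w in v.items():
            weights[y][t] += w
    model = {}
    for y in class_docs:
        total = sum(weights[y].values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[y] / len(labels))
        log_lik = {t: math.log((weights[y][t] + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model

def predict(model, vector):
    """Return the label with the highest log posterior score."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[t] for t, w in vector.items())
    return max(model, key=score)

# Hypothetical toy training set covering the three sentiment classes.
TRAIN = [
    ("I love this phone, it is great", "positive"),
    ("great battery and great screen", "positive"),
    ("I hate this phone, it is terrible", "negative"),
    ("terrible battery and awful screen", "negative"),
    ("the phone arrived on monday", "neutral"),
    ("the package arrived with a charger", "neutral"),
]

docs = [remove_stop_words(tokenize(text)) for text, _ in TRAIN]
idf = fit_idf(docs)
model = train_nb([tfidf(d, idf) for d in docs], [y for _, y in TRAIN])

def classify(text):
    """Run new text through the same stages, in the same order."""
    return predict(model, tfidf(remove_stop_words(tokenize(text)), idf))

print(classify("great screen"))
print(classify("awful terrible battery"))
```

The point of the sketch is the interaction between stages: each one consumes the previous stage's output, and the IDF weights and the classifier are both fitted on training data and then reused unchanged at prediction time, which is the same contract a Spark ML `Pipeline` enforces between `fit` and `transform`.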
Author: LeetQuiz Editorial Team
You are developing a Spark ML pipeline for a sentiment analysis task that involves classifying text into positive, negative, or neutral sentiments. The pipeline includes text preprocessing, feature extraction, and model training. Describe the specific stages you would include in this pipeline, the components you would use for each stage, and how they interact. Additionally, discuss any challenges specific to sentiment analysis tasks and how you would address them.
A. Use only bag-of-words for feature extraction and a simple logistic regression model.
B. Include tokenization, stop word removal, TF-IDF transformation, and a Naive Bayes model.
C. Ignore text preprocessing and directly apply a deep learning model.
D. Use only the most frequent words for feature extraction.