You are developing a Spark ML pipeline for a sentiment analysis task that involves classifying text as positive, negative, or neutral. The pipeline includes text preprocessing, feature extraction, and model training. Describe the specific stages you would include in this pipeline, the components you would use for each stage, and how they interact. Additionally, discuss any challenges specific to sentiment analysis tasks and how you would address them.

Simulated

Use only bag-of-words for feature extraction and a simple logistic regression model.

5.9%

Include tokenization, stop word removal, TF-IDF transformation, and a Naive Bayes model.

88.2%

Ignore text preprocessing and directly apply a deep learning model.

2.9%

Use only the most frequent words for feature extraction.

2.9%
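The most-chosen option chains tokenization, stop word removal, TF-IDF weighting, and a Naive Bayes classifier. Below is a minimal, dependency-free sketch of what each of those stages computes, so it runs without a Spark cluster. It is not Spark code. The function names, the tiny stop-word list, and the six training sentences are hypothetical, chosen only for illustration. The IDF formula matches the smoothed one documented for Spark's `IDF`, and the classifier is a multinomial Naive Bayes with additive smoothing of 1.0, which handles the three classes natively.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration (not Spark's default list).
# Negations such as "not" and "no" are deliberately left out, because
# removing them discards the cue that separates "good" from "not good".
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "i", "of", "to"}


def tokenize(text):
    """Lowercase the text and extract alphanumeric/apostrophe runs as tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    return remove_stop_words(tokenize(text))


def fit_idf(docs):
    """Smoothed IDF: log((m + 1) / (df + 1)), where m is the number of documents."""
    m = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((m + 1) / (n + 1)) for term, n in df.items()}


def tfidf(doc, idf):
    """Raw term counts weighted by IDF; terms outside the fitted vocabulary are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class MultinomialNB:
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # additive (Laplace) smoothing

    def fit(self, X, y):
        # X: list of {term: weight} dicts; y: list of class labels.
        self.vocab = {t for x in X for t in x}
        class_counts = Counter(y)
        self.log_prior = {c: math.log(n / len(y)) for c, n in class_counts.items()}
        weight = defaultdict(Counter)
        for x, c in zip(X, y):
            weight[c].update(x)  # sum TF-IDF weights per class and term
        v = len(self.vocab)
        self.log_lik = {}
        for c in class_counts:
            total = sum(weight[c].values()) + self.smoothing * v
            self.log_lik[c] = {
                t: math.log((weight[c][t] + self.smoothing) / total) for t in self.vocab
            }
        return self

    def predict(self, x):
        # Log prior plus weighted log likelihoods; highest score wins.
        scores = {
            c: lp + sum(w * self.log_lik[c][t] for t, w in x.items() if t in self.vocab)
            for c, lp in self.log_prior.items()
        }
        return max(scores, key=scores.get)


def fit_pipeline(texts, labels, smoothing=1.0):
    """Fit every stage on the training data and return a predict function."""
    docs = [preprocess(t) for t in texts]
    idf = fit_idf(docs)
    model = MultinomialNB(smoothing).fit([tfidf(d, idf) for d in docs], labels)
    return lambda text: model.predict(tfidf(preprocess(text), idf))


# Hypothetical toy training set: (text, label).
train = [
    ("I love this movie, it was great", "positive"),
    ("A wonderful, great experience", "positive"),
    ("This was terrible and boring", "negative"),
    ("I hate it, awful acting", "negative"),
    ("The movie was okay, nothing special", "neutral"),
    ("It was an average film", "neutral"),
]
texts, labels = zip(*train)
predict = fit_pipeline(texts, labels)
print(predict("a great movie"), predict("boring and awful"), predict("an average film"))
```

In Spark ML, each step would typically become a pipeline stage: `tokenize` maps to `Tokenizer` or `RegexTokenizer`, `remove_stop_words` to `StopWordsRemover` (whose `stopWords` parameter accepts a custom list, for example one that keeps negations), `fit_idf`/`tfidf` to `CountVectorizer` or `HashingTF` followed by `IDF`, and `MultinomialNB` to `pyspark.ml.classification.NaiveBayes`. These stages are chained in a `pyspark.ml.Pipeline`, so a single `fit` call learns the vocabulary, IDF weights, and class likelihoods together, and the resulting model applies the same transformations at prediction time.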

Databricks Certified Machine Learning - Associate