
In a Databricks project focused on natural language processing (NLP), a data scientist is tasked with preprocessing text data, which includes tokenization and the removal of stop words. Which Spark MLlib feature is most appropriate for scalable text preprocessing?
A. StringIndexer
B. CountVectorizer
C. StopWordsRemover
D. Tokenizer