In a Databricks project focused on natural language processing (NLP), a data scientist must preprocess text data, including tokenization and stop-word removal. Which Spark MLlib feature is most appropriate for scalable text preprocessing?