
Google Professional Machine Learning Engineer
Get started today
Ultimate access to all questions.
You need to train a natural language model to perform text classification on a dataset containing millions of product descriptions, with a vocabulary size of 100,000 unique words. To effectively preprocess the words so they can be fed into a recurrent neural network (RNN) for better performance, what preprocessing step should you take?
You need to train a natural language model to perform text classification on a dataset containing millions of product descriptions, with a vocabulary size of 100,000 unique words. To effectively preprocess the words so they can be fed into a recurrent neural network (RNN) for better performance, what preprocessing step should you take?
Explanation:
The correct answer is B. Identifying word embeddings from a pre-trained model and using these embeddings in your model is the recommended approach. Word embeddings provide a dense vector representation of words, capturing their semantic meanings and relationships to other words, which recurrent neural networks (RNNs) can effectively utilize to understand context. This method is more efficient and effective compared to one-hot encoding or manual frequency sorting.