Google Professional Machine Learning Engineer

Google Professional Machine Learning Engineer

Get started today

Ultimate access to all questions.


Your team is working on a Natural Language Processing (NLP) research project aimed at predicting the political affiliation of authors based on the articles they have written. You have a large training dataset consisting of numerous articles penned by various authors. Following best practices, you decided to split your dataset into training, testing, and evaluation subsets with an 80%-10%-10% distribution, respectively. Considering this setup, how should you distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?