Google Professional Data Engineer

You are working on a regression problem in natural language processing, using a dataset of 100 million labeled examples. After randomly shuffling the data, you split it into training and test sets in a 90/10 ratio. After training your neural network and evaluating it on the test set, you find that the root-mean-squared error (RMSE) is twice as high on the training set as on the test set. How should you improve the performance of your model?

Increase the share of the test sample in the train-test split.

13.8%

Try to collect more data and increase the size of your dataset.

12.3%
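
The comparison described in the question can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `rmse` and `train_test_rmse_ratio` and the toy numbers are hypothetical and are not taken from the question. A ratio well above 1 means the model fits the data it was trained on worse than the held-out data.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def train_test_rmse_ratio(train_true, train_pred, test_true, test_pred):
    """Train RMSE divided by test RMSE (hypothetical helper).

    Assumes the test RMSE is non-zero; otherwise a ZeroDivisionError is raised.
    """
    return rmse(train_true, train_pred) / rmse(test_true, test_pred)


# Toy data (made up for illustration): every training prediction is off by 2,
# every test prediction is off by 1.
train_true = [0.0, 0.0, 0.0, 0.0]
train_pred = [2.0, -2.0, 2.0, -2.0]
test_true = [0.0, 0.0]
test_pred = [1.0, -1.0]

ratio = train_test_rmse_ratio(train_true, train_pred, test_true, test_pred)
print(f"train/test RMSE ratio: {ratio:.1f}")
```

With these toy values the training error is twice the test error, which matches the situation in the question.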
