
Ultimate access to all questions.
You are working on a regression problem in the domain of natural language processing, utilizing a dataset consisting of 100 million labeled examples. After randomly shuffling the data, you have partitioned it into training and test sets, maintaining a 90/10 ratio respectively. Upon training your neural network model and subsequently evaluating its performance on the test set, you observe that the root-mean-squared error (RMSE) is twice as high on the train set compared to the test set. What approaches can you take to enhance the performance of your model?
A
Increase the share of the test sample in the train-test split.
B
Try to collect more data and increase the size of your dataset.
C
Try out regularization techniques (e.g., dropout of batch normalization) to avoid overfitting.
D
Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of vocabularies or n-grams used.