
Answer-first summary for fast verification
Answer: Increase the volume of data that is used in training.
## Analysis of the Problem The scenario describes a classic case of **overfitting**: the model performs well on training data but poorly on production (unseen) data. This indicates the model has learned patterns specific to the training dataset rather than generalizable patterns that apply to real-world data. ## Evaluation of Options **A: Reduce the volume of data that is used in training.** - **Not optimal**: Reducing training data would likely worsen overfitting by giving the model fewer examples to learn from, making it more prone to memorizing noise or specific patterns in the limited dataset. - This approach contradicts established machine learning best practices for addressing overfitting. **B: Add hyperparameters to the model.** - **Misleading**: While hyperparameter tuning (adjusting existing parameters like learning rate, regularization, or dropout) is a valid technique to combat overfitting, the phrasing "add hyperparameters" is imprecise. Models already have hyperparameters; the key is to optimize them through tuning, not add new ones arbitrarily. - This option could be misinterpreted and doesn't directly address the core issue of insufficient data diversity. **C: Increase the volume of data that is used in training.** - **Optimal solution**: Increasing training data volume helps the model encounter more diverse examples, reducing its tendency to overfit to specific patterns in the original dataset. This improves generalization to unseen production data. - More data allows the model to learn robust features that better represent the underlying distribution of real-world item prices. - This aligns with AWS AI/ML best practices for improving model generalization and reducing overfitting. **D: Increase the model training time.** - **Not optimal**: Longer training time without other adjustments could actually exacerbate overfitting, as the model might continue to memorize training data rather than generalize. - Training time alone doesn't address the fundamental issue of data insufficiency or lack of diversity. ## Recommended Approach The most effective immediate action is **C: Increase the volume of training data**. This should be complemented by: 1. **Data augmentation** techniques if additional raw data isn't available 2. **Hyperparameter tuning** (not "adding" but optimizing existing parameters) 3. **Regularization methods** like dropout or L1/L2 regularization 4. **Cross-validation** to ensure the model generalizes well However, among the given options, increasing training data volume provides the most direct and reliable path to improving generalization from training to production environments.
Ultimate access to all questions.
No comments yet.
Author: LeetQuiz Editorial Team
A company has a model that accurately predicts item prices on training data, but its performance drops substantially after deployment to production. What steps should the company take to address this issue?
A
Reduce the volume of data that is used in training.
B
Add hyperparameters to the model.
C
Increase the volume of data that is used in training.
D
Increase the model training time.