
Answer-first summary for fast verification
Answer: There is training-serving skew in your production environment.
The most likely cause is training-serving skew: a discrepancy between the data and feature processing used during training and what the model actually receives in production. The symptom here is the classic signature of that skew: a solid offline metric (AUC ROC of 0.8 on the validation set) followed by a sharp drop only after deployment (0.65). Option C is also plausible, and community votes lean toward it, because the two RAND() filters are evaluated independently and therefore produce overlapping tables that do not cover the whole source table; the editorial answer is A, however, because the drop is observed specifically in the production environment, which points to a training-serving discrepancy.
Author: LeetQuiz Editorial Team
You are working on a machine learning project using Vertex AI Workbench and experimenting with a distributed XGBoost model. To split your dataset into training and validation sets, you use BigQuery and run the following SQL queries:

CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.8);

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() <= 0.2);

After training the model with these sets, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8. However, upon deploying the model to production, the AUC ROC drops significantly to 0.65. What is the most likely cause of this problem?
A. There is training-serving skew in your production environment.
B. There is not a sufficient amount of training data.
C. The tables that you created to hold your training and validation records share some records, and you may not be using all the data in your initial table.
D. The RAND() function generated a number that is less than 0.2 in both instances, so every record in the validation table will also be in the training table.
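The split mechanics that options C and D argue about can be checked with a short simulation: RAND() is evaluated independently per row in each query, so the two filters neither partition the table cleanly nor nest the validation set inside the training set. A minimal Python sketch (the table size and seed are illustrative assumptions, not part of the question):

```python
import random

random.seed(42)  # reproducible run
records = list(range(100_000))  # stand-in for rows of mytable

# Each query draws a fresh random number per row, independently.
training = [r for r in records if random.random() <= 0.8]
validation = [r for r in records if random.random() <= 0.2]

overlap = set(training) & set(validation)
covered = set(training) | set(validation)

# Independence implies ~16% of rows (0.8 * 0.2) land in BOTH tables,
# and ~16% (0.2 * 0.8) land in NEITHER.
print(f"in both tables:  {len(overlap) / len(records):.3f}")
print(f"in neither table: {1 - len(covered) / len(records):.3f}")
```

This supports option C's factual premise (shared records, unused data) while refuting option D's claim that every validation record must also be a training record: a row can draw RAND() > 0.8 in the first query and RAND() <= 0.2 in the second.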