
Ultimate access to all questions.
You are working on a classification problem involving time series data and have quickly achieved an AUC ROC value of 99% on your training data with minimal experimentation, without employing sophisticated algorithms or hyperparameter tuning. Given the high performance with little effort, you suspect there might be an underlying issue. The dataset is large, and computational resources are not a constraint. However, the project has strict compliance requirements, and the solution must be scalable for future data. Which of the following steps is the MOST appropriate to identify and resolve the underlying issue? Choose the best option.
A
Combat model overfitting by opting for a simpler algorithm, considering the high AUC ROC value might be due to the model capturing noise in the training data.
B
Implement nested cross-validation during model training to tackle potential data leakage, ensuring the model's performance is assessed without the influence of future data.
C
Reduce data leakage by eliminating features that are highly correlated with the target variable, as these might be providing the model with future information.
D
Mitigate model overfitting by adjusting hyperparameters to decrease the AUC ROC value, aiming for a more realistic performance metric.