
Answer-first summary for fast verification
Answer: Address data leakage by applying nested cross-validation during model training.
The correct answer is B. With time series data, random cross-validation shuffles observations, so a model can be trained on records that come after the ones it is validated on. Information from the future then leaks into training and inflates the score. A 99% AUC ROC after only a few experiments, with no sophisticated algorithms and no hyperparameter tuning, is more plausibly a symptom of this leakage than of genuine predictive power. Nested cross-validation addresses the problem when its splits respect time: each training fold contains only observations that precede its validation fold, and model selection happens in an inner loop that never sees the outer test fold, which gives a more reliable estimate of model performance. Overfitting is also a concern here, but switching to a simpler algorithm or tuning hyperparameters would leave the leaky random splits in place, so the data leakage must be addressed first.
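The time-ordered nested splitting described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method of any particular library: the function names `forward_chaining_splits` and `nested_time_series_splits` are hypothetical, the samples are assumed to be already sorted by time, and the fold sizes are simply equal-width blocks.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def forward_chaining_splits(n_samples: int, n_splits: int) -> Iterator[Split]:
    """Yield (train, test) index lists where every train index precedes every test index.

    Assumes index order equals time order (hypothetical helper for illustration).
    """
    fold = n_samples // (n_splits + 1)
    if fold < 1:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold
        # The last split absorbs any leftover samples at the end of the series.
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


def nested_time_series_splits(
    n_samples: int, outer_splits: int, inner_splits: int
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner) for each outer split.

    The inner splits are built only from outer_train, so model selection
    never touches the outer test fold or any later observation.
    """
    for outer_train, outer_test in forward_chaining_splits(n_samples, outer_splits):
        inner = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in va])
            for tr, va in forward_chaining_splits(len(outer_train), inner_splits)
        ]
        yield outer_train, outer_test, inner


if __name__ == "__main__":
    for outer_train, outer_test, inner in nested_time_series_splits(12, 3, 2):
        print("outer train:", outer_train, "outer test:", outer_test)
        for tr, va in inner:
            print("   inner train:", tr, "inner validation:", va)
```

In practice, hyperparameters would be chosen on the inner validation folds and the selected model scored once on the matching outer test fold. Because every fold is cut along the time axis, no split trains on data from after its evaluation window, which is the property random cross-validation lacks.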
Author: LeetQuiz Editorial Team
You are working on a classification problem with time series data, such as predicting stock prices or classifying events based on historical data. After conducting just a few experiments using random cross-validation, you achieved an Area Under the Receiver Operating Characteristic Curve (AUC ROC) value of 99% on the training data. This high performance was achieved without exploring any sophisticated algorithms or spending time on hyperparameter tuning. Given the nature of your data and the approach taken so far, what should your next step be to correctly address and fix any underlying issues?
A. Address the model overfitting by using a less complex algorithm and use k-fold cross-validation.
B. Address data leakage by applying nested cross-validation during model training.
C. Address data leakage by removing features highly correlated with the target value.
D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.