
Answer-first summary for fast verification
Answer: D and E. Implement nested cross-validation during model training to mitigate data leakage; conduct feature importance analysis to identify and remove redundant features that do not contribute to the model's predictive power.
The correct answers are to implement nested cross-validation during model training and to conduct feature importance analysis to identify and remove redundant features. Nested cross-validation separates hyperparameter selection (the inner loop) from performance estimation (the outer loop), so the data used to score the model never influences how it was tuned. For time-series data, the folds should also preserve chronological order, so that each fold trains on earlier observations and is evaluated on later ones. Feature importance analysis shows how much each feature contributes to the model's predictions, which allows non-informative or redundant features that encourage overfitting to be removed. Removing features only because they are highly correlated with the target is too blunt: a high correlation does not by itself indicate leakage, and such features may be legitimate predictors. Simplifying the algorithm or adjusting hyperparameters to lower the AUC ROC might seem like viable options, but they treat the symptom and do not address the root causes of overfitting or data leakage as directly as the chosen options.
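The nested cross-validation described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it assumes scikit-learn is available, uses synthetic time-ordered data, and picks `LogisticRegression`, the `C` grid, and `TimeSeriesSplit` as illustrative choices that do not come from the question itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for time-ordered observations (assumption):
# row i is treated as occurring before row i + 1.
rng = np.random.default_rng(0)
n_samples = 600
X = rng.normal(size=(n_samples, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# Scaling sits inside the pipeline so it is re-fit on each training fold
# and never sees the fold it is evaluated on.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Both loops use chronological splits: train on the past, test on the future.
inner_cv = TimeSeriesSplit(n_splits=3)  # hyperparameter selection
outer_cv = TimeSeriesSplit(n_splits=5)  # performance estimation

search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=inner_cv,
)

# Each outer fold runs a fresh inner search on its own training portion,
# so the outer test data never influences tuning.
outer_scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
print("Outer-fold AUC ROC:", np.round(outer_scores, 3))
print("Mean:", outer_scores.mean(), "Std:", outer_scores.std())
```

If the outer-fold scores fall well below the 99% seen on the training data, that gap is a sign of overfitting or leakage in the original evaluation.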
Author: LeetQuiz Editorial Team
In the context of developing a machine learning model for a classification problem involving time series data, your team has achieved an AUC ROC value of 99% on the training data with minimal experimentation. While this result is promising, the team is concerned about potential overfitting and data leakage. Beyond exploring advanced algorithms and hyperparameter tuning, what additional steps should be prioritized to ensure the model's robustness and generalizability to unseen data? (Choose two correct options)
A. Address data leakage by removing features that are highly correlated with the target variable.
B. Reduce model overfitting by opting for a simpler algorithm.
C. Adjust hyperparameters to lower the AUC ROC value as a strategy to combat overfitting.
D. Implement nested cross-validation during model training to mitigate data leakage.
E. Conduct feature importance analysis to identify and remove redundant features that do not contribute to the model's predictive power.
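The feature importance analysis named in the last option could look roughly like the sketch below. It assumes scikit-learn and uses permutation importance on a chronological hold-out set, which is one of several possible importance measures; the synthetic data, the feature names, and the `RandomForestClassifier` are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data (assumption): one informative feature and three noise features.
rng = np.random.default_rng(0)
n_samples = 600
X = rng.normal(size=(n_samples, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)
feature_names = ["signal", "noise_1", "noise_2", "noise_3"]  # hypothetical names

# Chronological hold-out: the last 20% of rows are treated as the future.
split = int(0.8 * n_samples)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the AUC ROC drop.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)

# Rank features from most to least important.
ranked = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Features whose importance stays near zero are candidates for removal. A single feature with overwhelming importance is worth inspecting as a possible leakage source rather than removing automatically.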