Your team is developing a machine learning model for a classification problem on time series data and has achieved a ROC AUC of 0.99 on the training data with minimal experimentation. While this result is promising, the team is concerned about potential overfitting and data leakage. Beyond exploring advanced algorithms and hyperparameter tuning, which additional steps should be prioritized to ensure the model's robustness and generalizability to unseen data? (Choose two correct options)
Explanation:
The correct answers are to implement nested cross-validation during model training to mitigate data leakage, and to conduct feature importance analysis to identify and remove redundant features. Nested cross-validation is crucial for time series data: the inner loop handles hyperparameter tuning while the outer loop evaluates performance on folds the tuning process never saw, and both loops should preserve the chronological order of observations so the model is never tuned or trained on data that lies in the future of its test fold. Feature importance analysis quantifies each feature's contribution to the model's predictions, allowing non-informative or redundant features that encourage overfitting to be removed. Simplifying the algorithm or adjusting hyperparameters merely to lower the training ROC AUC might seem viable, but neither addresses the root causes of overfitting or data leakage as directly as the two chosen options.
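As a rough sketch of what the nested, time-ordered evaluation might look like in practice, scikit-learn's TimeSeriesSplit can drive both the inner tuning loop and the outer evaluation loop. The GradientBoostingClassifier, the synthetic data from make_classification, and the parameter grid below are illustrative assumptions, not details given in the question:

```python
# Sketch: nested cross-validation with time-ordered splits (assumed setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the team's time series features and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

outer_cv = TimeSeriesSplit(n_splits=5)   # outer loop: unbiased evaluation
inner_cv = TimeSeriesSplit(n_splits=3)   # inner loop: hyperparameter tuning
param_grid = {"n_estimators": [100, 200], "max_depth": [2, 3]}

outer_scores = []
for train_idx, test_idx in outer_cv.split(X):
    # The inner search sees only the outer training fold, so no information
    # from the held-out (future) fold leaks into model selection.
    search = GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_grid,
        cv=inner_cv,
        scoring="roc_auc",
    )
    search.fit(X[train_idx], y[train_idx])
    proba = search.predict_proba(X[test_idx])[:, 1]
    outer_scores.append(roc_auc_score(y[test_idx], proba))

print(f"Nested CV AUC: {np.mean(outer_scores):.3f} +/- {np.std(outer_scores):.3f}")
```

A large gap between the training AUC and the outer-loop AUC from a setup like this is a typical symptom of the overfitting or leakage the team is worried about.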
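Feature importance analysis could be sketched in a similar spirit with permutation importance computed on a chronologically held-out split. Again, the model, the synthetic data, and the keep/drop threshold are assumptions for illustration, not the team's actual pipeline:

```python
# Sketch: permutation-based feature importance on a chronological hold-out.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# shuffle=False keeps the split chronological: later rows form the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)

# Features whose importance is indistinguishable from zero are candidates
# for removal before retraining, shrinking the surface for overfitting.
for i in result.importances_mean.argsort()[::-1]:
    mean, std = result.importances_mean[i], result.importances_std[i]
    verdict = "keep" if mean - 2 * std > 0 else "consider dropping"
    print(f"feature {i:2d}: {mean:+.4f} +/- {std:.4f} -> {verdict}")
```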