
Answer-first summary for fast verification
Answer: Implement a time-based split for training and test data to prevent data leakage and ensure the model is not trained on future data., Apply all data transformations before splitting the dataset and use cross-validation to uniformly assess model performance across different subsets of the data.
In time series forecasting, it's crucial to avoid data leakage by ensuring the model does not have access to future data during training. A time-based split (Option B) effectively addresses this by chronologically separating training and test data. Additionally, applying transformations before splitting the data (Option D) ensures that the same preprocessing steps are uniformly applied, preventing inconsistencies that could affect model performance. Combining these strategies (Option E) provides a comprehensive approach to improving the model's accuracy in production by addressing both data leakage and transformation consistency.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
You are tasked with developing a machine learning model to forecast daily temperatures for a weather forecasting application. The dataset includes hourly temperature readings over several years. Initially, you randomly split the data into training and test sets, applied necessary transformations, and achieved a testing accuracy of 97%. However, upon deployment, the model's accuracy significantly dropped to 66%. Considering the time-sensitive nature of the data and the need for high accuracy in production, which strategies would you implement to improve the model's production accuracy? (Choose two options)
A
Normalize the training and test datasets separately to ensure each set's distribution is independently scaled.
B
Implement a time-based split for training and test data to prevent data leakage and ensure the model is not trained on future data.
C
Increase the size of the test set by adding more recent data to ensure it is representative of current weather patterns.
D
Apply all data transformations before splitting the dataset and use cross-validation to uniformly assess model performance across different subsets of the data.
E
Combine options B and D to both prevent data leakage and ensure uniform application of transformations across training and test sets.