
Ultimate access to all questions.
You are tasked with developing a machine learning model to forecast daily temperatures for a weather forecasting application. The dataset includes hourly temperature readings over several years. Initially, you randomly split the data into training and test sets, applied necessary transformations, and achieved a testing accuracy of 97%. However, upon deployment, the model's accuracy significantly dropped to 66%. Considering the importance of accurate forecasts for planning and safety, and the need to comply with data privacy regulations, which of the following strategies would BEST improve the production model's accuracy? (Choose one correct option)
A
Apply data transformations before splitting the dataset and employ cross-validation to ensure that transformations are consistently applied across training and test sets, thereby minimizing discrepancies.
B
Normalize the training and test datasets independently to maintain the integrity of each dataset's distribution, potentially improving model generalization.
C
Increase the size of the test set to ensure it is more representative of the overall data distribution, aiming for a more reliable evaluation of the model's performance.
D
Implement a time-based split for the training and test data instead of a random split to prevent data leakage and ensure the model is evaluated on future, unseen data, mimicking real-world deployment scenarios.
E
Combine options A and D, applying data transformations before splitting and using a time-based split to both prevent data leakage and ensure uniform transformation application.