
Answer-first summary for fast verification
Answer: Submit the data for training without manual transformations. Manually split the data based on the columns with a time signal. Ensure that the data in the validation set is from 30 days after the data in the training set and that the data in the testing set is from 30 days after the validation set.
The correct approach is to split the data manually along the time signal, so that the validation set follows the training set chronologically and the testing set follows the validation set. A random split would let rows from later periods land in the training set, leaking future information into a model that is meant to predict forward in time. AutoML Tables can handle the feature transformations on its own, but because the time signal is spread across several columns, the chronological integrity of the splits has to be enforced manually.
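A minimal sketch of the manual split described above, in plain Python. It assumes each row has already been reduced to a single `event_date`, and it reads "30 days after" as a 30-day gap between the end of one split and the start of the next. The rows, the cutoff dates, and the `assign_split` helper are hypothetical. The labels `TRAIN`, `VALIDATE`, and `TEST` are intended to match the values AutoML Tables expects in a manual data split column, but verify them against the current documentation.

```python
from datetime import date, timedelta

GAP = timedelta(days=30)  # assumed reading of option D's "30 days after"


def assign_split(event_date, train_end, validation_end, gap=GAP):
    """Return a split label for one row, or None if the row falls in a gap."""
    validation_start = train_end + gap
    test_start = validation_end + gap
    if event_date <= train_end:
        return "TRAIN"
    if validation_start <= event_date <= validation_end:
        return "VALIDATE"
    if event_date >= test_start:
        return "TEST"
    return None  # inside a gap: excluded so the splits cannot overlap in time


# Hypothetical rows: (user_id, event_date)
rows = [
    ("u1", date(2023, 1, 10)),
    ("u2", date(2023, 3, 20)),
    ("u3", date(2023, 4, 15)),
    ("u4", date(2023, 5, 25)),
    ("u5", date(2023, 7, 5)),
]
train_end = date(2023, 3, 31)       # hypothetical cutoff
validation_end = date(2023, 5, 31)  # hypothetical cutoff

# Attach a split label to every row; gap rows get None and can be dropped.
labelled = [(uid, d, assign_split(d, train_end, validation_end)) for uid, d in rows]
```

Rows that fall inside a gap are left unlabelled here so they can be dropped before upload. In BigQuery the same logic could be written as a `CASE` expression that populates a split column.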
Author: LeetQuiz Editorial Team
As a Professional Machine Learning Engineer for a large hotel chain, you're tasked with predicting user lifetime value (LTV) over the next 20 days to support the marketing team's strategy. The dataset, stored in BigQuery, has a time signal spread across several columns. The marketing team emphasizes the importance of avoiding data leakage and of ensuring that the model is trained and evaluated on chronologically ordered data, so that its predictions reflect real-world conditions. Given these constraints, how should you prepare the data for AutoML Tables to fit the optimal model? Choose the best option.
A
Combine all time-related columns into an array and let AutoML interpret it. Split the data automatically into training, validation, and testing sets.
B
Submit the data for training without manual transformations and let AutoML handle the appropriate transformations. Split the data automatically into training, validation, and testing sets.
C
Submit the data for training without manual transformations and indicate an appropriate column as the Time column. Let AutoML split the data based on the time signal. Reserve the most recent data for validation and testing sets.
D
Submit the data for training without manual transformations. Manually split the data based on the columns with a time signal. Ensure that the data in the validation set is from 30 days after the data in the training set and that the data in the testing set is from 30 days after the validation set.