As a data scientist at a retail company, you are tasked with training a sales prediction model using a managed tabular dataset in Vertex AI. The dataset contains sales data from three different stores, with features that include the store name and the sales timestamp. The goal is to make accurate sales predictions for a new store that will open soon. To do this, you need to split the data into training, validation, and test sets. Which approach should you take to split the dataset effectively?

Exam-Like

Use Vertex AI manual split, using the store name feature to assign one store to each set

8.0%

Use Vertex AI default data split

12.6%

Use Vertex AI chronological split, and specify the sales timestamp feature as the time variable

60.9%

Use Vertex AI random split, assigning 70% of the rows to the training set, 10% to the validation set, and 20% to the test set

18.4%

Google Professional Machine Learning Engineer