
Answer-first summary for fast verification
Answer: Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
Option D is correct because it registers new versions of a single dataset, each tagged with its month, which satisfies all three requirements: (1) Each version references every 'sales/mm-yyyy/sales.csv' file that exists at registration time, so loading it yields all sales data to date as a tabular dataset that converts directly to a pandas DataFrame. (2) Each version's file list is fixed when it is registered, so an experiment can select the version whose month tag matches the cutoff and ignore any data added in later months. (3) Only one dataset (sales_dataset) is registered; each month adds a version rather than a new dataset. This is consistent with the dataset versioning approach described in the Azure Machine Learning documentation.
Option B is incorrect because the wildcard path 'sales/*/sales.csv' is resolved when the data is loaded, so the dataset always includes every month added since registration; a single registration with one month tag cannot reproduce the data as it stood before a given month. Option A replaces the dataset registration each month, so earlier file lists are lost and experiments cannot go back to the data available before a specific month. Option C supports filtering by month but registers a new dataset every month, violating the minimum-number requirement.
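The versioning logic behind option D can be sketched without a workspace. In the Azure ML Python SDK v1 (azureml-core), the monthly step would roughly be `Dataset.Tabular.from_delimited_files(path=[(datastore, 'sales/01-2020/sales.csv'), ...])` followed by `.register(workspace, name='sales_dataset', tags={'month': ...}, create_new_version=True)`, with a specific version fetched via `Dataset.get_by_name(workspace, 'sales_dataset', version=...)`. The code below does not call that SDK; `DatasetRegistry`, `DatasetVersion`, `month_paths`, and the example months are hypothetical stand-ins that only model "one name, many tagged versions, each with a frozen file list".

```python
from dataclasses import dataclass


@dataclass
class DatasetVersion:
    """One registered version: an explicit, frozen file list plus its tags."""
    version: int
    paths: tuple[str, ...]
    tags: dict[str, str]


class DatasetRegistry:
    """Hypothetical in-memory stand-in for a workspace dataset registry."""

    def __init__(self):
        self._versions = {}  # dataset name -> list of DatasetVersion

    def register(self, name, paths, tags):
        """Always append a new version under the same name (option D)."""
        versions = self._versions.setdefault(name, [])
        entry = DatasetVersion(len(versions) + 1, tuple(paths), dict(tags))
        versions.append(entry)
        return entry

    def get_by_tag(self, name, key, value):
        """Return the first version of `name` whose tag `key` equals `value`."""
        for entry in self._versions.get(name, []):
            if entry.tags.get(key) == value:
                return entry
        raise KeyError(f"no version of {name!r} tagged {key}={value}")

    def names(self):
        """Names of all registered datasets (not versions)."""
        return list(self._versions)


def month_paths(months):
    """Hypothetical helper: explicit 'sales/mm-yyyy/sales.csv' paths."""
    return [f"sales/{m}/sales.csv" for m in months]


registry = DatasetRegistry()
months = ["01-2020", "02-2020", "03-2020"]  # illustrative folder names
for i, month in enumerate(months, start=1):
    # Each month: list every folder present so far and register a NEW version.
    registry.register("sales_dataset", month_paths(months[:i]), {"month": month})

# An experiment that must ignore data added after February picks that version.
snapshot = registry.get_by_tag("sales_dataset", "month", "02-2020")
print(snapshot.version, snapshot.paths)
# 2 ('sales/01-2020/sales.csv', 'sales/02-2020/sales.csv')
print(registry.names())
# ['sales_dataset']
```

Only one dataset name ever exists, yet every month's snapshot stays retrievable through its tag, which is the combination options A and C each miss.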
Author: LeetQuiz Editorial Team
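You have multiple CSV files with the same schema, each named sales.csv and stored in a folder named for the month and year (e.g., sales/01-2020/sales.csv, sales/02-2020/sales.csv) within a parent folder named sales in an Azure blob container. A new folder is added each month. You need to register a dataset for training a model with these requirements:
- You must be able to load all of the sales data to date into a structure that can be easily converted to a DataFrame.
- You must be able to create experiments that use only data created before a specific previous month, ignoring any data added after that month.
- You must register the minimum number of datasets possible.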
What should you do?

A
Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
B
Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
C
Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
D
Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
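The difference between option B and option D comes down to when the file list is decided. The sketch below assumes, in line with the documented behavior that Azure ML datasets are lazily evaluated references, that a wildcard path is matched against the container contents each time the data is loaded, while an explicit list is fixed at registration. The blob listings are hypothetical, and `fnmatch` only simulates pattern matching; it is not how Azure ML resolves paths internally.

```python
from fnmatch import fnmatch

# Hypothetical blob listings: two months existed when the dataset was
# registered in February; March was uploaded afterwards.
blobs_at_registration = ["sales/01-2020/sales.csv", "sales/02-2020/sales.csv"]
blobs_today = blobs_at_registration + ["sales/03-2020/sales.csv"]


def resolve_wildcard(pattern, blobs):
    """Option B: only the pattern is stored, so matching happens at load time."""
    return [b for b in blobs if fnmatch(b, pattern)]


# Option D: the explicit file list is frozen into the registered version.
frozen_version = list(blobs_at_registration)

wildcard_today = resolve_wildcard("sales/*/sales.csv", blobs_today)
print(len(wildcard_today))  # 3: March is included despite the February tag
print(len(frozen_version))  # 2: the February version still stops at February
```

With option B the month tag describes when the dataset was registered, not which data it loads, so an experiment cannot exclude later months without extra filtering.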