Microsoft Certified Azure Data Scientist Associate - DP-100

Explanation:

The solution meets the goal for three reasons: 1) the path pattern '/data/*/*.csv' matches every CSV file in both the 2018 and 2019 subdirectories; 2) Dataset.Tabular.from_delimited_files() creates a TabularDataset from multiple delimited files; and 3) to_pandas_dataframe() loads all of the dataset's records into a single pandas DataFrame. The community discussion shows conflicting opinions, but the most recent and most upvoted comments (including one with 9 upvotes) confirm that the solution is correct according to the current Azure ML documentation. Earlier comments answering 'No' appear to be based on an outdated version of the question in which the code may have created a FileDataset rather than a TabularDataset; the current code creates a TabularDataset, which supports to_pandas_dataframe().
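The first point can be checked offline. The sketch below uses Python's standard pathlib as a stand-in for the datastore's glob matching; it assumes that '*' matches within a single path segment, which is how pathlib treats it, and it does not call the Azure ML SDK. The three extra paths in `others` are hypothetical and exist only to show what the pattern would skip.

```python
from pathlib import PurePosixPath

# Files listed in the question.
files = [
    "/data/2018/Q1.csv",
    "/data/2018/Q2.csv",
    "/data/2018/Q3.csv",
    "/data/2018/Q4.csv",
    "/data/2019/Q1.csv",
]

# Hypothetical paths (not in the question) that the pattern should NOT match.
others = [
    "/data/Q1.csv",           # no year folder
    "/data/2018/notes.txt",   # wrong extension
    "/data/2018/raw/Q1.csv",  # one folder level too deep
]

pattern = "/data/*/*.csv"


def matches(path, pattern=pattern):
    """Return True if path matches the glob, with '*' confined to one segment."""
    return PurePosixPath(path).match(pattern)


matched = [p for p in files + others if matches(p)]
print(matched)
```

Under that assumption, only the five files from the question are selected, one folder level below /data and ending in .csv.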

You create an Azure Machine Learning datastore containing the following files:

/data/2018/Q1.csv
/data/2018/Q2.csv
/data/2018/Q3.csv
/data/2018/Q4.csv
/data/2019/Q1.csv

All files have the following format:

id,f1,f2,l
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1

You run the following code:

from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'workspaceblobstore')

dataset = Dataset.Tabular.from_delimited_files(path=(datastore, '/data/*/*.csv'))
training_data = dataset.to_pandas_dataframe()

You need to create a dataset named training_data that loads the data from all files into a single DataFrame.

Solution: Run the following code:

from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'workspaceblobstore')

dataset = Dataset.Tabular.from_delimited_files(path=(datastore, '/data/*/*.csv'))
training_data = dataset.to_pandas_dataframe()

Does the solution meet the goal?

Yes
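To see why a single combined table is the expected result, here is a minimal local analogue that uses only the Python standard library. It is a sketch under stated assumptions, not the Azure ML implementation: the files are recreated in a temporary directory, `glob` stands in for the datastore path pattern, and `load_all_rows` is a hypothetical helper that plays the role of from_delimited_files() followed by to_pandas_dataframe().

```python
import csv
import glob
import os
import tempfile

# File content and layout copied from the question.
CSV_TEXT = "id,f1,f2,l\n1,1,2,0\n2,1,1,1\n3,2,1,0\n4,2,2,1\n"
RELATIVE_PATHS = [
    "2018/Q1.csv",
    "2018/Q2.csv",
    "2018/Q3.csv",
    "2018/Q4.csv",
    "2019/Q1.csv",
]


def load_all_rows(root):
    """Read every file matching <root>/data/*/*.csv into one list of dict rows."""
    rows = []
    for path in sorted(glob.glob(os.path.join(root, "data", "*", "*.csv"))):
        with open(path, newline="") as handle:
            # Each file repeats the header; DictReader consumes it per file.
            rows.extend(csv.DictReader(handle))
    return rows


with tempfile.TemporaryDirectory() as root:
    # Recreate the datastore layout locally.
    for rel in RELATIVE_PATHS:
        target = os.path.join(root, "data", *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", newline="") as handle:
            handle.write(CSV_TEXT)
    combined = load_all_rows(root)

print(len(combined))  # 5 files x 4 data rows each = 20
```

All five files share the same header, so their rows stack into one table with the columns id, f1, f2 and l.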