Microsoft Azure Data Engineer Associate - DP-203

Get started today

Ultimate access to all questions.

In a scenario where you are working with a data lake in Azure Data Lake Storage Gen2, you need to implement a partition strategy for optimizing the performance of machine learning workloads. What partitioning approach would you recommend, and how would you implement it to ensure efficient data processing and model training?

Simulated

Implement a partition strategy based on the data's size, as this is the most critical factor for machine learning workloads.

0.0%

Create a partition strategy based on the data's natural partition key, such as feature attributes or categorical variables, to improve query performance and reduce the amount of data scanned.

Comments

Loading comments...

Use a hash-based partitioning method to distribute the data evenly across multiple partitions, regardless of the data's characteristics.