
You are working with a data lake in Azure Data Lake Storage Gen2 and need to implement a partition strategy to optimize the performance of machine learning workloads. Which partitioning approach would you recommend, and how would you implement it to ensure efficient data processing and model training?
A
Implement a partition strategy based on the data's size, as this is the most critical factor for machine learning workloads.
B
Create a partition strategy based on the data's natural partition key, such as feature attributes or categorical variables, to improve query performance and reduce the amount of data scanned.
C
Use a hash-based partitioning method to distribute the data evenly across multiple partitions, regardless of the data's characteristics.
D
Do not implement any partition strategy, as it is not necessary for machine learning workloads in Azure Data Lake Storage Gen2.
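To make the idea behind option B concrete, here is a minimal pure-Python sketch of hive-style partitioning by a natural partition key (a categorical column). The helper names `write_partitioned` and `read_partition`, the `category` column, and the directory layout are illustrative assumptions; in practice a Spark job would produce the same `key=value/` layout with `df.write.partitionBy(...)` against an ADLS Gen2 path.

```python
import os
import tempfile
from collections import defaultdict

def write_partitioned(records, base_dir, partition_key):
    """Write records into hive-style partition directories,
    e.g. base_dir/category=books/part-0.csv (illustrative layout)."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec[partition_key]].append(rec)
    for value, rows in partitions.items():
        part_dir = os.path.join(base_dir, f"{partition_key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0.csv"), "w") as f:
            for rec in rows:
                f.write(",".join(str(rec[k]) for k in sorted(rec)) + "\n")
    return sorted(partitions)

def read_partition(base_dir, partition_key, value):
    """Read only the one partition directory that matches the predicate,
    instead of scanning every file in the lake."""
    part_dir = os.path.join(base_dir, f"{partition_key}={value}")
    with open(os.path.join(part_dir, "part-0.csv")) as f:
        return f.read().splitlines()

records = [
    {"id": 1, "category": "books", "price": 10},
    {"id": 2, "category": "games", "price": 60},
    {"id": 3, "category": "books", "price": 15},
]
base = tempfile.mkdtemp()
print(write_partitioned(records, base, "category"))      # ['books', 'games']
print(len(read_partition(base, "category", "books")))    # 2
```

A training job that filters on `category` now touches only the matching directory, which is the "reduce the amount of data scanned" benefit option B describes.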