
You are a data engineer managing a large dataset stored in Delta Lake on Microsoft Azure. The dataset contains several years of sales data. Your task is to optimize storage and retrieval while ensuring that old data can be archived or deleted efficiently to comply with data retention policies. The solution must be cost-effective and scalable, and it must minimize the impact on query performance. Given these requirements, which partitioning strategy would you implement? Choose the best option:
A. Partition the data based on the product category column to improve query performance for product-related analytics.
B. Partition the data based on the customer ID column to enhance customer segmentation and personalization efforts.
C. Partition the data based on the transaction date column to facilitate the archiving or deletion of old data in compliance with retention policies.
D. Partition the data randomly across the dataset to distribute the data evenly and improve overall query performance.
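
When weighing the options, it can help to see how the partition column affects a retention job. The sketch below is a plain-Python model, not Delta Lake code: it groups a handful of made-up sales rows into partitions and reports which partitions a "delete everything older than the cutoff" job could drop whole and which it would have to rewrite. The sample rows, the `partition_by` and `retention_cost` helpers, and the cutoff date are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows: (transaction_date, product_category, customer_id, amount)
ROWS = [
    (date(2019, 3, 14), "toys", 101, 20.0),
    (date(2019, 11, 2), "books", 102, 12.5),
    (date(2022, 6, 30), "toys", 103, 33.0),
    (date(2023, 1, 9), "books", 101, 8.0),
    (date(2023, 8, 21), "garden", 104, 54.0),
]


def partition_by(rows, key):
    """Group rows by key(row), mimicking one storage directory per key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key(row)].append(row)
    return dict(partitions)


def retention_cost(partitions, cutoff):
    """Classify partitions for a 'delete rows older than cutoff' job.

    Returns (droppable, rewritten). A partition whose rows are all expired
    can be dropped whole; one that mixes expired and live rows would have
    to be rewritten to keep the live rows.
    """
    droppable, rewritten = [], []
    for name, rows in partitions.items():
        expired = [r for r in rows if r[0] < cutoff]
        if expired and len(expired) == len(rows):
            droppable.append(name)
        elif expired:
            rewritten.append(name)
    return sorted(droppable), sorted(rewritten)


CUTOFF = date(2022, 1, 1)  # assumed retention boundary
by_year = partition_by(ROWS, key=lambda r: r[0].year)
by_category = partition_by(ROWS, key=lambda r: r[1])

print("by year:    ", retention_cost(by_year, CUTOFF))
print("by category:", retention_cost(by_category, CUTOFF))
```

In this toy model, the expired rows sit together in a single partition when the key is derived from the transaction date, whereas a category key spreads them across partitions that also hold live rows. A real table would behave according to its own file layout and engine, so treat this only as a way to reason about the trade-off.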