Databricks Certified Data Engineer - Professional

You are a data engineer working on a project that involves managing a large dataset stored in Delta Lake on Microsoft Azure. The dataset contains several years of sales data, and your task is to optimize the storage and retrieval processes while ensuring efficient archiving or deletion of old data to comply with data retention policies. The solution must be cost-effective, scalable, and minimize the impact on query performance. Considering these requirements, which partitioning strategy would you implement? Choose the best option from the following:

Partition the data based on the product category column to improve query performance for product-related analytics.

Partition the data based on the customer ID column to enhance customer segmentation and personalization efforts.
