Databricks Certified Data Engineer - Professional

You are designing a data lake solution for a multinational retail company that processes millions of sales transactions daily. The dataset includes columns such as 'transaction_id', 'product_id', 'date', 'region', and 'amount'. The company requires efficient query performance for analyzing sales trends over time and across different regions, while also considering cost optimization and scalability. Given these requirements, which of the following partitioning strategies would you recommend to best meet the company's needs? Choose the single best option.

Partition by 'transaction_id' and 'product_id' to ensure unique identification of each transaction and product.

Partition by 'date' and 'region' to optimize query performance for time-based and geographical analysis.
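
To compare the two options above, here is a minimal sketch in plain Python that simulates Hive-style partition paths and partition pruning. It is not Spark or Delta Lake code: the sample rows and the helpers `partition` and `prune` are hypothetical, invented only for illustration. In Spark, the corresponding write would typically use `DataFrameWriter.partitionBy("date", "region")`.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; only the column names come from the question.
rows = [
    {"transaction_id": 1, "product_id": "p1", "date": date(2024, 1, 1), "region": "EU", "amount": 10.0},
    {"transaction_id": 2, "product_id": "p2", "date": date(2024, 1, 1), "region": "EU", "amount": 20.0},
    {"transaction_id": 3, "product_id": "p1", "date": date(2024, 1, 1), "region": "US", "amount": 15.0},
    {"transaction_id": 4, "product_id": "p3", "date": date(2024, 1, 2), "region": "EU", "amount": 5.0},
    {"transaction_id": 5, "product_id": "p2", "date": date(2024, 1, 2), "region": "US", "amount": 30.0},
    {"transaction_id": 6, "product_id": "p1", "date": date(2024, 1, 2), "region": "US", "amount": 25.0},
]


def partition(rows, keys):
    """Group rows under Hive-style paths such as 'date=2024-01-01/region=EU'."""
    parts = defaultdict(list)
    for row in rows:
        path = "/".join(f"{k}={row[k]}" for k in keys)
        parts[path].append(row)
    return dict(parts)


def prune(parts, **filters):
    """Keep only partitions whose path satisfies every key=value filter."""
    wanted = {f"{k}={v}" for k, v in filters.items()}
    return {p: r for p, r in parts.items() if wanted <= set(p.split("/"))}


# Option: partition by the low-cardinality columns used in query filters.
by_date_region = partition(rows, ["date", "region"])

# Option: partition by identifiers, which yields one partition per row here.
by_ids = partition(rows, ["transaction_id", "product_id"])

# A time-and-region query only has to read the single matching partition.
eu_jan1 = prune(by_date_region, date=date(2024, 1, 1), region="EU")
eu_jan1_total = sum(r["amount"] for rs in eu_jan1.values() for r in rs)

print(len(by_date_region), "partitions by date/region")
print(len(by_ids), "partitions by transaction_id/product_id")
print(list(eu_jan1), eu_jan1_total)
```

In this toy data set, partitioning by the identifier columns creates as many partitions as there are rows, so a filter on `date` or `region` cannot skip anything, while partitioning by `date` and `region` lets the same filter read a single directory. At millions of transactions per day, the identifier layout would also tend to produce a very large number of tiny files.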
