
You are designing a data lake solution for a multinational retail company that processes millions of sales transactions daily. The dataset includes columns such as 'transaction_id', 'product_id', 'date', 'region', and 'amount'. The company requires efficient query performance for analyzing sales trends over time and across different regions, while also considering cost optimization and scalability. Given these requirements, which of the following partitioning strategies would you recommend to best meet the company's needs? Choose the single best option.
A
Partition by 'transaction_id' and 'product_id' to ensure unique identification of each transaction and product.
B
Partition by 'date' and 'region' to optimize query performance for time-based and geographical analysis.
C
Partition by 'amount' to facilitate financial analysis and reporting.
D
Avoid partitioning to reduce complexity and storage costs, as modern query engines can handle large datasets efficiently.
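
To weigh the options concretely, it may help to see how a partitioned layout lets a query engine skip data. The following is a minimal sketch, not a production implementation: it assumes Hive-style `column=value` directory names and uses CSV files in place of a columnar format. The helpers `write_partitioned` and `files_to_scan` and the sample rows are hypothetical and written only for this illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, root, partition_cols):
    """Write rows into Hive-style directories, e.g. date=2024-01-01/region=EU/."""
    for row in rows:
        part_dir = Path(root).joinpath(*[f"{c}={row[c]}" for c in partition_cols])
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.csv"
        is_new = not path.exists()
        # Partition values live in the directory names, so they are not repeated in the file.
        data_cols = [c for c in row if c not in partition_cols]
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=data_cols)
            if is_new:
                writer.writeheader()
            writer.writerow({c: row[c] for c in data_cols})


def files_to_scan(root, filters):
    """Return only the files whose partition directories match every filter."""
    selected = []
    for path in sorted(Path(root).rglob("*.csv")):
        # Parse "col=value" from each directory level between root and the file.
        parts = dict(p.split("=", 1) for p in path.relative_to(root).parts[:-1])
        if all(parts.get(k) == v for k, v in filters.items()):
            selected.append(path)
    return selected


# Hypothetical sample data using the column names from the question.
rows = [
    {"transaction_id": "t1", "product_id": "p1", "date": "2024-01-01", "region": "EU", "amount": "10.0"},
    {"transaction_id": "t2", "product_id": "p2", "date": "2024-01-01", "region": "EU", "amount": "12.5"},
    {"transaction_id": "t3", "product_id": "p1", "date": "2024-01-01", "region": "US", "amount": "7.0"},
    {"transaction_id": "t4", "product_id": "p3", "date": "2024-01-02", "region": "EU", "amount": "3.2"},
    {"transaction_id": "t5", "product_id": "p2", "date": "2024-01-02", "region": "US", "amount": "9.9"},
]

with tempfile.TemporaryDirectory() as root:
    write_partitioned(rows, root, ["date", "region"])
    print(len(files_to_scan(root, {})))                                      # full scan
    print(len(files_to_scan(root, {"date": "2024-01-01"})))                  # one day
    print(len(files_to_scan(root, {"date": "2024-01-01", "region": "EU"})))  # one day, one region
```

In this sketch a filter on a partition column reduces the number of files read, which is the pruning effect the question is probing. Passing `["transaction_id"]` as `partition_cols` instead would create one directory per row, which illustrates why high-cardinality keys tend to make poor partition columns.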