You are designing a Delta Lake table in Databricks to store 5 years of sales data. The dataset contains the following columns:
- sale_date (date of sale)
- region (geographic region, ~10 distinct values)
- customer_id (~10 million distinct values)
- amount (sale amount)
Your queries:
- Often filter by sale_date and region
- Occasionally filter or group by customer_id
Which of the following table definitions BEST balances performance and storage efficiency?
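One way to compare candidate layouts is to estimate how many partition directories each would create, using the approximate cardinalities stated in the question. The sketch below is illustrative arithmetic only. It assumes 365 days per year (ignoring leap days) and that every combination of values occurs. `partition_count` and `sale_year` (a column derived from sale_date) are hypothetical names and are not taken from any of the answer options.

```python
# Rough upper bounds on partition counts for candidate partitioning schemes.
# Assumptions: 365 days/year (leap days ignored), every value combination occurs.
# `sale_year` is a hypothetical column derived from sale_date.

YEARS = 5
DAYS = YEARS * 365           # distinct sale_date values (~1,825)
REGIONS = 10                 # distinct region values (~10)
CUSTOMERS = 10_000_000       # distinct customer_id values (~10 million)


def partition_count(*cardinalities: int) -> int:
    """Upper bound on partitions when partitioning by these columns together."""
    total = 1
    for c in cardinalities:
        total *= c
    return total


schemes = {
    "region": partition_count(REGIONS),
    "sale_year, region": partition_count(YEARS, REGIONS),
    "sale_date, region": partition_count(DAYS, REGIONS),
    "customer_id": partition_count(CUSTOMERS),
}

for name, count in schemes.items():
    print(f"PARTITIONED BY ({name}): up to {count:,} partitions")
```

Partitioning on daily sale_date together with region, or on customer_id alone, produces thousands to millions of partitions, which tends to create many small files. High-cardinality columns like these are generally better served by data-skipping techniques such as Z-ordering than by partitioning.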