
You are a data engineer working on optimizing a large dataset stored in Delta Lake to improve query performance for analytical workloads. The dataset contains sales transactions with columns including 'date', 'region', 'user_id', 'product_id', and 'amount'. The queries frequently filter on 'date' and 'region', and often perform equality lookups on 'user_id'. Given the need to minimize query latency while considering cost and scalability, which of the following indexing and partitioning strategies would you implement? Choose the best option.
A
Partition the dataset solely on the 'date' column to simplify the partitioning scheme.
B
Apply Z-ordering across all columns to evenly distribute the data without specific consideration for query patterns.
C
Implement partitioning on both 'date' and 'region' columns to reduce the data scanned per query and apply a bloom filter index on 'user_id' to accelerate equality lookups.
D
Set a uniform file size of 128MB for all data files, ignoring the specific characteristics and access patterns of the data.
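
To make the terminology concrete, here is a minimal sketch of what the layout described in option C (partitioning plus a bloom filter index) could look like as SQL. It only builds the statement strings in Python. The table name `sales_transactions`, the column types, and the `fpp` and `numItems` values are illustrative assumptions, not part of the question. The `CREATE BLOOMFILTER INDEX` statement is Databricks SQL syntax, so it may not be available on other Delta Lake runtimes.

```python
# Hypothetical table name; the question does not name the table.
TABLE = "sales_transactions"


def create_table_sql(table: str) -> str:
    """Build DDL for a Delta table partitioned on the two filter columns.

    Column types are assumptions chosen for illustration only.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  date DATE,\n"
        "  region STRING,\n"
        "  user_id BIGINT,\n"
        "  product_id BIGINT,\n"
        "  amount DECIMAL(18, 2)\n"
        ") USING DELTA\n"
        "PARTITIONED BY (date, region)"
    )


def bloom_filter_sql(table: str, column: str, fpp: float, num_items: int) -> str:
    """Build a Databricks-style bloom filter index statement for one column.

    fpp is the target false-positive probability and num_items the expected
    number of distinct values; both are tuning guesses, not recommendations.
    """
    return (
        f"CREATE BLOOMFILTER INDEX ON TABLE {table}\n"
        f"FOR COLUMNS ({column} OPTIONS (fpp = {fpp}, numItems = {num_items}))"
    )


# Statements in the order they would be run.
statements = [
    create_table_sql(TABLE),
    bloom_filter_sql(TABLE, "user_id", fpp=0.1, num_items=50_000_000),
]

for statement in statements:
    print(statement + ";\n")
```

In a Databricks notebook, each string could be passed to `spark.sql(...)`. Whether partitioning on both columns is appropriate depends on the resulting partition count and file sizes, which the question does not specify.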