
Answer-first summary for fast verification
Answer: Partition by customer ID, z-order by transaction date, apply bloom filters on customer ID, and set file size to 256 MB.
Partitioning by customer ID lets queries that filter on a specific customer skip every other partition, which speeds up customer-specific lookups. Z-ordering by transaction date co-locates rows with nearby dates in the same files, so date-range filters used in time-series analysis can skip more files. Bloom filters on customer ID speed up point lookups by letting the engine rule out files that cannot contain the requested ID. A 256 MB target file size is a balanced choice: files are large enough to avoid the overhead of many small files, yet small enough that file skipping stays effective.
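As a rough illustration of the chosen answer, the sketch below assembles the SQL statements that would apply these four settings. It assumes a Databricks environment, where `CREATE BLOOMFILTER INDEX`, `OPTIMIZE ... ZORDER BY`, and the `delta.targetFileSize` table property are available. The table name `transactions`, the column names, the schema, and the bloom filter options (`fpp`, `numItems`) are hypothetical placeholders, not values from the question. The helper only builds strings, so it runs without a Spark session.

```python
def build_statements(table="transactions",
                     partition_col="customer_id",
                     zorder_col="transaction_date",
                     bloom_col="customer_id",
                     target_file_size="256mb"):
    """Return Databricks SQL statements for the layout in the answer.

    All names and option values are illustrative assumptions.
    """
    return [
        # 1. Partition the Delta table by the chosen column (hypothetical schema).
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"(customer_id BIGINT, transaction_date DATE, amount DECIMAL(18, 2)) "
        f"USING DELTA PARTITIONED BY ({partition_col})",
        # 2. Ask the engine to aim for the target file size when writing/compacting.
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.targetFileSize' = '{target_file_size}')",
        # 3. Bloom filter index for point lookups; fpp and numItems are placeholders.
        f"CREATE BLOOMFILTER INDEX ON TABLE {table} "
        f"FOR COLUMNS ({bloom_col} OPTIONS (fpp = 0.1, numItems = 1000000))",
        # 4. Compact files and cluster rows by the Z-order column.
        f"OPTIMIZE {table} ZORDER BY ({zorder_col})",
    ]


if __name__ == "__main__":
    for statement in build_statements():
        print(statement + ";")
```

In a notebook, each string could be passed to `spark.sql(...)` in order. The bloom filter index is created before `OPTIMIZE` here on the assumption that the index should exist before files are rewritten.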
Author: LeetQuiz Editorial Team
Consider a scenario where you have a large dataset of customer transactions that you need to optimize for querying using Delta Lake. You decide to apply partitioning, z-ordering, bloom filters, and adjust file sizes. Describe in detail how you would choose the appropriate partitioning column, implement z-ordering, apply bloom filters, and determine optimal file sizes for this dataset to enhance query performance.
A. Partition by transaction date, z-order by customer ID, apply bloom filters on transaction amount, and set file size to 128 MB.
B. Partition by customer ID, z-order by transaction date, apply bloom filters on customer ID, and set file size to 256 MB.
C. Partition by transaction amount, z-order by transaction date, apply bloom filters on transaction date, and set file size to 64 MB.
D. Partition by transaction type, z-order by transaction amount, apply bloom filters on transaction type, and set file size to 512 MB.
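
To make the file-size trade-off across the four options concrete, the following sketch counts how many files each target size would produce for a dataset of a given total size. The 1 TiB total is a hypothetical figure chosen for easy arithmetic, not a value from the question, and the calculation ignores partition boundaries and compression.

```python
import math

MIB = 1024 ** 2


def file_count(total_bytes, target_mb):
    """Approximate number of files if every file reaches the target size."""
    return math.ceil(total_bytes / (target_mb * MIB))


if __name__ == "__main__":
    total = 1024 ** 4  # hypothetical 1 TiB dataset
    for target_mb in (64, 128, 256, 512):
        print(f"{target_mb} MB target -> {file_count(total, target_mb)} files")
```

Halving the target size doubles the file count. Smaller files mean more metadata and more file-open overhead, while larger files reduce how finely the engine can skip data.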