Databricks Certified Data Engineer - Professional

Databricks Certified Data Engineer - Professional

Get started today

Ultimate access to all questions.


Consider a dataset of financial transactions that includes columns like transaction_id, customer_id, amount, and timestamp. Describe how you would choose the appropriate partitioning strategy for this dataset to optimize query performance, considering the typical query patterns and the size of the dataset.




Explanation:

Partitioning by timestamp is often the most effective strategy for datasets where time-based queries are common. This allows for efficient pruning of partitions based on time ranges, which is a typical query pattern in datasets of financial transactions.