
Answer-first summary for fast verification
Answer: Partition the data by event timestamp at a daily granularity and employ Z-ordering on frequently queried columns.
Partitioning by event timestamp at a daily granularity aligns the table's physical layout with the dominant query pattern for streaming event data: filtering by time range. Queries that restrict to a date range can skip entire partitions (partition pruning) rather than scanning the full table, which keeps query cost roughly proportional to the range queried even as the table grows. Applying Z-ordering to frequently queried columns then co-locates related values within each partition's files, so Delta Lake's data skipping reads fewer files per query. Together, these techniques optimize both query performance and ongoing file management for streaming event data in Delta Lake on Databricks.
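As a sketch of why daily partitioning prunes work, the toy model below groups events into date-keyed partitions and shows that a time-range query touches only the matching partitions. The Delta-specific commands appear only as comments (they require a Databricks/Spark environment to run), and the column names in them, such as `user_id`, are hypothetical.

```python
from datetime import date, timedelta

# In Delta Lake on Databricks, the strategy would look roughly like
# this (illustrative SQL, not executed here; user_id is a hypothetical
# frequently-queried column):
#   CREATE TABLE events (user_id BIGINT, payload STRING, event_ts TIMESTAMP,
#                        event_date DATE)
#   USING DELTA PARTITIONED BY (event_date);
#   OPTIMIZE events ZORDER BY (user_id);

def partition_by_day(events):
    """Group events (dicts carrying an 'event_date') into daily partitions."""
    partitions = {}
    for e in events:
        partitions.setdefault(e["event_date"], []).append(e)
    return partitions

def query_range(partitions, start, end):
    """Read only partitions whose date falls in [start, end]: partition pruning."""
    scanned = [d for d in partitions if start <= d <= end]
    rows = [e for d in scanned for e in partitions[d]]
    return scanned, rows

# Ten days of synthetic events, three per day.
events = [
    {"event_date": date(2024, 1, 1) + timedelta(days=d), "payload": i}
    for d in range(10)
    for i in range(3)
]
parts = partition_by_day(events)
scanned, rows = query_range(parts, date(2024, 1, 3), date(2024, 1, 4))
print(len(parts), len(scanned), len(rows))  # 10 partitions, 2 scanned, 6 rows
```

The same principle drives the real system: because the partition key appears in the query predicate, the engine never opens files for the other eight days. Z-ordering plays the analogous role *within* each day's files for non-partition columns.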
Author: LeetQuiz Editorial Team
You are tasked with designing a Delta Lake for storing streaming event data ingested via Databricks, with expectations of significant data growth. What strategy would you employ to optimize the Delta Lake table for both query performance and efficient data management?
A. Store all data in a single partition to simplify data management and rely on Delta Lake's automatic optimization.
B. Partition the data by event type and ensure all queries include a filter on this partition key.
C. Use a single column for partitioning and avoid Z-ordering to keep the data model simple.
D. Partition the data by event timestamp at a daily granularity and employ Z-ordering on frequently queried columns.