
Explanation:
Over-partitioning or incorrect partitioning can significantly degrade performance. Because files cannot be combined or compacted across partition boundaries, a table with many small partitions incurs higher storage costs and forces queries to scan more files, which slows them down. A table is likely over-partitioned if most of its partitions contain less than 1 GB of data. Reference: Databricks Documentation on Partitions
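The 1 GB rule of thumb can be sketched as a small check over per-partition sizes. This is only an illustration: the function name `is_likely_over_partitioned`, the sample partition names and sizes, and the reading of "most" as "more than half" are assumptions, not part of any Databricks API. In practice the sizes would have to come from the table's file listing.

```python
from typing import Dict

ONE_GB = 1024 ** 3  # 1 GB threshold from the rule of thumb, in bytes


def is_likely_over_partitioned(partition_bytes: Dict[str, int],
                               threshold: int = ONE_GB) -> bool:
    """Return True if more than half of the partitions are below `threshold`.

    `partition_bytes` maps a partition label to its total size in bytes.
    Treating "most" as "more than half" is an assumption of this sketch.
    """
    if not partition_bytes:
        return False
    small = sum(1 for size in partition_bytes.values() if size < threshold)
    return small > len(partition_bytes) / 2


# Hypothetical sizes for a table partitioned by date.
sizes = {
    "date=2024-01-01": 120 * 1024 ** 2,  # 120 MB
    "date=2024-01-02": 300 * 1024 ** 2,  # 300 MB
    "date=2024-01-03": 2 * 1024 ** 3,    # 2 GB
}
print(is_likely_over_partitioned(sizes))
```

Here two of the three partitions fall below the threshold, so the table would be flagged as a candidate for coarser partitioning.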
How can a data engineering team determine if their Delta Lake tables in the Lakehouse are over-partitioned?
A. If the partitioning columns are fields of low cardinality
B. If most partitions in the table have more than 1 GB of data
C. If the number of partitions in the table is too low
D. If most partitions in the table have less than 1 GB of data
E. If the data in the table continues to arrive indefinitely