
Answer-first summary for fast verification
Answer: D. The table is over-partitioned or incorrectly partitioned. This requires a full rewrite of all data files to resolve the issue.
Tables that are over-partitioned or incorrectly partitioned perform poorly. Files cannot be combined or compacted across partition boundaries, so OPTIMIZE can never reduce the file count below one file per partition. Small tables that are partitioned therefore incur higher storage costs and a larger total number of files to scan, which slows down most general queries. Fixing the problem requires a full rewrite of all data files under a better partitioning scheme. Reference: https://docs.databricks.com/tables/partitions.html
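The effect of partition boundaries on compaction can be illustrated with a small back-of-the-envelope sketch. It is not Delta Lake code: the function `min_files_after_compaction`, the 1 GiB target file size, and the 10 GiB table split across 10,000 `user_id` values are all assumptions chosen for illustration.

```python
# Assumed compaction target of 1 GiB per file (illustrative, not read from any config).
TARGET_FILE_BYTES = 1024**3


def min_files_after_compaction(partition_bytes, target=TARGET_FILE_BYTES):
    """Return the fewest files left when each partition is compacted on its own.

    partition_bytes maps a partition value to the bytes stored in it.
    Files cannot be merged across partitions, so every non-empty
    partition keeps at least one file regardless of how small it is.
    """
    return sum(
        (size + target - 1) // target  # integer ceiling division
        for size in partition_bytes.values()
        if size > 0
    )


# Hypothetical table: 10 GiB of data in total.
total_bytes = 10 * 1024**3

# Over-partitioned layout: the data is spread across 10,000 partition values.
over_partitioned = {f"user_id={i}": total_bytes // 10_000 for i in range(10_000)}

# Same data with no partitioning at all.
unpartitioned = {"": total_bytes}

print(min_files_after_compaction(over_partitioned))  # 10000
print(min_files_after_compaction(unpartitioned))     # 10
```

Under these assumptions, compaction leaves the over-partitioned layout with 10,000 files of roughly 1 MiB each, while the same data without partitioning fits in 10 files. This is why running OPTIMIZE alone does not help and the data files must be rewritten with a different layout.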
Author: LeetQuiz Editorial Team
The data engineering team has observed significant slowdowns in most general queries on a partitioned Delta Lake table. Despite running an OPTIMIZE command, the issue persists. What is the most likely cause of these slowdowns?
A. The transaction log is too large. Log files older than a certain age must be deleted or archived at partition boundaries.
B. They are applying the OPTIMIZE command without ZORDER. Z-ordering is needed on the partitioning columns.
C. The table has too many old data files that need to be purged. They need to run a VACUUM command instead.
D. The table is over-partitioned or incorrectly partitioned. This requires a full rewrite of all data files to resolve the issue.
E. They are applying the OPTIMIZE command on the whole table. It must be applied at each partition separately.