
Answer-first summary for fast verification
Answer: Z-Ordering
Z-Ordering rewrites a table so that rows with similar values in the chosen columns are colocated in the same set of data files. It is especially useful for queries that filter on one or more of those columns, because the matching rows end up concentrated in a few files instead of spread across many. In this scenario, where the rows that satisfy the query condition are scattered across the data files, Z-Ordering can markedly improve query performance by reducing the amount of data that has to be read.

The other options do not address the scattering. Data skipping uses per-file statistics (such as minimum and maximum column values) to avoid reading irrelevant files, but Delta Lake already applies it automatically, and it prunes little when matching rows are present in most files; Z-Ordering is what makes it effective here. Bin-packing compacts small files into larger, evenly sized ones without changing which rows share a file, and the team has already optimized file sizes. Writing the data as a Parquet file changes nothing about the layout, since Delta tables already store their data in Parquet.
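The effect described above can be sketched with a small toy model. This is not how Delta Lake implements Z-Ordering internally; it is a minimal pure-Python illustration under stated assumptions: a table with two integer columns in the range 0 to 15, "files" of 16 rows each, and per-file min/max statistics as the only pruning mechanism. The names `morton`, `split_into_files`, and `files_to_read` are hypothetical helpers invented for this example.

```python
# Toy model of Z-Ordering and file-level data skipping (pure Python, no Spark).
# All names here are hypothetical and exist only for this illustration.

ROWS_PER_FILE = 16  # both layouts use identical file sizes


def morton(x, y, bits=4):
    """Interleave the bits of x and y into a single Z-order (Morton) key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


def split_into_files(rows):
    """Chunk rows, in their current order, into fixed-size 'files'."""
    return [rows[i:i + ROWS_PER_FILE] for i in range(0, len(rows), ROWS_PER_FILE)]


def files_to_read(files, x, y):
    """Count files whose min/max statistics cannot rule out the point (x, y)."""
    count = 0
    for f in files:
        xs = [r[0] for r in f]
        ys = [r[1] for r in f]
        if min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys):
            count += 1
    return count


# A 16 x 16 grid of (x, y) rows: 256 rows, so 16 files in either layout.
rows = [(x, y) for x in range(16) for y in range(16)]

# Scattered layout: every file ends up spanning the full range of x and y.
scattered = sorted(rows, key=lambda r: (r[0] + r[1]) % 16)

# Z-ordered layout: each file holds one compact 4 x 4 block of the grid.
zordered = sorted(rows, key=lambda r: morton(r[0], r[1]))

# Query: WHERE x = 5 AND y = 9
scattered_reads = files_to_read(split_into_files(scattered), 5, 9)
zordered_reads = files_to_read(split_into_files(zordered), 5, 9)

print(scattered_reads, zordered_reads)  # prints "16 1"
```

Both layouts have the same number of files and the same file size, yet the scattered layout forces all 16 files to be read while the Z-ordered layout needs only one. That mirrors the scenario in the question: file sizing alone does not help when matching rows are spread everywhere. In Delta Lake itself, the corresponding operation is the `OPTIMIZE ... ZORDER BY (...)` command, run against the table and the columns used in query filters.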
Author: LeetQuiz Editorial Team
A data engineering team is facing slow query performance when extracting specific rows from a Delta table, with the relevant rows spread across various data files. Even after optimizing the data file sizes, the problem continues. What optimization strategy could be employed to improve the query speed in this scenario?
A. Writing as a Parquet file
B. Data skipping
C. Z-Ordering
D. Bin-packing