
**Answer:** Z-Ordering
## Explanation

**Z-Ordering** is the correct optimization technique for this scenario:

- **Problem context**: The query is slow even after tuning file sizes, and the qualifying rows are "sparsely located throughout each of the data files".
- **How Z-Ordering helps**: Z-Ordering (also known as multidimensional clustering) physically co-locates related data based on one or more columns, so rows with similar values in those columns land in the same files.
- **Why the other options don't work**:
  - **A. Data skipping**: Delta Lake already performs data skipping automatically using per-file min/max statistics, but when the qualifying rows are scattered across every file, the statistics cannot rule any file out.
  - **C. Bin-packing**: Compacts small files into evenly sized ones, which addresses file size (already done) but does not change which rows end up in which file.
  - **D. Write as a Parquet file**: Delta tables already store their data as Parquet files internally, so this would change nothing.
  - **E. Tuning the file size**: The team has already applied this optimization.

**Key benefit**: Applying `ZORDER BY` on the columns used in the filter condition physically clusters the qualifying rows together, so far fewer files need to be scanned and query performance improves significantly when filtering on those columns.
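To see why co-location helps, note that a Z-order (Morton) curve interleaves the bits of several column values into a single sort key, so rows that are close in any of the indexed dimensions tend to sort, and therefore get written, near each other. A minimal illustrative sketch of the bit interleaving (Delta Lake's internal implementation differs; this only demonstrates the idea):

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x occupies even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y occupies odd bit positions
    return key

# Sorting points by their Morton key keeps neighbors in either
# dimension close together in the sorted (i.e. on-file) order.
points = [(x, y) for x in range(4) for y in range(4)]
points.sort(key=lambda p: z_order_key(*p))
print(points[:4])  # → [(0, 0), (1, 0), (0, 1), (1, 1)], the 2x2 block at the origin
```

In Databricks / Delta Lake the actual command is `OPTIMIZE table_name ZORDER BY (col1, col2)`, where the table and column names above are placeholders for the columns that appear in the slow query's filter.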
Author: LeetQuiz
## Question 11
A data engineering team needs to query a Delta table to extract rows that all meet the same condition. However, the team has noticed that the query is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the condition are sparsely located throughout each of the data files.
Based on the scenario, which of the following optimization techniques could speed up the query?
A. Data skipping
B. Z-Ordering
C. Bin-packing
D. Write as a Parquet file
E. Tuning the file size