
Answer: D, `OPTIMIZE transactions ZORDER BY (transactionId)`
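To speed up queries on a Delta table that consists of many small files and is frequently filtered on a high-cardinality column such as `transactionId`, use the `OPTIMIZE` command with `ZORDER BY`. `OPTIMIZE` compacts the small files into fewer, larger ones. `ZORDER BY` co-locates rows with similar `transactionId` values in the same set of files. Delta Lake records per-file min/max column statistics, so its data-skipping logic can then skip files that cannot contain the filtered values, which reduces the amount of data read.

The documented syntax is `OPTIMIZE table_name ZORDER BY (col_name)`. The table name follows `OPTIMIZE` directly with no `TABLE` keyword, and no `COLUMN` keyword precedes the parenthesized column list. Every other option uses `TABLE`, `COLUMN`, or both, so the correct answer is `OPTIMIZE transactions ZORDER BY (transactionId)`. For more details, refer to [Z-Ordering | Databricks](https://docs.databricks.com/aws/en/delta/data-skipping#what-is-z-ordering).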
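The effect described above can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Delta Lake's implementation:

- Each "file" is just a list of `transactionId` values, and its min/max values stand in for the per-file statistics.
- The file counts and sizes (1,000 files of 100 rows compacted into 10 files of 10,000 rows) are hypothetical.
- With a single Z-order column, clustering is modeled here as a plain sort. Real multi-column Z-ordering interleaves columns, which this sketch does not show.
- The helper names `DataFile`, `files_to_scan`, `write_small_files`, `compact_and_cluster`, and `simulate` are illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass
class DataFile:
    """Toy stand-in for one data file; min/max mimic per-file column statistics."""
    rows: list[int]

    @property
    def min_id(self) -> int:
        return min(self.rows)

    @property
    def max_id(self) -> int:
        return max(self.rows)


def files_to_scan(files: list[DataFile], target: int) -> list[DataFile]:
    """Data skipping: keep only files whose [min, max] range could contain target."""
    return [f for f in files if f.min_id <= target <= f.max_id]


def write_small_files(ids: list[int], rows_per_file: int) -> list[DataFile]:
    """Simulate many small appends: ids land in files in arrival order."""
    return [DataFile(ids[i:i + rows_per_file]) for i in range(0, len(ids), rows_per_file)]


def compact_and_cluster(files: list[DataFile], rows_per_file: int) -> list[DataFile]:
    """Toy model of OPTIMIZE ... ZORDER BY on one column: merge all rows,
    sort them by the column, and rewrite them into fewer, larger files."""
    all_ids = sorted(i for f in files for i in f.rows)
    return [DataFile(all_ids[i:i + rows_per_file]) for i in range(0, len(all_ids), rows_per_file)]


def simulate(target: int = 42_424) -> tuple[int, int, int, int]:
    """Return (files before, scanned before, files after, scanned after)."""
    rng = random.Random(0)          # fixed seed for a repeatable run
    ids = list(range(100_000))      # high-cardinality ids
    rng.shuffle(ids)                # sparsely spread across the small files
    before = write_small_files(ids, rows_per_file=100)         # 1,000 small files
    after = compact_and_cluster(before, rows_per_file=10_000)  # 10 larger files
    return (len(before), len(files_to_scan(before, target)),
            len(after), len(files_to_scan(after, target)))


if __name__ == "__main__":
    print(simulate())
```

Before compaction, every small file spans nearly the whole id range, so almost none can be skipped for a point filter. After the toy compaction and clustering, each file covers a narrow contiguous range, and only one of the 10 files can contain `transactionId = 42424`.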
Author: LeetQuiz Editorial Team
A data engineer is troubleshooting a slow-running query on a large Delta table named transactions. The slowdown is traced to two factors. First, the table consists of many small files. Second, the transactionId column is frequently used in filters, has high cardinality, and its values are sparsely distributed across those files. Which command should the data engineer use to optimize performance?
A
OPTIMIZE transactions ZORDER BY COLUMN transactionId
B
OPTIMIZE TABLE transactions ZORDER BY (transactionId)
C
OPTIMIZE TABLE transactions ZORDER BY COLUMN (transactionId)
D
OPTIMIZE transactions ZORDER BY (transactionId)
E
OPTIMIZE TABLE transactions ZORDER BY transactionId