
Answer-first summary for fast verification
Answer: OPTIMIZE orders WHERE order_date >= '2022-01-01'
The `OPTIMIZE` command in Delta Lake on Databricks improves query performance by coalescing small files into larger ones. To compact only a subset of the data, add a `WHERE` clause with a partition predicate; the filter may reference only partition columns, so this assumes `orders` is partitioned by `order_date`. The correct command is `OPTIMIZE orders WHERE order_date >= '2022-01-01'`. It compacts only the files in partitions that match the predicate (here, 2022-01-01 onward), which speeds up reads on that subset of data.
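As a minimal sketch of the command described above, assuming `orders` is a Delta table partitioned by `order_date` (the question does not state the partitioning), the first statement is the quiz answer as written. The second is an illustrative variant, not part of the question, that adds an upper bound to restrict compaction strictly to calendar year 2022.

```sql
-- Quiz answer: compact small files in partitions from 2022-01-01 onward.
OPTIMIZE orders WHERE order_date >= '2022-01-01';

-- Illustrative variant (assumption): bound the range to calendar year 2022 only.
OPTIMIZE orders
WHERE order_date >= '2022-01-01' AND order_date < '2023-01-01';
```

Note that the statement begins with `OPTIMIZE` followed directly by the table name and uses `WHERE` for the filter, which is what distinguishes the correct option from the others.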
Author: LeetQuiz Editorial Team
A data engineering team is optimizing performance and notices a query on the orders Delta table, which contains a large amount of data, is running too slowly due to many small files. They aim to trigger compaction for the year 2022 data only using the OPTIMIZE command. Which command should they use?
A
OPTIMIZE TABLE orders WHEN order_date >= '2022-01-01'
B
OPTIMIZE orders WHERE order_date >= '2022-01-01'
C
OPTIMIZE TABLE orders WHERE order_date >= '2022-01-01'
D
OPTIMIZE orders FILTER BY order_date >= '2022-01-01'
E
OPTIMIZE TABLE orders USING order_date >= '2022-01-01'