
Answer-first summary for fast verification
Answer: OPTIMIZE
## Explanation

The correct answer is **OPTIMIZE**.

### Why OPTIMIZE is correct:

1. **OPTIMIZE** is a Delta Lake command specifically designed to compact small files into larger ones to improve read performance.
2. When you run `OPTIMIZE` on a Delta table, it rewrites small files into larger, more efficient files.
3. This process is often referred to as "compaction" or "file compaction."

### Why other options are incorrect:

- **VACUUM**: This command removes old data files that are no longer referenced by the Delta table (garbage collection). It does not compact small files.
- **COMPACTIOON**: This appears to be a misspelling of "compaction" and is not a valid Delta Lake command.
- **REPARTITION**: This can affect file sizes by changing the number of partitions, but it is not designed for compacting existing small files. `REPARTITION` is a Spark transformation that redistributes data across partitions, whereas `OPTIMIZE` is the dedicated command for file compaction in Delta Lake.

### Key Points:

- Use `OPTIMIZE` to improve query performance by compacting small files.
- The syntax is: `OPTIMIZE table_name`
- You can also use `OPTIMIZE table_name WHERE <predicate>` to optimize only the partitions that match the predicate.
- This operation is idempotent and can be run multiple times without issues.
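The two syntax forms listed in the key points can be sketched in Python as follows. This is a minimal illustration, not an official API: the helper `build_optimize_sql`, the table name `events`, and the partition column `event_date` are all hypothetical, and the commented-out `spark.sql` call assumes an active `SparkSession` with Delta Lake enabled.

```python
from typing import Optional


def build_optimize_sql(table_name: str, where: Optional[str] = None) -> str:
    """Compose a Delta Lake OPTIMIZE statement (hypothetical helper)."""
    statement = f"OPTIMIZE {table_name}"
    if where:
        # The predicate is expected to filter on partition columns only.
        statement += f" WHERE {where}"
    return statement


# Compact the small files across the whole (hypothetical) table.
full_table = build_optimize_sql("events")

# Compact only the files belonging to one (hypothetical) partition.
one_partition = build_optimize_sql("events", where="event_date = '2024-01-01'")

print(full_table)
print(one_partition)

# Assumption: `spark` is an active SparkSession on a Delta-enabled cluster.
# spark.sql(full_table)
# spark.sql(one_partition)
```

Restricting the statement with `WHERE` limits the rewrite to the matching partitions, which can keep the compaction job short on large tables.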
Author: Keng Suppaseth
A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.
Which keyword can be used to compact the small files?
A. OPTIMIZE
B. VACUUM
C. COMPACTIOON
D. REPARTITION