
Answer-first summary for fast verification
Answer: C. A shuffle occurs prior to writing to try to group similar data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.
Delta Lake's optimized writes reduce the number of small files by rebalancing data before it is written. Option C is correct: a shuffle groups rows by partition column so that each task writes to fewer files, whereas by default every executor may write its own file for each partition it holds data for. Options A and B describe compaction that would run after the write (B resembles auto compaction, a separate feature), not a change to how the write itself is laid out. Option D is incorrect because Delta Lake still uses directory-based partitions; the metadata tracks partition values, but optimized writes do not replace directory partitions with logical ones. They improve file sizes only through the shuffle.
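The effect of the shuffle on file counts can be illustrated with a toy model in plain Python. This is only a sketch under simplifying assumptions, not Spark or Delta Lake code: the function names, the `date` partition column, and the task layout are all hypothetical, and the model ignores the target file size that the real feature also takes into account.

```python
from collections import defaultdict

def files_without_shuffle(tasks):
    """Each task writes one file per partition value it holds (toy model)."""
    return sum(len({row["date"] for row in task}) for task in tasks)

def files_with_shuffle(tasks):
    """Rows are regrouped by partition value first, so each value
    ends up with a single writer and a single file (toy model)."""
    grouped = defaultdict(list)
    for task in tasks:
        for row in task:
            grouped[row["date"]].append(row)
    return len(grouped)

# Hypothetical input: 4 tasks, each holding rows for the same 3 dates.
dates = ["2024-01-01", "2024-01-02", "2024-01-03"]
tasks = [[{"date": d, "value": i} for d in dates] for i in range(4)]

before = files_without_shuffle(tasks)  # 4 tasks x 3 dates each
after = files_with_shuffle(tasks)      # one file per date
print(before, after)
```

In this model the unshuffled write produces one file per task per date, while the shuffled write produces one file per date. On Databricks the real feature is typically switched on through the `spark.databricks.delta.optimizeWrite.enabled` session setting or the `delta.autoOptimize.optimizeWrite` table property.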
Author: LeetQuiz Editorial Team
What is the correct description of Delta Lake's optimized writes feature?
A. Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
B. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
C. A shuffle occurs prior to writing to try to group similar data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.
D. Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.