
Answer-first summary for fast verification
Answer: A shuffle occurs prior to writing to try to group similar data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.
Delta Lake's optimized writes feature reduces the number of small files by shuffling data before the write so that rows destined for the same partition are grouped together. Option E describes exactly this. Option A is incorrect: running OPTIMIZE on every table modified by the most recent job before a Jobs cluster terminates is not part of optimized writes. Option B describes auto compaction, a different feature, and misstates its default target size as 1 GB instead of 128 MB. Option C introduces a messaging bus, which plays no role in optimized writes. Option D claims that logical partitions replace directory partitions, which is not how optimized writes work; data is still written to directory partitions. Therefore, the correct answer is E.
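To make the effect of the pre-write shuffle concrete, here is a toy model in plain Python. It is only a sketch, not Delta Lake's implementation: the `date` partition column, the four simulated tasks, and the function names are all assumptions made for illustration, and it ignores target file sizes entirely.

```python
# Toy model: count output files with and without a shuffle before the write.
# All names here (date column, task layout, function names) are hypothetical.

def files_without_shuffle(tasks):
    """Each task writes one file per directory partition it holds rows for."""
    return sum(len({row["date"] for row in task}) for task in tasks)

def shuffle_by_partition(tasks, num_tasks):
    """Route every row to a task chosen by its partition value, so all rows
    for one partition end up in the same task."""
    shuffled = [[] for _ in range(num_tasks)]
    for task in tasks:
        for row in task:
            shuffled[hash(row["date"]) % num_tasks].append(row)
    return shuffled

dates = ["2024-01-01", "2024-01-02", "2024-01-03"]
rows = [{"date": dates[i % 3], "value": i} for i in range(12)]

# Spread rows over 4 tasks so that every task holds rows for all 3 dates.
tasks = [rows[i::4] for i in range(4)]

before = files_without_shuffle(tasks)
after = files_without_shuffle(shuffle_by_partition(tasks, 4))
print(before, after)  # 12 files without the shuffle, 3 with it
```

Without the shuffle, each of the 4 tasks writes a file into each of the 3 partition directories. After the shuffle, each partition's rows sit in a single task, so only one file per partition is written. On Databricks, the real feature is commonly enabled through the table property `delta.autoOptimize.optimizeWrite`; check the documentation for your runtime before relying on that setting.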
Author: LeetQuiz Editorial Team
What is the correct description of Delta Lake optimized writes?
A
Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
B
An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
C
Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
D
Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
E
A shuffle occurs prior to writing to try to group similar data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.