
Explanation:
The Optimized Write feature in Delta Lake improves file sizes by performing a synchronous shuffle of data across executors before the actual write phase. By grouping records destined for the same partition into the same task, it prevents the creation of many small files.
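The effect of the pre-write shuffle on file counts can be illustrated with a toy model in plain Python. This is only a sketch under simplifying assumptions, not Delta Lake's implementation: the helper names `files_without_shuffle` and `files_with_shuffle` are hypothetical, each task is modeled as a list of `(partition_value, record)` pairs, and file-size targets are ignored, so every group of records becomes exactly one file.

```python
from collections import defaultdict


def files_without_shuffle(tasks):
    """Count files when every task writes one file per partition value it holds."""
    return sum(len({partition for partition, _ in task}) for task in tasks)


def files_with_shuffle(tasks):
    """Count files when records are first regrouped by partition value."""
    grouped = defaultdict(list)
    for task in tasks:
        for partition, record in task:
            # Simulated shuffle: all records for one partition value end up together.
            grouped[partition].append(record)
    # One writer (and, in this toy model, one file) per partition value.
    return len(grouped)


# Four tasks, each holding one record for the same three partition values.
tasks = [
    [(year, f"row-{task_id}-{year}") for year in (2021, 2022, 2023)]
    for task_id in range(4)
]

print(files_without_shuffle(tasks))  # 12: four tasks x three partition values
print(files_with_shuffle(tasks))     # 3: one file per partition value
```

In this model the shuffle reduces the output from one file per task per partition to one file per partition, which is the small-file reduction the explanation describes.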
Key Distinctions:
OPTIMIZE is a separate command that must be invoked manually or scheduled, and it does not run automatically upon cluster termination.
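The difference between a write-time setting and an explicitly invoked command might look like the following PySpark-style sketch. It assumes an existing `SparkSession` with Delta Lake available is passed in as `spark`. The helper names and the table name `events` are hypothetical. The session key `spark.databricks.delta.optimizeWrite.enabled` is the setting as documented for Databricks and should be verified against the runtime in use.

```python
# Session-level setting for optimized writes (verify the key for your runtime).
OPTIMIZED_WRITE_CONF = "spark.databricks.delta.optimizeWrite.enabled"


def enable_optimized_write(spark):
    """Turn on optimized writes for subsequent writes in this session."""
    spark.conf.set(OPTIMIZED_WRITE_CONF, "true")


def compact_table(spark, table_name):
    """Run OPTIMIZE explicitly; it is a command the caller must issue."""
    return spark.sql(f"OPTIMIZE {table_name}")
```

A job would typically call `enable_optimized_write(spark)` once before writing, while `compact_table(spark, "events")` would be run by hand or from a scheduled job.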
Which of the following statements best describes the mechanism used by Delta Lake's Optimized Write feature?
A
Before the data is written, a shuffle process is implemented to consolidate similar data, reducing the total number of files generated per partition compared to executors writing files independently.
B
Following the completion of a write, an asynchronous background job examines if files can be further compacted and triggers an OPTIMIZE job, targeting a default size of 1 GB.
C
The OPTIMIZE command is automatically executed on all tables modified during the most recent job session immediately before a Jobs cluster terminates.
D
Optimized writes utilize logical partitions stored in metadata rather than physical directory partitions, effectively eliminating the 'small file' problem by managing boundaries logically.
E
Data is buffered in an external messaging bus instead of being directly committed to memory, with all data being flushed and committed once the job concludes.