
Answer-first summary for fast verification
Answer: A — Databricks has autotuned to a smaller target file size to reduce the duration of MERGE operations
The table is maintained by a continuously running Structured Streaming job that applies Change Data Capture updates, which means frequent MERGE operations. A MERGE must rewrite every data file containing a matched row, so 1 GB files force large, slow rewrites. When Auto Optimize and Auto Compaction are enabled and Databricks detects a MERGE-heavy write pattern, it autotunes the target file size downward so each MERGE rewrites less data; that is why most files have shrunk well below the 1 GB size produced by the initial OPTIMIZE. Options B and C are wrong because Z-order clustering and Bloom filter indexes do not prevent file compaction. Options D and E are wrong because size-based autotuning scales the target file size up as a table grows, so a table over 10 TB would be tuned toward files of roughly 1 GB, not under 64 MB.
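To see or influence this behavior, the relevant Delta Lake table properties can be inspected and set directly. The sketch below is illustrative, not part of the question: the table name `cdc_target` is hypothetical, and exact target sizes chosen by autotuning vary by runtime version.

```sql
-- Inspect table details, including current file sizes (table name is illustrative).
DESCRIBE DETAIL cdc_target;

-- Opt in to MERGE-oriented file sizing explicitly; Databricks can also
-- enable this tuning automatically for tables with frequent MERGEs.
ALTER TABLE cdc_target
  SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true');

-- Alternatively, pin an explicit target file size, which overrides autotuning.
ALTER TABLE cdc_target
  SET TBLPROPERTIES ('delta.targetFileSize' = '256mb');
```

Pinning `delta.targetFileSize` trades MERGE rewrite cost for scan efficiency, so leaving autotuning on is usually preferable for CDC-style workloads.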
Author: LeetQuiz Editorial Team
A production workload uses a continuously running Structured Streaming job to incrementally apply updates from an external Change Data Capture feed to a Delta Lake table. During the initial data migration, OPTIMIZE was executed, resulting in most data files being resized to 1 GB. Both Auto Optimize and Auto Compaction were enabled for the streaming production job. However, a recent review reveals that most data files are now under 64 MB, despite each partition containing at least 1 GB of data and the total table size exceeding 10 TB.
What is the most likely explanation for these smaller file sizes?
A. Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations
B. Z-order indices calculated on the table are preventing file compaction
C. Bloom filter indices calculated on the table are preventing file compaction
D. Databricks has autotuned to a smaller target file size based on the overall size of data in the table
E. Databricks has autotuned to a smaller target file size based on the amount of data in each partition