
A production workload uses a continuously running Structured Streaming job to incrementally apply updates from an external Change Data Capture (CDC) feed to a Delta Lake table. During the initial data migration, OPTIMIZE was executed, compacting most data files to roughly 1 GB. Auto Optimize and Auto Compaction are both enabled for the production streaming job. However, a recent review shows that most data files are now under 64 MB, even though each partition contains at least 1 GB of data and the total table size exceeds 10 TB.
What is the most likely explanation for these smaller file sizes?
A. Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations
B. Z-order indices calculated on the table are preventing file compaction
C. Bloom filter indices calculated on the table are preventing file compaction
D. Databricks has autotuned to a smaller target file size based on the overall size of data in the table
E. Databricks has autotuned to a smaller target file size based on the amount of data in each partition