A data engineering team is working with a large external Delta table that undergoes frequent changes. They've enabled Optimized Writes and Auto Compaction to automatically compact small data files into target files of 128 MB. Despite this, most data files in the table directory remain smaller than 128 MB. What is the most likely reason for these smaller file sizes?
Explanation:
When a Delta table is frequently the target of MERGE and DELETE operations, Databricks can automatically tune the table's file size downward based on that workload, producing files smaller than the standard 128 MB target. Smaller files reduce the amount of data that must be rewritten by these operations, which shortens the duration of subsequent merges and deletes. This workload-based tuning, rather than a misconfiguration of Optimized Writes or Auto Compaction, is the most likely reason the files stay below 128 MB. For more details, refer to the Databricks documentation.
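For illustration, the following is a minimal sketch, assuming a Databricks notebook with an active `spark` session and a hypothetical table name, of how one might inspect the relevant table properties and, if fixed 128 MB files are preferred, pin the target size instead of letting Databricks tune it for rewrites:

```python
# Minimal sketch -- assumes a Databricks notebook where `spark` (SparkSession)
# is already available. The table name below is hypothetical.
table_name = "catalog.schema.events"

# Inspect current table properties (e.g. delta.autoOptimize.optimizeWrite,
# delta.autoOptimize.autoCompact, delta.targetFileSize, delta.tuneFileSizesForRewrites).
spark.sql(f"SHOW TBLPROPERTIES {table_name}").show(truncate=False)

# Optionally pin an explicit target file size and turn off workload-based
# tuning for rewrites, so compaction aims for ~128 MB files.
spark.sql(f"""
    ALTER TABLE {table_name} SET TBLPROPERTIES (
        'delta.targetFileSize' = '128mb',
        'delta.tuneFileSizesForRewrites' = 'false'
    )
""")

# Rewrite existing small files toward the configured target size.
spark.sql(f"OPTIMIZE {table_name}")
```

Note that leaving the tuning enabled is usually the better default for merge-heavy tables, since the smaller files are exactly what keeps those rewrites fast; this sketch only shows how the behavior could be overridden if uniform 128 MB files were a hard requirement.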