Databricks Certified Data Engineer - Professional

A production workload uses a continuously running Structured Streaming job to incrementally apply updates from an external Change Data Capture (CDC) feed to a Delta Lake table. During the initial data migration, OPTIMIZE was executed, compacting most data files to roughly 1 GB. Auto Optimize and Auto Compaction are both enabled for the production streaming job. However, a recent review shows that most data files are now under 64 MB, even though each partition holds at least 1 GB of data and the total table size exceeds 10 TB.

What is the most likely explanation for these smaller file sizes?
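For reference, below is a minimal sketch of the setup described in the question. The table name `target_table`, the CDC source path, the join key `id`, and the checkpoint location are all hypothetical; the MERGE logic is simplified and the table properties shown are the standard Auto Optimize settings.

```python
# Illustrative sketch only: names, paths, and schema are assumptions, not part of the question.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable Auto Optimize (optimized writes) and Auto Compaction on the target table.
spark.sql("""
    ALTER TABLE target_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")

def upsert_to_delta(microbatch_df, batch_id):
    """Apply one micro-batch of CDC changes to the Delta table with a MERGE."""
    target = DeltaTable.forName(spark, "target_table")
    (target.alias("t")
        .merge(microbatch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# Continuously apply the external CDC feed to the Delta Lake table.
(spark.readStream
    .format("delta")                      # placeholder for the external CDC source
    .load("/mnt/cdc_feed")                # hypothetical feed location
    .writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/mnt/checkpoints/cdc_upsert")
    .start())
```

Note that OPTIMIZE targets files of about 1 GB by default, whereas the Auto Optimize features write and compact to smaller target sizes, which is the behavior the question asks you to reason about.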