
Databricks Certified Data Engineer - Professional
A production workload uses a continuously running Structured Streaming job to incrementally apply updates from an external Change Data Capture feed to a Delta Lake table. During the initial data migration, OPTIMIZE was executed, resizing most data files to 1 GB. Both Auto Optimize features (Optimized Writes and Auto Compaction) were enabled for the streaming production job. However, a recent review shows that most data files are now under 64 MB, even though each partition contains at least 1 GB of data and the total table size exceeds 10 TB.
What is the most likely explanation for these smaller file sizes?
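For context, the scenario combines a one-time OPTIMIZE with table-level Auto Optimize properties. The following is a minimal sketch of how that configuration is typically applied; the table name bronze.cdc_target is assumed purely for illustration and is not part of the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable the Auto Optimize features referenced in the scenario
# (Optimized Writes and Auto Compaction) as Delta table properties.
spark.sql("""
    ALTER TABLE bronze.cdc_target SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")

# Initial migration step from the scenario: compact existing files
# (this is what produced the ~1 GB files before the streaming job ran).
spark.sql("OPTIMIZE bronze.cdc_target")
```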