
Answer-first summary for fast verification
Answer: Utilize Delta Lake's OPTIMIZE command to reorganize and merge small files into larger, more efficient files, improving query performance and reducing overhead.
Delta Lake's OPTIMIZE command is designed precisely for the small-file problem: it compacts many small files into fewer, larger ones, which improves scan performance without compromising data integrity, because the compaction is committed through the transaction log like any other operation. It is cost-effective, scalable, and consistent with data governance policies. The other options fall short: adjusting the file size limit (Option A) only affects newly written files and does not compact the existing small ones; manually deleting files and recreating the table (Option C) bypasses the transaction log, is error-prone, and is impractical for large datasets; and disabling transaction logging (Option D) would forfeit the transactional integrity and auditability that compliance and governance depend on.
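As a sketch, the fix can be applied through Delta Lake's SQL interface (the table and column names here are illustrative):

```sql
-- Compact small files into larger ones (bin-packing compaction)
OPTIMIZE events;

-- Optionally co-locate related rows on a frequently filtered
-- column to speed up selective queries (Z-ordering)
OPTIMIZE events ZORDER BY (event_date);

-- Afterwards, reclaim storage from files no longer referenced,
-- once the configured retention period has passed
VACUUM events;
```

The same compaction can be triggered from PySpark via `DeltaTable.forName(spark, "events").optimize().executeCompaction()`. Because OPTIMIZE is recorded in the transaction log, concurrent readers continue to see a consistent snapshot while compaction runs.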
Author: LeetQuiz Editorial Team
In your role as a data engineer, you are managing a Delta Lake table that has become inefficient due to a large number of small files, leading to performance degradation in your data processing pipeline. Considering the need for cost-effectiveness, compliance with data governance policies, and scalability, which of the following solutions BEST addresses this issue by leveraging Delta Lake's features? Choose the single best option.
A
Adjust the Delta Lake configuration to automatically increase the file size limit, thereby reducing the number of small files without manual intervention.
B
Utilize Delta Lake's OPTIMIZE command to reorganize and merge small files into larger, more efficient files, improving query performance and reducing overhead.
C
Manually delete the small files and then recreate the Delta table with a predefined larger file size to prevent the issue from recurring.
D
Disable Delta Lake's transaction logging feature to minimize the creation of small files, accepting the trade-off of losing transactional integrity and auditability.