
A data engineering team notices that their Delta Lake table has accumulated a large number of small Parquet files due to frequent streaming inserts and batch upserts. They want to improve the performance of analytical queries that scan this table.
Which statement best describes how the Delta Lake OPTIMIZE command addresses this issue, and what is the primary benefit for query performance?
A
The OPTIMIZE command removes obsolete data files from the Delta table, freeing up storage and improving query speed by reducing table size.
B
The OPTIMIZE command consolidates many small data files into fewer, larger files, which reduces file management overhead and improves query performance by minimizing the number of file reads during scans.
C
The OPTIMIZE command sorts the data within files based on specified columns, but does not change the number or size of files, thus having minimal impact on query performance.
D
The OPTIMIZE command encrypts all data files in the Delta table, ensuring data security and indirectly improving query performance by reducing unauthorized access.
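To make the small-file problem in the scenario concrete, the sketch below models file compaction as a simple greedy bin-packing step in plain Python. It is a toy illustration, not Delta Lake's actual implementation: the `compact` function, the 8 MB input file size, and the 1024 MB target size are all assumptions chosen for the example. On a real table, compaction is triggered with the SQL command `OPTIMIZE table_name` rather than with code like this.

```python
from typing import List


def compact(file_sizes_mb: List[float], target_mb: float = 1024.0) -> List[float]:
    """Greedily pack small files into output files of at most target_mb each.

    Hypothetical helper for illustration only; it models the effect of
    compaction on file counts, not how Delta Lake rewrites Parquet data.
    """
    outputs: List[float] = []
    current = 0.0  # size of the output file currently being filled
    for size in file_sizes_mb:
        if size >= target_mb:
            # Already at or above the target size: leave it untouched.
            outputs.append(size)
            continue
        if current > 0 and current + size > target_mb:
            # Adding this file would overflow the target, so close the
            # current output file and start a new one.
            outputs.append(current)
            current = 0.0
        current += size
    if current > 0:
        outputs.append(current)
    return outputs


# Assumed workload: 500 small files of 8 MB each, e.g. from streaming inserts.
small_files = [8.0] * 500
compacted = compact(small_files, target_mb=1024.0)

print(f"files before: {len(small_files)}, files after: {len(compacted)}")
print(f"total size preserved: {sum(small_files) == sum(compacted)}")
```

The total data volume is unchanged; only the number of files a scan has to open and track drops, which is where the reduction in per-file overhead comes from.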