
Ultimate access to all questions.
A data science team needs to optimize queries on a free-form text column (review) within a Parquet table. The current schema includes item_id INT, user_id INT, review_id INT, rating FLOAT, and review STRING. The team frequently searches for specific keywords within the review field. A junior data engineer suggests that migrating to Delta Lake will automatically improve query performance for these keyword searches. How should you evaluate this suggestion?_
A
Applying ZORDER BY review will reorganize the data to provide significant performance gains for keyword-based predicates.
B
Delta Lake's data-skipping statistics are not optimized for high-cardinality, free-form text fields like the review column.
C
The Delta transaction log generates a term matrix for string fields, enabling highly selective filtering for specific keywords.
D
Delta Lake only collects file-level statistics for the first 4 columns of a table by default, meaning the review column (the 5th column) will be ignored.