
The data science team needs help optimizing queries on unstructured text from user reviews, stored in Parquet with this schema:
item_id INT,
user_id INT,
review_id INT,
rating FLOAT,
review STRING
The review column contains full user review text, and the team wants to detect occurrences of 30 specific keywords in this field.
A junior data engineer proposes that converting this data to Delta Lake will enhance query performance.
What is the accurate response to this suggestion?
A
Delta Lake statistics are not optimized for free text fields with high cardinality.
B
Delta Lake statistics are only collected on the first 4 columns in a table.
C
ZORDER ON review will need to be run to see performance gains.
D
The Delta log creates a term matrix for free text fields to support selective filtering.
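For background, the options above turn on how per-file column statistics interact with a filter. The following is a minimal sketch, not Delta Lake's actual implementation: it assumes each data file carries only a minimum and maximum value per column, and compares a range-style predicate on rating with a keyword search inside review. The FileStats class, the file names, the sample values, and the keyword "refund" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics: min/max values for two columns."""
    path: str
    rating_min: float
    rating_max: float
    review_min: str  # lexicographically smallest review text in the file
    review_max: str  # lexicographically largest review text in the file


def can_skip_rating_eq(stats: FileStats, value: float) -> bool:
    # A file can be skipped when the value lies outside its [min, max] range.
    return value < stats.rating_min or value > stats.rating_max


def can_skip_review_contains(stats: FileStats, keyword: str) -> bool:
    # A keyword may occur anywhere inside a review, so lexicographic
    # min/max bounds on the whole string can never rule a file out.
    return False


# Made-up statistics for two data files (assumption for illustration).
files = [
    FileStats("part-0", 1.0, 2.5, "A poor fit", "Zero stars"),
    FileStats("part-1", 3.0, 5.0, "Awful box, great item", "Would buy again"),
]

skipped_by_rating = [f.path for f in files if can_skip_rating_eq(f, 4.0)]
skipped_by_keyword = [f.path for f in files if can_skip_review_contains(f, "refund")]

print("skipped for rating = 4.0:", skipped_by_rating)
print("skipped for review containing 'refund':", skipped_by_keyword)
```

Under these assumptions, the rating filter prunes one file while the keyword search still has to read every file, which is the behavior to keep in mind when weighing the options.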