Ultimate access to all questions.
The data science team needs help optimizing queries on unstructured text from user reviews, stored in Parquet with this schema:
item_id INT,
user_id INT,
review_id INT,
rating FLOAT,
review STRING
The review
column contains full user review text, and the team wants to detect occurrences of 30 specific keywords in this field.
A junior data engineer proposes that converting this data to Delta Lake will enhance query performance.
What is the accurate response to this suggestion?