Databricks Certified Data Engineer - Professional


The data science team needs help optimizing queries on unstructured text from user reviews, stored in Parquet with this schema:

item_id INT, 
user_id INT, 
review_id INT, 
rating FLOAT, 
review STRING

The review column contains full user review text, and the team wants to detect occurrences of 30 specific keywords in this field.

A junior data engineer proposes that converting this data to Delta Lake will enhance query performance.

What is the accurate response to this suggestion?


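Delta Lake's file-level statistics (minimum, maximum, and null counts per column) are not optimized for free-text fields with high cardinality, so converting to Delta Lake would not by itself speed up keyword searches on the review column.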
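The reasoning behind this answer can be illustrated with a small pure-Python sketch. It is not Delta Lake code: `FileStats`, `build_stats`, and the two sample "files" are hypothetical stand-ins for per-file min/max statistics, and the sample reviews are invented for illustration. It only shows why min/max bounds can prune files for a range filter on `rating` but cannot prune anything for a substring search on `review`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileStats:
    """Hypothetical stand-in for the min/max statistics kept per data file."""
    name: str
    min_rating: float
    max_rating: float
    min_review: str
    max_review: str
    reviews: List[str]


def build_stats(name: str, rows: List[Tuple[float, str]]) -> FileStats:
    """Compute min/max bounds for each column of one simulated file."""
    ratings = [rating for rating, _ in rows]
    reviews = [review for _, review in rows]
    return FileStats(name, min(ratings), max(ratings),
                     min(reviews), max(reviews), reviews)


# Two simulated data files with invented sample rows (rating, review).
files = [
    build_stats("part-0", [(1.0, "arrived broken, want refund"),
                           (2.0, "zipper stuck after a week")]),
    build_stats("part-1", [(4.5, "battery lasts all day"),
                           (5.0, "works great, highly recommend")]),
]


def files_for_rating_at_least(files: List[FileStats], threshold: float) -> List[str]:
    """Range filter: a file whose max rating is below the threshold can be skipped."""
    return [f.name for f in files if f.max_rating >= threshold]


def files_for_keyword(files: List[FileStats], keyword: str) -> Tuple[List[str], int]:
    """Substring filter: lexicographic min/max bounds say nothing about
    substrings, so no file can be ruled out and every file must be read."""
    candidates = list(files)  # nothing is pruned by the statistics
    matches = [f.name for f in candidates
               if any(keyword in review for review in f.reviews)]
    return matches, len(candidates)


print(files_for_rating_at_least(files, 4.0))  # only one file needs reading
print(files_for_keyword(files, "battery"))    # both files are scanned
```

In this sketch the rating filter reads one of the two files, while the keyword search has to scan both even though only one contains a match. The same limitation would apply to each of the 30 keywords the team wants to detect.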

