
Answer-first summary for fast verification
Answer: The table does not leverage file skipping because Delta Lake statistics are uninformative for string fields with very high cardinality.
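Delta Lake automatically captures statistics in the transaction log for each data file added to the table, including the minimum and maximum value of each column in that file. By default, statistics are collected on the first 32 columns of each table. When a query filters on a column, Delta Lake compares the filter against each file's min/max range and skips files that cannot contain a match. For string fields with very high cardinality, such as free-text fields, nearly every file holds values spanning almost the entire range, so these statistics rarely let any file be skipped. Because collecting statistics on long strings is also expensive, it's recommended to omit such fields from statistics collection, for example by placing them after the first 32 columns in the schema (or by adjusting the delta.dataSkippingNumIndexedCols table property). Reference: [Delta Lake Documentation](https://docs.databricks.com/delta/data-skipping.html).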
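To see why min/max statistics rarely help on a free-text column, here is a minimal pure-Python sketch. It assumes a simplified model in which each file records only the minimum and maximum value of one column (real Delta Lake statistics also include record and null counts), and it only models equality filters. The file names, messages, and the contrasting user-ID column are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    # Simplified per-file statistics: only the min/max of a single column.
    path: str
    min_value: str
    max_value: str


def collect_stats(path: str, values: List[str]) -> FileStats:
    # Record the smallest and largest value written to this file.
    return FileStats(path, min(values), max(values))


def files_to_scan(stats: List[FileStats], literal: str) -> List[str]:
    # For `col = literal`, a file can be skipped only if literal lies outside [min, max].
    return [s.path for s in stats if s.min_value <= literal <= s.max_value]


# Free-text column (like msg_body): every file spans almost the whole alphabet.
text_files = {
    "part-1": ["about the invoice", "hello team", "zoom link please"],
    "part-2": ["apologies for the delay", "meeting moved", "yes that works"],
    "part-3": ["any update?", "lunch at noon", "zero issues found"],
}

# Hypothetical clustered column (e.g. a user ID) for contrast: narrow, disjoint ranges.
id_files = {
    "part-1": ["u001", "u002", "u003"],
    "part-2": ["u004", "u005", "u006"],
    "part-3": ["u007", "u008", "u009"],
}

text_stats = [collect_stats(p, v) for p, v in text_files.items()]
id_stats = [collect_stats(p, v) for p, v in id_files.items()]

print(files_to_scan(text_stats, "meeting moved"))  # ['part-1', 'part-2', 'part-3']
print(files_to_scan(id_stats, "u005"))             # ['part-2']
```

The free-text filter must read all three files, while the clustered column lets two of the three be skipped. The same reasoning explains why neither partitioning nor Z-ordering on a free-text column is a practical fix: its values do not group into narrow, non-overlapping ranges.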
Author: LeetQuiz Editorial Team
The data engineering team is experiencing performance issues with a large Delta table named 'user_messages' when queries include filters on the 'msg_body' field, which contains free-form text. What could be the reason for this performance issue?
A. The table does not leverage file skipping because Delta Lake statistics are uninformative for string fields with very high cardinality.
B. The table does not leverage file skipping because it's not partitioned on the 'msg_body' column.
C. The table does not leverage file skipping because Delta Lake statistics are not captured on columns of type STRING.
D. The table does not leverage file skipping because it's not optimized with Z-ORDER on the 'msg_body' column.
E. The table does not leverage file skipping because Delta Lake statistics are only captured on the first 3 columns in a table.