
Answer-first summary for fast verification
Answer: Delta Lake automatically collects file-level statistics for the first 32 columns defined in a table's schema, which are used for data skipping.
### Explanation

* **Option B is correct**: By default, Delta Lake collects statistics (min, max, null counts, and row counts) for the first 32 columns defined in the table schema. These statistics are stored in the transaction log and are used by the engine to perform **data skipping**, which avoids reading files that fall outside of query filter predicates, significantly improving performance.
* **Option A is incorrect**: Z-ordering is not limited to numeric columns; it can be applied to any column type that allows for comparison and has statistics collected, including strings and dates.
* **Option C is incorrect**: While Delta Lake supports the declaration of primary and foreign keys for metadata purposes (often used by BI tools), it does **not enforce** these constraints. Users must use `MERGE` or other logic to ensure data integrity and uniqueness.
* **Option D is incorrect**: Standard views are logical definitions and do not cache data. Only Materialized Views or explicitly cached DataFrames hold data in a persistent or semi-persistent state.
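
The data-skipping behavior described for Option B can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not Delta Lake's actual implementation: `FileStats`, `collect_stats`, `files_to_scan`, and `NUM_INDEXED_COLS` are hypothetical names invented for this example, and the "files" are plain Python lists of rows. It only shows the idea that per-file min/max values for the leading columns let a reader rule out files without opening them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stand-in for the default of 32 leading columns mentioned above.
NUM_INDEXED_COLS = 32


@dataclass
class FileStats:
    """Toy model of the per-file statistics kept alongside a data file."""
    path: str
    num_records: int
    min_values: Dict[str, object]
    max_values: Dict[str, object]
    null_count: Dict[str, int]


def collect_stats(path: str,
                  rows: List[Dict[str, Optional[object]]],
                  schema: List[str],
                  num_indexed_cols: int = NUM_INDEXED_COLS) -> FileStats:
    """Compute min, max, and null counts for the first N schema columns only."""
    mins: Dict[str, object] = {}
    maxs: Dict[str, object] = {}
    nulls: Dict[str, int] = {}
    for col in schema[:num_indexed_cols]:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        nulls[col] = len(values) - len(non_null)
        if non_null:
            mins[col] = min(non_null)
            maxs[col] = max(non_null)
    return FileStats(path, len(rows), mins, maxs, nulls)


def files_to_scan(files: List[FileStats], column: str, lo, hi) -> List[str]:
    """Return paths of files that may contain rows with lo <= column <= hi."""
    selected = []
    for f in files:
        if column not in f.min_values:
            # No usable statistics for this column: the file cannot be skipped.
            selected.append(f.path)
        elif f.max_values[column] < lo or f.min_values[column] > hi:
            # The file's value range cannot overlap the filter: skip it.
            continue
        else:
            selected.append(f.path)
    return selected


# Example: only the first 2 of 3 columns are indexed (instead of 32) so the
# effect of the column limit is visible. Dates are ISO strings, which compare
# correctly as text, so the statistics are not limited to numeric columns.
schema = ["order_date", "amount", "note"]
files = [
    collect_stats("part-0", [
        {"order_date": "2024-01-03", "amount": 10, "note": "a"},
        {"order_date": "2024-01-20", "amount": 25, "note": None},
    ], schema, num_indexed_cols=2),
    collect_stats("part-1", [
        {"order_date": "2024-02-02", "amount": 40, "note": "b"},
        {"order_date": "2024-02-15", "amount": 5, "note": "c"},
    ], schema, num_indexed_cols=2),
]

february = files_to_scan(files, "order_date", "2024-02-01", "2024-02-29")
by_note = files_to_scan(files, "note", "a", "z")
```

In this toy setup, a filter on `order_date` can rule out `part-0` from its statistics alone, while a filter on `note` (which sits beyond the indexed-column limit) has to consider every file. That is why a frequently filtered column is best kept within the range of columns that have statistics.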
Author: LeetQuiz Editorial Team
Which of the following statements regarding the features and behaviors of Delta Lake is true?
A
Z-ordering is a multi-dimensional clustering technique that is restricted to use with numeric data types only.
B
Delta Lake automatically collects file-level statistics for the first 32 columns defined in a table's schema, which are used for data skipping.
C
Primary and foreign key constraints are strictly enforced by the Delta engine to prevent the insertion of duplicate records into dimension tables.
D
Standard views in the Lakehouse architecture maintain a persistent, up-to-date cache of the source table's data to optimize read performance.