
Answer-first summary for fast verification
Answer: Delta Lake automatically collects statistics on the first 32 columns of each table, which are utilized for data skipping based on query filters.
### Explanation

**Correct Answer: B**

By default, when data is written into a Delta table, Delta Lake collects file-level statistics (minimum and maximum values, null counts, and row counts) for the **first 32 columns** defined in the table schema. During query execution, the Delta engine uses these statistics to perform **data skipping**: it bypasses Parquet files whose recorded value ranges fall entirely outside the query's filter predicates. This significantly reduces I/O and accelerates query performance.

**Why the other options are incorrect:**

* **A:** While Delta Lake supports declaring primary and foreign key constraints, it does **not currently enforce** them. They serve primarily as metadata for documentation purposes and do not prevent duplicate entries.
* **C:** Z-ordering is **not restricted to numeric columns**. It can be applied to any column for which statistics are collected and that supports comparison, including strings, dates, and timestamps.
* **D:** In the Databricks Lakehouse, standard views are **logical definitions**, not physical caches; they store no data themselves. Only materialized views or explicitly cached DataFrames hold a snapshot of the data in memory or storage.
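To make the data-skipping behavior in option B concrete, here is a minimal pure-Python sketch. It is **not** Delta Lake's implementation; the file names and statistics are hypothetical stand-ins for the min/max values Delta Lake records per file in its transaction log.

```python
# Hypothetical per-file min/max statistics for one indexed column,
# "order_id", as an engine might read them from a transaction log.
file_stats = [
    {"file": "part-000.parquet", "min": 1,    "max": 1000},
    {"file": "part-001.parquet", "min": 1001, "max": 2000},
    {"file": "part-002.parquet", "min": 2001, "max": 3000},
]

def files_to_scan(stats, lo, hi):
    """Keep only files whose [min, max] range overlaps the filter [lo, hi];
    all other files are skipped without being read."""
    return [s["file"] for s in stats if s["max"] >= lo and s["min"] <= hi]

# A predicate like WHERE order_id BETWEEN 1500 AND 1600 touches one file:
print(files_to_scan(file_stats, 1500, 1600))  # ['part-001.parquet']
```

The engine never opens the skipped files at all, which is why keeping frequently filtered columns among the first 32 (the default statistics-collection set) matters for query performance.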
Author: LeetQuiz Editorial Team
Which of the following statements regarding Delta Lake features and behaviors is true?
**A.** Delta Lake enforces primary and foreign key constraints to prevent duplicate records and maintain referential integrity within dimension tables.

**B.** Delta Lake automatically collects statistics on the first 32 columns of each table, which are utilized for data skipping based on query filters.

**C.** Z-ordering in Delta Lake is a data organization technique that is strictly limited to numeric columns.

**D.** Standard views in the Lakehouse always maintain an up-to-date, physical cache of the latest versions of their source tables to optimize performance.