
Explanation:
Delta Lake maintains a transaction log (the _delta_log directory) made up of JSON commit files. Each commit records an add action (AddFile) for every data file written to the table, and each add action carries per-file statistics: column-level minValues, maxValues, and nullCount, plus the numRecords total for that specific file.
When a SELECT COUNT(*) query is executed, Databricks SQL leverages these pre-computed statistics by summing the numRecords values directly from the transaction log. This allows the engine to return the total row count without opening, reading, or scanning any actual data files, ensuring high performance even for massive datasets.
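The idea can be illustrated with a minimal Python sketch. It is not how Databricks SQL is implemented. It only replays add and remove actions from a handful of log lines and sums numRecords over the files that are still live. The function name count_rows_from_log and the sample file paths are hypothetical. The sketch also assumes that every add action carries a stats string, and it ignores checkpoint files.

```python
import json


def count_rows_from_log(commit_lines):
    """Replay add/remove actions and sum numRecords over live files.

    commit_lines: iterable of JSON strings, one action per line, in
    commit order (a simplified stand-in for the _delta_log JSON files).
    """
    live = {}  # data file path -> numRecords for files currently in the table
    for line in commit_lines:
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            # In the log, "stats" is itself a JSON-encoded string.
            stats = json.loads(add["stats"])
            live[add["path"]] = stats["numRecords"]
        elif "remove" in action:
            # A removed file no longer contributes to the count.
            live.pop(action["remove"]["path"], None)
    return sum(live.values())


# Hypothetical log contents: three files added, one later removed.
sample_log = [
    json.dumps({"add": {"path": "part-0000.parquet",
                        "stats": json.dumps({"numRecords": 100})}}),
    json.dumps({"add": {"path": "part-0001.parquet",
                        "stats": json.dumps({"numRecords": 250})}}),
    json.dumps({"remove": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0002.parquet",
                        "stats": json.dumps({"numRecords": 40})}}),
]

total_rows = count_rows_from_log(sample_log)
print(total_rows)
```

No Parquet file is opened at any point: the count comes entirely from the metadata recorded in the log.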
Why other options are incorrect:
A Databricks SQL dashboard monitors the total record count of a Delta Lake table using the query SELECT COUNT(*) FROM table_name. How are the results efficiently generated when the dashboard is refreshed?
A
The record count is derived from the numRecords statistics stored within the Delta transaction log.
B
The row count is computed by performing a full data scan of all underlying Parquet files.
C
The record count is calculated by reading the metadata footers of every Parquet file in the table directory.
D
The record count is determined by querying statistics maintained in the Hive Metastore.
E
The results are exclusively returned from the Databricks SQL result cache, regardless of underlying data changes.