
Answer-first summary for fast verification
Answer: The Delta log is scanned for min and max statistics for the latitude column
Delta Lake optimizes query performance using data skipping, which relies on per-file min and max statistics recorded in the Delta transaction log when each file is written (by default for the first 32 columns of the table). When a query with a filter (e.g., `latitude > 66.3`) is executed, the Delta engine compares the filter against these statistics to decide whether a file can be skipped. Since the table is partitioned by `date` (not `latitude`), partition pruning does not apply to this filter. Instead, the engine reads the file list and the `latitude` min/max values from the Delta log and loads only files whose range could include matching values. Options A and C are incorrect because Delta does not cache all records before filtering. Option B is incorrect because, although Parquet footers also hold statistics, using them to select files would require opening every file, which is exactly what the log-based statistics avoid. Option E is incorrect because the Hive metastore does not store per-file column statistics.
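The min/max check described above can be illustrated with a minimal Python sketch. It does not use the Delta Lake libraries. The `add_actions` list is hypothetical data shaped like the `add` entries of a Delta transaction log (a `path` plus a `stats` JSON string with `minValues` and `maxValues`), and `files_to_read` is an illustrative helper, not a real API.

```python
import json

# Hypothetical `add` actions, shaped like entries in a Delta transaction log.
# Paths and statistics are invented for illustration only.
add_actions = [
    {
        "path": "date=2024-01-01/part-0000.parquet",
        "stats": json.dumps({
            "numRecords": 1000,
            "minValues": {"latitude": -12.5},
            "maxValues": {"latitude": 48.9},
        }),
    },
    {
        "path": "date=2024-01-01/part-0001.parquet",
        "stats": json.dumps({
            "numRecords": 800,
            "minValues": {"latitude": 60.1},
            "maxValues": {"latitude": 71.2},
        }),
    },
    {
        "path": "date=2024-01-02/part-0000.parquet",
        "stats": json.dumps({
            "numRecords": 500,
            "minValues": {"latitude": 10.0},
            "maxValues": {"latitude": 66.3},
        }),
    },
    # No statistics recorded for this file.
    {"path": "date=2024-01-02/part-0001.parquet"},
]


def files_to_read(actions, column, lower_bound):
    """Return paths of files that may contain rows where column > lower_bound."""
    selected = []
    for action in actions:
        raw_stats = action.get("stats")
        stats = json.loads(raw_stats) if raw_stats else {}
        max_values = stats.get("maxValues", {})
        if column not in max_values:
            # Without statistics the file cannot be skipped safely.
            selected.append(action["path"])
        elif max_values[column] > lower_bound:
            # The file's range overlaps the filter, so it must be read.
            selected.append(action["path"])
    return selected


candidates = files_to_read(add_actions, "latitude", 66.3)
print(candidates)
```

In this sketch only the second and fourth files are kept: the first and third have a maximum `latitude` that does not exceed 66.3, so no row in them can satisfy the filter, while the fourth has no statistics and must be read to be safe.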
Author: LeetQuiz Editorial Team
How does the Delta engine determine which files to load when querying a Delta table partitioned by `date` with the schema `date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT`, using the filter condition `latitude > 66.3` to find records within the Arctic Circle?
A. All records are cached to an operational database and then the filter is applied
B. The Parquet file footers are scanned for min and max statistics for the latitude column
C. All records are cached to attached storage and then the filter is applied
D. The Delta log is scanned for min and max statistics for the latitude column
E. The Hive metastore is scanned for min and max statistics for the latitude column