
Answer-first summary for fast verification
Answer: Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.
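Delta Lake records per-file statistics (row count, min/max values, and null counts) in the Delta Log. By default, statistics are collected on the first 32 columns of the schema (configurable with delta.dataSkippingNumIndexedCols). Longitude is the fourth column, so it is covered. When a query filters on a non-partition column such as longitude, the Delta Engine compares the predicate with each file's min/max values and skips every file whose range cannot contain a matching row. For example, a file whose longitude values span -75 to -30 is skipped. Partitioning by date does not help with a longitude filter, but file-level statistics still allow efficient pruning. Option D describes this behavior correctly. Option A is incorrect because the Delta Log statistics are kept per data file, not per partition, and the filter does not reference the partition column. Option B is incorrect because data skipping does not depend on any relationship between the partition column and longitude; the per-file longitude statistics are enough. Option C is incorrect because pruning uses the statistics already stored in the Delta Log, so the engine does not open every Parquet footer, and pruning works at file granularity rather than identifying individual rows. Rows in the files that are not skipped are still filtered when those files are read.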
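The file-skipping idea can be illustrated with a minimal pure-Python sketch. This is not Delta Lake's implementation. The add actions are modeled on the Delta transaction log format, where each action has a path, partitionValues, and a stats field holding a JSON string with minValues and maxValues. The file paths, the values, and the helper names may_contain_range and files_to_scan are hypothetical. A file is kept unless its min/max range proves that no row satisfies longitude > -20 AND longitude < 20.

```python
import json
from typing import Dict, List, Optional

# Hypothetical add actions, loosely shaped like entries in a Delta Log commit.
# Paths and statistics are invented for illustration only.
ADD_ACTIONS: List[Dict] = [
    {"path": "date=2024-01-01/part-00000.parquet",
     "partitionValues": {"date": "2024-01-01"},
     "stats": json.dumps({"numRecords": 100,
                          "minValues": {"longitude": -75.0},
                          "maxValues": {"longitude": -30.0},
                          "nullCount": {"longitude": 0}})},
    {"path": "date=2024-01-01/part-00001.parquet",
     "partitionValues": {"date": "2024-01-01"},
     "stats": json.dumps({"numRecords": 80,
                          "minValues": {"longitude": -25.0},
                          "maxValues": {"longitude": 5.0},
                          "nullCount": {"longitude": 0}})},
    {"path": "date=2024-01-02/part-00000.parquet",
     "partitionValues": {"date": "2024-01-02"},
     "stats": json.dumps({"numRecords": 120,
                          "minValues": {"longitude": 10.0},
                          "maxValues": {"longitude": 60.0},
                          "nullCount": {"longitude": 0}})},
    {"path": "date=2024-01-02/part-00001.parquet",
     "partitionValues": {"date": "2024-01-02"},
     "stats": json.dumps({"numRecords": 90,
                          "minValues": {"longitude": 20.0},
                          "maxValues": {"longitude": 90.0},
                          "nullCount": {"longitude": 0}})},
    # A file without statistics can never be skipped safely.
    {"path": "date=2024-01-02/part-00002.parquet",
     "partitionValues": {"date": "2024-01-02"},
     "stats": None},
]


def may_contain_range(stats_json: Optional[str], column: str,
                      lower: float, upper: float) -> bool:
    """Return False only if the file provably has no value with lower < v < upper."""
    if not stats_json:
        return True  # No statistics: the file must be read.
    stats = json.loads(stats_json)
    col_min = stats.get("minValues", {}).get(column)
    col_max = stats.get("maxValues", {}).get(column)
    if col_min is None or col_max is None:
        return True  # Column not indexed (or all nulls): be conservative.
    # Skip when every value is <= lower or every value is >= upper.
    return col_max > lower and col_min < upper


def files_to_scan(add_actions: List[Dict], column: str,
                  lower: float, upper: float) -> List[str]:
    """Apply min/max data skipping; partitionValues are ignored because the
    filter does not reference the partition column (date)."""
    return [a["path"] for a in add_actions
            if may_contain_range(a.get("stats"), column, lower, upper)]


if __name__ == "__main__":
    for path in files_to_scan(ADD_ACTIONS, "longitude", -20.0, 20.0):
        print(path)
```

In this made-up data, one file from each date is skipped, which shows that pruning happens per file and not per partition. The file with min = 20 is skipped because the predicate uses a strict `< 20`. The surviving files are then read, and their rows are filtered individually.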
Author: LeetQuiz Editorial Team
A Delta Lake table with the schema user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE is partitioned by the date column. When executing a query with the filter condition longitude < 20 AND longitude > -20, how will the data be filtered?
A
Statistics in the Delta Log will be used to identify partitions that might include files in the filtered range.
B
No file skipping will occur because the optimizer does not know the relationship between the partition column and the longitude.
C
The Delta Engine will scan the parquet file footers to identify each row that meets the filter criteria.
D
Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.