
Answer-first summary for fast verification
Answer: On the **Query Detail screen** within the SQL tab, by analyzing the **Physical Plan** for the presence of `PushedFilters`.
The most direct and reliable way to verify predicate push-down is by examining the **Physical Plan** on the **Query Detail screen** (under the SQL tab) of the Spark UI.

### Key Analysis:

* **Physical Plan:** In the SQL tab, clicking on a query execution reveals the plan visualization. Inspect the scan node (e.g., `FileScan parquet` or `BatchScan`) for the `PushedFilters` attribute. Filters that are pushed to the source appear there; filters that appear only in a separate `Filter` node *above* the scan node indicate that Spark is reading the data first and then filtering it in memory.
* **Stage Detail:** The "Input" metric shows how much data was read, which can provide an indirect hint (e.g., more data read than expected), but it cannot confirm whether the optimizer successfully pushed a specific filter to the storage layer.
* **Executor Logs:** Decisions made by the Catalyst Optimizer are not recorded in the executor logs, which are primarily used for task-level debugging, garbage collection, and errors.
* **Storage Tab:** This tab is dedicated to monitoring cached (persisted) datasets and provides no information about query optimization or filter push-down.
* **Delta Transaction Log:** While Delta logs contain metadata used for data skipping, they do not show the actual execution plan or Spark's final decision on predicate push-down for a specific query instance.
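Besides the UI, the same plan text can be obtained programmatically with `df.explain()` in PySpark. The sketch below illustrates what to look for: `sample_plan` is a hand-written, abridged example of typical `FileScan parquet` output (not captured from a real run), and `pushed_filters` is a hypothetical helper that extracts the `PushedFilters` list from such text.

```python
import re

# Illustrative only: an abridged, hand-written physical-plan string of the kind
# df.explain() prints for a Parquet scan with a pushed-down predicate.
sample_plan = (
    "*(1) Filter (isnotnull(amount#5) AND (amount#5 > 100))\n"
    "+- *(1) ColumnarToRow\n"
    "   +- FileScan parquet [id#3,amount#5] Batched: true, "
    "DataFilters: [isnotnull(amount#5), (amount#5 > 100)], "
    "PushedFilters: [IsNotNull(amount), GreaterThan(amount,100)], "
    "ReadSchema: struct<id:int,amount:double>"
)

def pushed_filters(plan: str) -> list[str]:
    """Extract the PushedFilters list from an explain() plan string.

    Splits on ', ' (filter entries are comma-space separated, while commas
    inside a filter like GreaterThan(amount,100) have no trailing space).
    Returns [] when no filters were pushed to the source.
    """
    m = re.search(r"PushedFilters: \[([^\]]*)\]", plan)
    if not m or not m.group(1).strip():
        return []
    return [f.strip() for f in m.group(1).split(", ")]

print(pushed_filters(sample_plan))
```

A non-empty result confirms the filter reached the scan; an empty `PushedFilters: []` on the scan node, with the predicate appearing only in a `Filter` node above it, means Spark is filtering in memory after the read.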
Author: LeetQuiz Editorial Team
A data engineer suspects that a query is suffering from poor performance because Spark is failing to utilize predicate push-down. Where in the Spark UI can they definitively diagnose whether predicates are being pushed down to the data source?
A
In the Executor log files by searching for specific "predicate push-down" log entries generated during task execution.
B
On the Storage Detail screen by identifying which RDDs or DataFrames are currently cached on disk versus in memory.
C
Within the Delta Lake transaction log by examining the column statistics and JSON commit files for the target table.
D
On the Stage Detail screen by observing the "Input" column in the Completed Stages table to see the total bytes read.
E
On the Query Detail screen within the SQL tab, by analyzing the Physical Plan for the presence of PushedFilters.