
Explanation:
The most direct and reliable way to verify predicate push-down is by examining the Physical Plan on the Query Detail screen (under the SQL tab) of the Spark UI.
In the scan node of the plan (for example, FileScan parquet or BatchScan), you can look for the PushedFilters attribute. If filters are being pushed to the source, they will appear there. If a predicate appears only in a separate Filter node above the scan node and is missing from PushedFilters, it indicates that Spark is reading the data first and then filtering it in memory.
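The check described above can also be done on the plan text itself. The following is a minimal sketch, assuming the physical plan has been copied from the Query Detail screen or from the output of `df.explain()`. The names `SAMPLE_PLAN` and `pushed_filters` are hypothetical, and the sample plan is hand-written for illustration in the shape Spark typically prints for a Parquet scan; it is not output from a real run.

```python
import re

# Hand-written, illustrative plan text (an assumption, not real output).
# It mimics the layout Spark usually prints for a filtered Parquet scan.
SAMPLE_PLAN = """\
*(1) Filter (isnotnull(amount#1) AND (amount#1 > 100.0))
+- *(1) ColumnarToRow
   +- FileScan parquet [id#0L,amount#1] Batched: true, DataFilters: [isnotnull(amount#1), (amount#1 > 100.0)], Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(amount), GreaterThan(amount,100.0)], ReadSchema: struct<id:bigint,amount:double>
"""

# Match the bracketed list that follows "PushedFilters:" on a scan node line.
_PUSHED = re.compile(r"PushedFilters: \[(.*?)\](?:,|$)", re.MULTILINE)


def pushed_filters(plan_text):
    """Return the raw contents of each PushedFilters attribute in the plan.

    One entry per scan node. An empty string means the scan node reported
    no pushed filters; an empty list means no PushedFilters attribute was
    found at all.
    """
    return [match.group(1) for match in _PUSHED.finditer(plan_text)]


if __name__ == "__main__":
    for entry in pushed_filters(SAMPLE_PLAN):
        if entry:
            print("Pushed to source:", entry)
        else:
            print("Nothing pushed; filtering happens after the scan.")
```

An empty result for a scan node, combined with a Filter node above it, points to the predicate being applied only after the data is read.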
A data engineer suspects that a query is suffering from poor performance because Spark is failing to utilize predicate push-down. Where in the Spark UI can they definitively diagnose whether predicates are being pushed down to the data source?
A
In the Executor log files by searching for specific "predicate push-down" log entries generated during task execution.
B
On the Storage Detail screen by identifying which RDDs or DataFrames are currently cached on disk versus in memory.
C
Within the Delta Lake transaction log by examining the column statistics and JSON commit files for the target table.
D
On the Stage Detail screen by observing the "Input" column in the Completed Stages table to see the total bytes read.
E
On the Query Detail screen within the SQL tab, by analyzing the Physical Plan for the presence of PushedFilters.