
Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?
A
In the Executor's log file, by grepping for "predicate push-down"
B
In the Stage's Detail screen, in the Completed Stages table, by noting the size of data read from the Input column
C
In the Storage Detail screen, by noting which RDDs are not stored on disk
D
In the Delta Lake transaction log, by noting the column statistics
E
In the Query Detail screen, by interpreting the Physical Plan
Explanation:
The correct answer is E. In the Query Detail screen, by interpreting the Physical Plan.
Predicate push-down is a query optimization technique where filter conditions (predicates) are pushed down to the data source level to reduce the amount of data read from storage.
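For a concrete illustration, here is a minimal PySpark sketch (the path /data/events and the column year are hypothetical) in which the filter is eligible for push-down into a Parquet scan:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pushdown-demo").getOrCreate()

# Parquet is a push-down-capable source: Spark can hand simple column
# predicates to the reader, which then skips row groups whose min/max
# statistics rule the predicate out.
events = spark.read.parquet("/data/events")    # hypothetical path
recent = events.filter(F.col("year") >= 2023)  # hypothetical column
```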
The Spark UI's Query Detail screen (reached from the SQL tab) provides detailed information about query execution, including the Physical Plan. The Physical Plan shows how Spark will execute the query, including whether filters are applied early (at scan time) or later in the execution pipeline.
Diagnosing predicate push-down issues:
If the Physical Plan shows a Scan followed by a separate Filter operator (Scan → Filter) instead of a Scan with PushedFilters, predicate push-down is not being leveraged.
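Continuing the sketch above, df.explain() prints the same Physical Plan that the Query Detail screen renders, so both plan shapes can be compared directly. The commented output is representative of recent Spark versions, not byte-for-byte exact, and the UDF is a hypothetical example of how push-down is commonly defeated:

```python
# Pushed down: the scan node itself carries the predicate.
recent.explain()
# == Physical Plan ==
# *(1) Filter (isnotnull(year#0L) AND (year#0L >= 2023))
# +- *(1) ColumnarToRow
#    +- FileScan parquet [...] PushedFilters: [IsNotNull(year), GreaterThanOrEqual(year,2023)], ...

# Not pushed down: a Python UDF is opaque to the optimizer, so the scan
# reads everything and a separate Filter runs afterwards.
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

is_recent = udf(lambda y: y >= 2023, BooleanType())  # hypothetical UDF
events.filter(is_recent(F.col("year"))).explain()
# FileScan parquet [...] PushedFilters: [], ...
```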
Key indicators in the Physical Plan:
PushedFilters listed in the scan operation; an empty PushedFilters: [] list means no predicate reached the source.
Why the other options are incorrect:
A. Executor log files do not contain a "predicate push-down" message to grep for; push-down is a planning decision made by the optimizer, not something executors log.
B. The Input column in the Completed Stages table shows how much data was read, which can hint at a problem but cannot confirm whether filters were pushed to the source.
C. The Storage Detail screen describes cached RDDs and DataFrames and their persistence levels, which is unrelated to predicate push-down.
D. The Delta Lake transaction log is not part of the Spark UI, and its column statistics serve file-level data skipping rather than diagnosing push-down.
By analyzing the Physical Plan in the Query Detail screen, you can identify whether predicate push-down optimization is being applied and diagnose performance problems caused by its absence.