
Answer-first summary for fast verification
Answer: The overall cluster CPU utilization consistently hovers around 25%.
In Databricks, the cluster-level CPU metric in Ganglia displays an **aggregate average** across all nodes, including the driver. This cluster has **4 total nodes** (1 driver + 3 executors). If the driver is saturated by non-parallelized code (e.g., collecting too much data to the driver or using a non-distributed library such as standard pandas), its CPU usage will hit ~100% while the 3 executors sit idle at ~0%.

**The calculation:** `(100% [driver] + 0% [exec 1] + 0% [exec 2] + 0% [exec 3]) / 4 nodes = 25%`

A persistent ~25% plateau on a 4-node cluster is a classic signature of a driver-side bottleneck.

**Why the other options (A, B, D, E) are incorrect:** they reflect general cluster activity, or the lack of it, but do not isolate the driver's CPU contention relative to the executors. Low network I/O or a flat load average could simply mean the job is waiting or has finished, whereas the ~25% plateau mathematically identifies exactly one active node out of four.
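The averaging above can be sketched numerically. This is a minimal illustration, not real Ganglia output; the node names are made up for the example:

```python
# Hypothetical per-node CPU utilization (%): driver pegged, executors idle.
node_cpu = {"driver": 100.0, "exec-1": 0.0, "exec-2": 0.0, "exec-3": 0.0}

# The cluster-level metric is the mean across all nodes (driver included).
cluster_cpu = sum(node_cpu.values()) / len(node_cpu)
print(cluster_cpu)  # → 25.0
```

Note that the same ~25% plateau would appear for any single saturated node out of four; it is the *persistence* of the value, combined with idle executors, that points at the driver.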
Author: LeetQuiz Editorial Team
A production Databricks cluster is configured with one driver and three executor nodes, all utilizing the same virtual machine type. While monitoring the Ganglia Metrics dashboard, which of the following observations would most likely indicate a performance bottleneck caused by code executing sequentially on the driver rather than being parallelized?
**A.** Network I/O metrics remain stable without any significant spikes or fluctuations during the execution.

**B.** The 'Bytes Received' metric consistently stays below 80 million bytes per second.

**C.** The overall cluster CPU utilization consistently hovers around 25%.

**D.** The five-minute Load Average remains completely flat across the duration of the job.

**E.** The total disk space usage across all nodes in the cluster remains unchanged.