
While monitoring a Spark job, you observe in the Spark UI that for a specific stage, the minimum and median task durations are nearly identical. However, the maximum task duration is approximately 100 times longer than the minimum. What is the most likely cause of this performance bottleneck?
A. Network latency caused by cluster nodes residing in different geographic regions from the source data.
B. Task queuing delays stemming from an incorrectly configured executor thread pool.
C. Data skew resulting from uneven distribution, where certain Spark partitions contain significantly more records than others.
D. Disk spillover caused by insufficient attached volume storage for intermediate data processing.
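For context on the skew scenario in option C, here is a minimal PySpark sketch that surfaces per-partition record counts. All names here (the app name, the `df` DataFrame, the "hot" key) are illustrative, not from the question. A lopsided count for one partition is exactly what produces the near-identical min/median task durations alongside one task running ~100x longer.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("skew-check").getOrCreate()

# Hypothetical skewed input: most rows share a single "hot" key,
# so hash-partitioning by key piles them into one partition.
df = spark.createDataFrame(
    [("hot", i) for i in range(100_000)]
    + [(f"k{i}", i) for i in range(100)],
    ["key", "value"],
).repartition("key")

# Count records in each partition. One partition dwarfing the rest
# mirrors the Spark UI symptom: min and median task durations nearly
# equal, max task duration far longer.
per_partition = (
    df.withColumn("pid", F.spark_partition_id())
      .groupBy("pid")
      .count()
      .orderBy(F.desc("count"))
)
per_partition.show()
```

Common remedies for this pattern include salting the hot key or, on Spark 3.x, enabling adaptive query execution (`spark.sql.adaptive.enabled` with skew-join handling) so the runtime splits oversized partitions automatically.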