
A Spark job is running much more slowly than expected. Examining the Spark UI, a data engineer notices that within a specific stage, the minimum and median task durations are nearly identical, yet the maximum task duration is roughly 100 times the minimum. What is the most likely cause of this performance bottleneck?
A. Disk spill caused by insufficient attached volume storage for temporary data.
B. Data skew resulting from an uneven distribution, where certain partitions contain significantly more records than others.
C. Task queuing delays resulting from an incorrectly configured thread pool.
D. Network latency caused by cluster nodes residing in a different geographic region than the source data.
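
The pattern described in the question, min ≈ median with a max far above both, can be checked directly by counting records per partition. Below is a minimal PySpark sketch of that check; the session name, input path, and key column are illustrative assumptions, not part of the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-check").getOrCreate()

# Illustrative input; any DataFrame whose stage shows the suspect task pattern.
df = spark.read.parquet("/data/events")

# Count records in each partition. A near-uniform distribution rules out skew;
# a handful of partitions holding most rows matches min ~= median, max >> min.
per_partition = (
    df.withColumn("partition_id", F.spark_partition_id())
      .groupBy("partition_id")
      .count()
      .orderBy(F.desc("count"))
)
per_partition.show(10)

# A common mitigation is salting the hot key before a wide operation, so the
# oversized key is spread across several partitions (salt width of 8 is an
# arbitrary example value):
salted = df.withColumn("salt", (F.rand() * 8).cast("int"))
# ...then aggregate on (key, salt) first and combine the partial results.
```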