
Answer-first summary for fast verification
Answer: Data skew resulting from uneven distribution, where certain Spark partitions contain significantly more records than others.
The scenario described, where most tasks (minimum and median) finish quickly but one or a few tasks (maximum) take orders of magnitude longer, is a classic symptom of **Data Skew**.

### Why this is correct:
Data skew occurs when the underlying data is not distributed evenly across partitions. The tasks responsible for the 'heavy' partitions become **stragglers**, processing far more records than their peers and preventing the stage from completing.

### Why other options are incorrect:
* **Network/Regional Latency:** If cross-region data transfer were the issue, you would likely see an increase in the median duration across many tasks, not just a single outlier.
* **Task Queuing:** Thread pool issues would result in tasks waiting to start even when cores are available, affecting task scheduling rather than individual task execution duration.
* **Disk Spillover:** While spills to disk slow down tasks, they usually occur when executor memory is exceeded and tend to affect multiple tasks or lead to Out-of-Memory (OOM) errors rather than creating a single task that is 100x slower.
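
To make the straggler effect concrete, here is a minimal pure-Python sketch (not Spark code) that mimics hash partitioning on a key. Everything in it is an illustrative assumption: `zlib.crc32` stands in for Spark's partitioner hash, the `user_*` and `hot_user` keys are hypothetical, and the partition count of 8 is arbitrary. It shows how a single hot key leaves the minimum and median partition sizes close together while the maximum is far larger.

```python
import zlib
from statistics import median


def partition_sizes(keys, num_partitions):
    """Count records per partition when records are hash-partitioned by key."""
    sizes = [0] * num_partitions
    for key in keys:
        # Assumption: crc32 is a deterministic stand-in for the real partitioner hash.
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition size divided by the median partition size."""
    return max(sizes) / median(sizes)


# Hypothetical dataset: 1,000 ordinary keys with 10 records each, plus one hot key.
keys = [f"user_{i}" for i in range(1000) for _ in range(10)]
keys += ["hot_user"] * 100_000

sizes = partition_sizes(keys, num_partitions=8)
print("min:", min(sizes), "median:", median(sizes), "max:", max(sizes))
print("max/median ratio:", round(skew_ratio(sizes), 1))
```

All records for the hot key hash to the same partition, so the task that processes it has far more work than its peers, which matches the min, median, and max pattern described in the question.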
Author: LeetQuiz Editorial Team
While monitoring a Spark job, you observe in the Spark UI that for a specific stage, the minimum and median task durations are nearly identical. However, the maximum task duration is approximately 100 times longer than the minimum. What is the most likely cause of this performance bottleneck?
A. Network latency caused by cluster nodes residing in different geographic regions from the source data.
B. Task queuing delays stemming from an incorrectly configured executor thread pool.
C. Data skew resulting from uneven distribution, where certain Spark partitions contain significantly more records than others.
D. Disk spillover caused by insufficient attached volume storage for intermediate data processing.