
A Spark job is running slower than anticipated. While analyzing the Spark UI, a data engineer observes that for tasks in a specific stage, the Min and Median task durations are approximately equal, but the Max task duration is about 100 times longer than the Min.
What issue is causing the overall job to take longer to complete?
A
Task queueing resulting from improper thread pool assignment.
B
Spill resulting from attached volume storage being too small.
C
Network latency due to some cluster nodes being in different regions from the source data.
D
Skew caused by more data being assigned to a subset of Spark partitions.
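
The pattern in the stem, where Min and Median task durations are roughly equal but the Max is ~100x longer, is the classic signature of data skew (option D): a few partitions receive far more data than the rest, and the whole stage waits on those straggler tasks. As a hedged illustration, the sketch below shows one common mitigation, salting a hot join key so its rows spread across several partitions. All dataset and column names here (events, dim, key) are hypothetical; the AQE setting shown is the built-in Spark 3.x alternative to manual salting.

```python
# Minimal PySpark sketch of key salting to mitigate join skew.
# Names (events, dim, key) are hypothetical, not from the question.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

# On Spark 3.x, adaptive skew-join handling can often fix this automatically:
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

NUM_SALTS = 16  # assumption: spread each hot key across 16 sub-partitions

# Skewed fact side: key "0" dominates, mimicking the one slow partition.
events = spark.range(1_000_000).withColumn(
    "key",
    F.when(F.col("id") % 100 < 98, F.lit("0"))
     .otherwise((F.col("id") % 100).cast("string")),
)
dim = spark.createDataFrame(
    [(str(i), f"val_{i}") for i in range(100)], ["key", "val"]
)

# Salt the large side so the hot key no longer lands in a single partition.
salted_events = events.withColumn("salt", (F.rand() * NUM_SALTS).cast("int"))

# Replicate the small side once per salt value so every (key, salt) matches.
salts = spark.range(NUM_SALTS).select(F.col("id").cast("int").alias("salt"))
salted_dim = dim.crossJoin(salts)

joined = salted_events.join(salted_dim, ["key", "salt"]).drop("salt")
joined.groupBy("key").count().show()
```

The trade-off: salting replicates the small side NUM_SALTS times in exchange for evenly sized tasks on the large side, which is usually cheap when the replicated table is a small dimension.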