
Answer-first summary for fast verification
Answer: Check the data skew by analyzing the task metrics and repartition the data if necessary.
Long-running tasks concentrated in a single stage are a classic symptom of data skew: a few tasks process disproportionately large partitions while the rest finish quickly. The task metrics in the Spark UI (input size, shuffle read, task duration) reveal whether certain tasks are handling far more data than their peers. Repartitioning the data, or salting the skewed keys, distributes the load more evenly across tasks and improves overall job latency.
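The diagnose-then-repartition loop can be sketched in plain Python, with no Spark dependency. The hash-based partitioning, the 2x-of-mean skew threshold, and the salting helper below are illustrative assumptions, not Spark internals:

```python
import random
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count how many records land in each hash partition."""
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

def is_skewed(sizes, threshold=2.0):
    """Flag skew when the largest partition exceeds threshold * mean size."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) > threshold * mean

def salt_hot_keys(keys, hot_keys, salt_buckets=8):
    """Spread records of skewed keys by pairing each with a random salt,
    so downstream partitioning on (key, salt) splits the hot key's load."""
    return [
        (k, random.randrange(salt_buckets) if k in hot_keys else 0)
        for k in keys
    ]
```

In Spark itself the same idea maps to `df.repartition(n, col)` to rebalance by key, or, on Spark 3.x, enabling adaptive skew handling with `spark.sql.adaptive.skewJoin.enabled=true`.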
Author: LeetQuiz Editorial Team
Consider a scenario where a Spark application is experiencing high latency during the execution of a batch job. Using the Spark UI, you notice that the event timeline shows a significant number of long-running tasks in a specific stage. What steps would you take to diagnose and potentially resolve this performance issue?
A. Increase the number of executors and reduce the task parallelism.
B. Check the data skew by analyzing the task metrics and repartition the data if necessary.
C. Decrease the memory allocated to the executors.
D. Ignore the long-running tasks as they might be due to external factors.