Your Spark application is experiencing performance issues due to large shuffle operations during aggregation. Which of the following strategies would most effectively reduce the size of these shuffle operations?
A
Use coalesce to reduce the number of partitions just before aggregation.
B
Increase the default parallelism to create more tasks and thus reduce data per task.
C
Apply repartition based on the aggregation key before performing the aggregation.
D
Aggregate data locally on each partition with mapPartitions before the global aggregation.
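Option D describes map-side (local) pre-aggregation: each partition collapses its records per key before anything crosses the network, so the shuffle carries at most one record per key per partition instead of every raw record. (In practice, Spark's `reduceByKey` performs this combining automatically, whereas `groupByKey` does not.) A minimal pure-Python sketch of the idea follows; it simulates partitions with plain lists rather than using the actual Spark API.

```python
from collections import Counter

# Simulated partitions of (key, value) records, as an RDD might hold them.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]

def local_aggregate(partition):
    # Analogous to mapPartitions: collapse duplicate keys within one
    # partition, so only one record per key would cross the shuffle.
    acc = Counter()
    for key, value in partition:
        acc[key] += value
    return list(acc.items())

# Map-side aggregation shrinks the shuffle payload: 7 records become 4.
pre_aggregated = [local_aggregate(p) for p in partitions]

# Global aggregation after the (now much smaller) shuffle.
final = Counter()
for partition in pre_aggregated:
    for key, value in partition:
        final[key] += value

print(sum(len(p) for p in partitions))      # records before the shuffle: 7
print(sum(len(p) for p in pre_aggregated))  # records actually shuffled: 4
print(dict(final))                          # {'a': 4, 'b': 3}
```

The savings grow with key repetition: if a partition holds millions of rows for a handful of keys, local aggregation reduces its shuffle contribution to a handful of records. By contrast, options A through C only rearrange or resize partitions; the same total volume of data still has to move.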