
Answer-first summary for fast verification
Answer: Increasing spark.sql.shuffle.partitions and enabling spark.shuffle.compress
The correct answer is **C. Increasing spark.sql.shuffle.partitions and enabling spark.shuffle.compress**. This combination targets both problems directly:

1. **Increasing spark.sql.shuffle.partitions** spreads the shuffled data across more, smaller partitions. Each task then processes less data, so it is less likely to exhaust its execution memory and spill to disk, reducing shuffle spill.
2. **Enabling spark.shuffle.compress** compresses shuffle map output before it is written and transferred, shrinking the volume of data sent over the network during shuffle and thereby cutting network I/O.

Options A, B, and D involve legitimate Spark tuning knobs, but they are less directly effective at the stated goal: A and B mainly adjust memory and buffer sizing, and D's spark.default.parallelism does not govern the shuffle partitioning of Spark SQL wide transformations the way spark.sql.shuffle.partitions does.
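As a minimal sketch of how these two settings might be applied, the snippet below builds them as a plain configuration dict. The values are illustrative assumptions, not universal recommendations, and the commented-out session-builder usage assumes a PySpark environment:

```python
# Illustrative shuffle-tuning configuration for answer C. The partition
# count is an example value; tune it for your cluster and data volume.
shuffle_conf = {
    # More shuffle partitions -> less data per task, so each task is
    # less likely to exceed its execution memory and spill to disk.
    "spark.sql.shuffle.partitions": "800",  # default is 200
    # Compress shuffle output before it is written and transferred,
    # cutting network I/O at some CPU cost. This defaults to true in
    # Spark, but setting it explicitly documents the intent.
    "spark.shuffle.compress": "true",
}

# In a real PySpark job these would be applied when building the
# session, roughly like:
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in shuffle_conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Keeping the settings in one dict makes it easy to review and reuse the tuning choices across jobs.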
Author: LeetQuiz Editorial Team
In a Spark job that involves extensive shuffle operations due to wide transformations, which combination of settings is most effective for reducing shuffle spill and network I/O?
A. Tuning spark.executor.memory and spark.shuffle.file.buffer
B. Adjusting spark.memory.fraction and spark.reducer.maxSizeInFlight
C. Increasing spark.sql.shuffle.partitions and enabling spark.shuffle.compress
D. Configuring spark.default.parallelism and spark.shuffle.spill.compress