
Explanation:
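The correct answer is C: increasing spark.sql.shuffle.partitions and enabling spark.shuffle.compress. This combination is the most effective because raising spark.sql.shuffle.partitions (default 200) splits the shuffled data into smaller partitions. Each task is then less likely to exceed its execution memory and spill to disk. Meanwhile, spark.shuffle.compress compresses map output files, which shrinks both the data written locally and the data reducers fetch over the network. spark.shuffle.compress is already true by default, so "enabling" it in practice means making sure it has not been turned off.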
Options A, B, and D involve settings that matter for general tuning, but they address shuffle spill and network I/O less directly than option C. For example, spark.default.parallelism in option D only sets the default partition count for RDD operations. DataFrame and SQL shuffles use spark.sql.shuffle.partitions instead.
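A rough way to apply option C is to size spark.sql.shuffle.partitions from the expected shuffle volume and pass both settings when the application starts. Core settings such as spark.shuffle.compress generally need to be supplied at launch, for example as spark-submit --conf arguments. The sketch below is illustrative only. The 128 MiB-per-partition target is a common rule of thumb, not a Spark default. The helper names (suggest_shuffle_partitions, shuffle_conf, to_spark_submit_args) and the 100 GiB workload are hypothetical.

```python
import math

# Assumption: aim for roughly 128 MiB of shuffle data per partition.
# This is a rule-of-thumb target, not a Spark default.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_shuffle_partitions(shuffle_bytes: int,
                               target_bytes: int = TARGET_PARTITION_BYTES,
                               minimum: int = 200) -> int:
    """Hypothetical helper: pick a partition count so each shuffle
    partition holds about target_bytes.

    `minimum` defaults to 200 (Spark's default for
    spark.sql.shuffle.partitions), so the suggestion never drops
    below the stock setting.
    """
    if shuffle_bytes < 0 or target_bytes <= 0:
        raise ValueError("shuffle_bytes must be >= 0 and target_bytes > 0")
    return max(minimum, math.ceil(shuffle_bytes / target_bytes))


def shuffle_conf(shuffle_bytes: int) -> dict:
    """Hypothetical helper: build the option C settings as Spark config strings."""
    return {
        # More, smaller partitions -> less per-task memory pressure and spill.
        "spark.sql.shuffle.partitions": str(suggest_shuffle_partitions(shuffle_bytes)),
        # "true" is already the default; set it explicitly in case it was disabled.
        "spark.shuffle.compress": "true",
    }


def to_spark_submit_args(conf: dict) -> list:
    """Render settings as spark-submit --conf key=value arguments (sorted by key)."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical workload: about 100 GiB of data crossing the shuffle.
    conf = shuffle_conf(100 * 1024 ** 3)
    print(conf)
    print(" ".join(to_spark_submit_args(conf)))
```

For a 100 GiB shuffle this suggests 800 partitions (100 GiB / 128 MiB). In a PySpark application the same key/value pairs could instead be passed through SparkSession.builder.config(key, value) before getOrCreate(). The right target size depends on executor memory and data skew, so treat the number as a starting point to validate against the Spark UI's spill metrics.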
In a Spark job that involves extensive shuffle operations due to wide transformations, which combination of settings is most effective for reducing shuffle spill and network I/O?
A. Tuning spark.executor.memory and spark.shuffle.file.buffer
B. Adjusting spark.memory.fraction and spark.reducer.maxSizeInFlight
C. Increasing spark.sql.shuffle.partitions and enabling spark.shuffle.compress
D. Configuring spark.default.parallelism and spark.shuffle.spill.compress