
Answer-first summary for fast verification
Answer: Adjust spark.sql.autoBroadcastJoinThreshold to control the size of data being broadcast.
Correct answer: **C. Adjust spark.sql.autoBroadcastJoinThreshold to control the size of data being broadcast.** This setting caps how large a table can be before Spark stops broadcasting it to every executor in a broadcast hash join, which makes it the most targeted fix for OutOfMemory errors during large joins: small tables can still be broadcast cheaply, while larger tables fall back to a shuffle-based join that is processed in a memory-efficient manner instead of being held whole in each executor's memory. The other options either miss the root cause or waste resources: raising spark.memory.fraction or spark.driver.memory over-provisions memory without bounding broadcast size, and decreasing spark.executor.memory only increases memory pressure rather than relieving it.
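As a sketch of how this tuning is typically applied (the property name is Spark's standard configuration key; 10485760 bytes is Spark's documented 10 MB default, shown here explicitly):

```
# spark-defaults.conf — or pass with --conf on spark-submit,
# or set at runtime via spark.conf.set(...) on a SparkSession.

# Only tables at or below this size (in bytes) are auto-broadcast in a join;
# larger tables fall back to a shuffle-based join (e.g. sort-merge join).
# 10485760 bytes = 10 MB, Spark's default.
spark.sql.autoBroadcastJoinThreshold   10485760

# If executors still hit OutOfMemory during joins, disable
# auto-broadcast entirely:
# spark.sql.autoBroadcastJoinThreshold   -1
```

Note that an explicit broadcast hint in query code bypasses this threshold, so when diagnosing OutOfMemory errors it is worth checking for such hints as well as the configured value.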
Author: LeetQuiz Editorial Team
When encountering OutOfMemory errors during a large join operation in Spark, which configuration adjustment specifically targets this issue without unnecessarily allocating extra resources?
A. Set spark.memory.fraction to a higher value to allocate more memory for shuffle operations.
B. Increase spark.driver.memory to provide more memory for join operations.
C. Adjust spark.sql.autoBroadcastJoinThreshold to control the size of data being broadcast.
D. Decrease spark.executor.memory to force more frequent garbage collection.