
Answer-first summary for fast verification
Answer: Increasing the number of partitions to distribute data more evenly across executors.
Increasing the number of partitions splits the data into smaller chunks, so each task processes less data at a time; this spreads the load more evenly across executors, reduces memory pressure on any single executor, and helps prevent OutOfMemory errors. Decreasing spark.executor.memory only makes garbage collection run more often without addressing the underlying memory shortfall. Persisting intermediate DataFrames with StorageLevel.DISK_ONLY trades memory for I/O overhead, and allocating more memory to the driver does not help because the OutOfMemory errors occur on the executors. Thus, increasing the number of partitions is the most effective technique.
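The effect can be seen without a Spark cluster. The sketch below (plain Python, hypothetical key names, not Spark itself) hash-partitions the same set of records into 8 versus 64 partitions and compares the size of the largest partition, which is the analogue of peak per-task memory on an executor:

```python
# Minimal illustration: more partitions => smaller largest partition,
# i.e. less data any single task must hold in memory at once.
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition under hash partitioning."""
    return Counter(hash(k) % num_partitions for k in keys)

# "user-<i>" keys are a made-up example dataset.
keys = [f"user-{i}" for i in range(100_000)]

few = partition_sizes(keys, 8)    # coarse partitioning
many = partition_sizes(keys, 64)  # finer partitioning

# The largest of the 64 partitions is far smaller than the largest of the 8,
# so each task processes less data at a time.
print(max(few.values()), max(many.values()))
```

In Spark itself the equivalent knob is df.repartition(n) on a DataFrame, or the spark.sql.shuffle.partitions configuration for shuffle stages; exact values depend on data volume and cluster size.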
Author: LeetQuiz Editorial Team
When running a large Spark job that occasionally fails due to OutOfMemory errors, which technique can most effectively manage memory usage?
A. Using persist(StorageLevel.DISK_ONLY) for intermediate DataFrames.
B. Decreasing spark.executor.memory to force garbage collection to run more frequently.
C. Increasing the number of partitions to distribute data more evenly across executors.
D. Allocating more memory to the driver than executors to manage task distribution.