
Answer-first summary for fast verification
Answer: MEMORY_AND_DISK_SER
When a DataFrame is too large for available memory, the storage level must allow spilling to disk to avoid out-of-memory (OOM) errors. **MEMORY_AND_DISK_SER** fits this case: it keeps partitions in memory in serialized form and writes the partitions that do not fit to disk. Serialization gives it a smaller memory footprint than **MEMORY_ONLY**, which stores deserialized objects and has no disk fallback, at the cost of some extra CPU work when the data is read. **DISK_ONLY** also avoids memory pressure, but every read goes to disk, which is slow for data that is accessed frequently. **MEMORY_AND_DISK_SER** is therefore the best choice for caching a large DataFrame: it uses memory where it can and falls back to disk where it must, reducing the risk of OOM errors.
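A minimal sketch of how this storage level might be applied in Scala. The application name, the local master setting, and the generated `id` DataFrame are assumptions made for illustration; they stand in for a real job and a real large dataset.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persist-sketch") // hypothetical application name
      .master("local[*]")        // assumption: local run for illustration
      .getOrCreate()

    // Stand-in for a DataFrame that is too large to cache in memory alone.
    val df = spark.range(0L, 10000000L).toDF("id")

    // Request serialized in-memory storage with spilling to disk.
    df.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // persist() is lazy; an action is needed to materialize the cache.
    df.count()

    // Inspect the storage level currently assigned to the DataFrame.
    println(df.storageLevel)

    // Release the cached partitions once they are no longer needed.
    df.unpersist()
    spark.stop()
  }
}
```

The storage level is passed explicitly because `cache()` on a DataFrame uses the deserialized **MEMORY_AND_DISK** level by default.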
Author: LeetQuiz Editorial Team