Databricks Certified Data Engineer - Professional

Explanation:

When a DataFrame is too large for the available memory, choose a storage level that can spill to disk so that caching does not trigger OOM errors. MEMORY_AND_DISK_SER fits this requirement: it keeps partitions in memory in serialized form and writes the partitions that do not fit to disk. Serialization also gives a smaller memory footprint than the deserialized MEMORY_ONLY level, at the cost of extra CPU work when the data is read. MEMORY_ONLY and MEMORY_ONLY_SER never use disk, so neither satisfies the spilling requirement. DISK_ONLY avoids memory pressure but reads every partition from disk, which is slow for data that is accessed frequently. MEMORY_AND_DISK_SER is therefore the best choice among these options for caching a large DataFrame.
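
The comparison can be made concrete with a small model of the flags behind each option. The sketch below is plain Python and does not call Spark; the `Level` tuple, the `LEVELS` table, and the helper `spills_and_stays_compact` are hypothetical names introduced for illustration, and the flag values are assumed to mirror the constants in Spark's Scala `StorageLevel` object.

```python
from typing import NamedTuple


class Level(NamedTuple):
    """Illustrative stand-in for a Spark storage level (not the Spark class)."""
    use_disk: bool      # blocks may be written to disk
    use_memory: bool    # blocks may be kept in memory
    deserialized: bool  # True = stored as objects, False = serialized bytes


# Flag values assumed to follow the constants in Spark's Scala StorageLevel.
LEVELS = {
    "MEMORY_ONLY":         Level(use_disk=False, use_memory=True,  deserialized=True),
    "MEMORY_ONLY_SER":     Level(use_disk=False, use_memory=True,  deserialized=False),
    "MEMORY_AND_DISK_SER": Level(use_disk=True,  use_memory=True,  deserialized=False),
    "DISK_ONLY":           Level(use_disk=True,  use_memory=False, deserialized=False),
}


def spills_and_stays_compact(level: Level) -> bool:
    """True if the level caches in memory, can spill to disk, and is serialized."""
    return level.use_memory and level.use_disk and not level.deserialized


# Keep only the options that satisfy all three requirements from the explanation.
candidates = [name for name, level in LEVELS.items() if spills_and_stays_compact(level)]
print(candidates)  # prints ['MEMORY_AND_DISK_SER']
```

In Scala, the selected level would be applied with `df.persist(StorageLevel.MEMORY_AND_DISK_SER)` after importing `org.apache.spark.storage.StorageLevel`.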

To minimize Out Of Memory (OOM) errors when caching a DataFrame that exceeds available memory, which storage level should you choose to allow spilling to disk?

MEMORY_ONLY_SER: 6.3%
DISK_ONLY: 11.6%
MEMORY_AND_DISK_SER: 72.6%
MEMORY_ONLY: 9.5%