Databricks Certified Data Engineer - Professional

Explanation:

The MEMORY_ONLY storage level is strictly defined to store RDD partitions as deserialized Java objects in the JVM heap memory. Specifically, the useDisk flag for this level is set to False.

In a healthy MEMORY_ONLY configuration, Spark never spills cached partitions to disk. If a partition does not fit in the allocated memory, Spark does not keep it in the cache and recomputes it when it is needed. Therefore, any value greater than 0 for 'Size on Disk' in the Spark UI indicates that the cache is not operating as configured: either the data was actually cached at a level such as MEMORY_AND_DISK, or there is a configuration mismatch.
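The flag logic in this explanation can be sketched in a few lines of plain Python. This is an illustrative model only, not Spark's own API: `StorageFlags` and `disk_usage_is_unexpected` are hypothetical names, and the flag values mirror the ones stated in this explanation.

```python
from typing import NamedTuple


class StorageFlags(NamedTuple):
    """Hypothetical stand-in for the flags carried by a Spark storage level."""
    use_disk: bool
    use_memory: bool
    use_off_heap: bool


# Flag values as stated in this explanation (assumed here, not read from Spark).
MEMORY_ONLY = StorageFlags(use_disk=False, use_memory=True, use_off_heap=False)
MEMORY_AND_DISK = StorageFlags(use_disk=True, use_memory=True, use_off_heap=False)


def disk_usage_is_unexpected(level: StorageFlags, size_on_disk_bytes: int) -> bool:
    """Return True when bytes sit on disk although the level forbids disk use."""
    return size_on_disk_bytes > 0 and not level.use_disk
```

Under this model, a MEMORY_ONLY entry with any bytes on disk is flagged, while the same bytes under MEMORY_AND_DISK are not.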

Why other options are incorrect:

The asterisk (*): This is expected behavior for MEMORY_ONLY when a dataset is larger than the available cache; it simply means some partitions are recomputed on demand.
Off-Heap Memory: MEMORY_ONLY does not use off-heap storage (useOffHeap=False), so ratios between on-heap and off-heap usage do not provide insights into cache health.
Size on Disk < Size in Memory: In a proper MEMORY_ONLY setup, Size on Disk should always be zero, so any non-zero value is the problem, regardless of how it compares with Size in Memory.
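As a rough illustration of how these indicators could be told apart when reading a Storage tab row, the sketch below separates the expected case (partial caching) from the real warning sign (any disk usage). `StorageRow` and `review_memory_only_row` are hypothetical names, and the fields are assumed to be copied by hand from the UI rather than fetched through a Spark API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageRow:
    """Hypothetical record of one Storage tab row for a MEMORY_ONLY cache."""
    cached_partitions: int
    total_partitions: int
    size_in_memory_bytes: int
    size_on_disk_bytes: int


def review_memory_only_row(row: StorageRow) -> List[str]:
    """Return findings; entries starting with 'PROBLEM' contradict MEMORY_ONLY."""
    findings = []
    if row.cached_partitions < row.total_partitions:
        # Expected when the data does not fit: missing partitions are recomputed.
        findings.append("INFO: partially cached, uncached partitions are recomputed on demand")
    if row.size_on_disk_bytes > 0:
        # Any disk usage is the red flag, whatever its size relative to memory.
        findings.append("PROBLEM: Size on Disk is greater than 0")
    return findings
```

For example, `review_memory_only_row(StorageRow(8, 10, 4096, 0))` yields only the informational entry, since partial caching alone is consistent with MEMORY_ONLY.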

When reviewing the Storage tab in the Spark UI for a table supposedly cached with the `MEMORY_ONLY` storage level, which of the following indicators suggests that the caching strategy is not functioning optimally or as configured?

Last updated: January 6, 2026 at 15:41

The RDD Block Name includes the "*" annotation, indicating a failure to cache specific partitions.

16.7%

The number of Cached Partitions exceeds the total number of Spark Partitions.

20.8%

Size on Disk is greater than 0.

39.6%

On-Heap Memory Usage is within 75% of Off-Heap Memory Usage.

14.6%

Size on Disk is significantly smaller than Size in Memory.

8.3%