
Explanation:
The MEMORY_ONLY storage level is strictly defined to store RDD partitions as deserialized Java objects in the JVM heap memory. Specifically, the useDisk flag for this level is set to False.
In a healthy MEMORY_ONLY configuration, Spark never writes cached partitions to disk. If a partition does not fit in the memory available for caching, Spark simply does not keep it and recomputes it from lineage when it is next needed. Therefore, any value greater than 0 for 'Size on Disk' in the Spark UI is a definitive indicator that the cache is not operating as configured: either the data was actually persisted with a level such as MEMORY_AND_DISK, or there is a configuration mismatch.
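The same check can be sketched in code. This is a minimal illustration, not part of the original explanation: it assumes each Storage tab entry has been exported as a dictionary whose `memoryUsed` and `diskUsed` fields follow the naming of Spark's monitoring REST storage listing, and the helper `find_disk_spill` and the sample rows are hypothetical.

```python
from typing import Dict, List


def find_disk_spill(rdd_infos: List[Dict]) -> List[str]:
    """Return the names of cached entries that report any bytes on disk.

    For a dataset meant to be MEMORY_ONLY, every name returned here
    points at a storage level other than the one intended.
    """
    return [info["name"] for info in rdd_infos if info.get("diskUsed", 0) > 0]


# Hypothetical sample rows (illustrative values, not real measurements).
sample = [
    {"name": "orders", "memoryUsed": 52_428_800, "diskUsed": 0},
    {"name": "customers", "memoryUsed": 10_485_760, "diskUsed": 2_097_152},
]

suspicious = find_disk_spill(sample)
print(suspicious)  # ['customers']
```

The check compares against zero rather than against the in-memory size, because under MEMORY_ONLY the amount on disk is irrelevant: any non-zero value is already the anomaly.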
Why other options are incorrect:
Option A: Partially cached data is expected with MEMORY_ONLY when a dataset is larger than the available cache; it simply means some partitions are recomputed on demand.
Option D: MEMORY_ONLY does not use off-heap storage (useOffHeap=False), so the ratio between on-heap and off-heap usage gives no insight into cache health.
Option E: In a MEMORY_ONLY setup, Size on Disk should always be zero, so any non-zero value is the issue, regardless of its size relative to Size in Memory.
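To illustrate why partial caching alone is not a fault under MEMORY_ONLY, the sketch below computes the share of partitions held in the cache. The function name `cached_fraction` and the partition counts are hypothetical and chosen only for illustration.

```python
def cached_fraction(num_cached: int, num_total: int) -> float:
    """Share of partitions currently held in the cache (0.0 to 1.0)."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return num_cached / num_total


# Hypothetical counts: 150 of 200 partitions fit in memory.
print(cached_fraction(150, 200))  # 0.75
```

A value below 1.0 only means the remaining partitions are recomputed when accessed; it says nothing about disk usage, which is the signal that matters for this question.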
When reviewing the Storage tab in the Spark UI for a table supposedly cached with the MEMORY_ONLY storage level, which of the following indicators suggests that the caching strategy is not functioning optimally or as configured?
A
The RDD Block Name includes the "*" annotation, indicating a failure to cache specific partitions.
B
The number of Cached Partitions exceeds the total number of Spark Partitions.
C
Size on Disk is greater than 0.
D
On-Heap Memory Usage is within 75% of Off-Heap Memory Usage.
E
Size on Disk is significantly smaller than Size in Memory.