
Answer-first summary for fast verification
Answer: Size on Disk is greater than 0.
The `MEMORY_ONLY` storage level stores RDD partitions as deserialized Java objects in JVM heap memory; its `useDisk` flag is set to `False`. In a healthy `MEMORY_ONLY` configuration, Spark will **never** spill cached partitions to disk. If a partition does not fit in the allocated memory, Spark simply discards it and recomputes it when needed. Therefore, seeing any value **greater than 0 for "Size on Disk"** in the Spark UI is a definitive indicator that the cache is not operating as configured: either it has fallen back to a level like `MEMORY_AND_DISK`, or there is a configuration mismatch.

**Why the other options are incorrect:**

* **The asterisk (\*)**: This is expected behavior for `MEMORY_ONLY` when a dataset is larger than the available cache; it simply means some partitions are recomputed on demand.
* **Off-Heap Memory**: `MEMORY_ONLY` does not use off-heap storage (`useOffHeap=False`), so ratios between on-heap and off-heap usage provide no insight into cache health.
* **Size on Disk < Size in Memory**: In a proper `MEMORY_ONLY` setup, Size on Disk should always be zero, so any non-zero value is the problem, regardless of its size relative to memory.
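The flag logic above can be sketched in plain Python. Note this is an illustrative model, not the real `pyspark.StorageLevel` class: `StorageLevel` here is a stand-in namedtuple whose field names mirror Spark's flags, and `disk_usage_is_suspicious` is a hypothetical helper expressing the check a reviewer applies mentally in the Storage tab.

```python
from collections import namedtuple

# Illustrative stand-in for Spark's StorageLevel flags (field names mirror the real API).
StorageLevel = namedtuple(
    "StorageLevel", "useDisk useMemory useOffHeap deserialized replication"
)

# Flag values as defined for these levels in Spark (Scala definitions).
MEMORY_ONLY = StorageLevel(
    useDisk=False, useMemory=True, useOffHeap=False, deserialized=True, replication=1
)
MEMORY_AND_DISK = StorageLevel(
    useDisk=True, useMemory=True, useOffHeap=False, deserialized=True, replication=1
)

def disk_usage_is_suspicious(level, size_on_disk_bytes):
    """A non-zero Size on Disk is only a red flag when the level forbids disk use."""
    return (not level.useDisk) and size_on_disk_bytes > 0

print(disk_usage_is_suspicious(MEMORY_ONLY, 1024))      # MEMORY_ONLY should never touch disk
print(disk_usage_is_suspicious(MEMORY_ONLY, 0))         # zero disk usage is healthy
print(disk_usage_is_suspicious(MEMORY_AND_DISK, 1024))  # spilling is expected at this level
```

The same check cannot distinguish *why* disk usage appeared (fallback versus misconfiguration); it only flags that the observed Storage-tab metrics contradict the declared level.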
Author: LeetQuiz Editorial Team
When reviewing the Storage tab in the Spark UI for a table supposedly cached with the MEMORY_ONLY storage level, which of the following indicators suggests that the caching strategy is not functioning optimally or as configured?
A
The RDD Block Name includes the "*" annotation, indicating a failure to cache specific partitions.
B
The number of Cached Partitions exceeds the total number of Spark Partitions.
C
Size on Disk is greater than 0.
D
On-Heap Memory Usage is within 75% of Off-Heap Memory Usage.
E
Size on Disk is significantly smaller than Size in Memory.