
Answer-first summary for fast verification
Answer: The RDD Block Name includes the “*” annotation signaling a failure to cache
When using the MEMORY_ONLY storage level, Spark stores RDDs as deserialized objects in the JVM heap. If memory is insufficient, some partitions may not be cached and will be recomputed on demand, leading to performance issues. Here's the analysis of the options:

- **A**: Incorrect. MEMORY_ONLY does not use off-heap memory, so comparing On Heap and Off Heap Memory Usage is irrelevant.
- **B**: Correct. The "*" annotation in the RDD Block Name indicates partitions that could not be cached and will be recomputed, signaling suboptimal performance.
- **C**: Incorrect. MEMORY_ONLY does not spill to disk, so Size on Disk should always be 0. A value >0 suggests a different storage level is in use.
- **D**: Incorrect. The number of cached partitions cannot exceed the original Spark partitions. This scenario is impossible and not an indicator of caching issues.
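
The recompute-on-demand behavior described above can be illustrated with a small toy model in plain Python. This is only a sketch under stated assumptions, not Spark's actual block manager: the class `MemoryOnlyCache`, its fields, and the byte sizes are hypothetical names and numbers chosen for illustration, and real Spark may also evict older blocks rather than simply skipping new ones. It shows why, under a memory-only policy, the fraction of cached partitions can fall below 100% while nothing is ever written to disk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MemoryOnlyCache:
    """Hypothetical toy model: partitions that do not fit in memory are skipped, never spilled."""
    capacity_bytes: int
    used_bytes: int = 0
    blocks: Dict[int, List[int]] = field(default_factory=dict)
    recomputations: int = 0

    def try_cache(self, partition_id: int, data: List[int], size_bytes: int) -> bool:
        # Memory-only policy: if the block does not fit, it is simply not stored.
        if self.used_bytes + size_bytes > self.capacity_bytes:
            return False
        self.blocks[partition_id] = data
        self.used_bytes += size_bytes
        return True

    def read(self, partition_id: int, compute: Callable[[int], List[int]]) -> List[int]:
        # A cached partition is served from memory; a missing one is recomputed.
        if partition_id in self.blocks:
            return self.blocks[partition_id]
        self.recomputations += 1
        return compute(partition_id)

    def fraction_cached(self, total_partitions: int) -> float:
        return len(self.blocks) / total_partitions


def compute_partition(pid: int) -> List[int]:
    # Stand-in for the (possibly expensive) lineage that produces a partition.
    return [pid * 10 + i for i in range(3)]


total_partitions = 4
cache = MemoryOnlyCache(capacity_bytes=250)  # room for only two 100-byte partitions

for pid in range(total_partitions):
    cache.try_cache(pid, compute_partition(pid), size_bytes=100)

# A later action reads every partition; the uncached ones are recomputed.
for pid in range(total_partitions):
    cache.read(pid, compute_partition)

print(f"fraction cached: {cache.fraction_cached(total_partitions):.0%}")
print(f"recomputed partitions: {cache.recomputations}")
```

In this toy run only two of the four partitions fit, so every subsequent pass over the data pays the recomputation cost for the other two. That is the kind of partial-caching condition the Storage tab is meant to reveal.
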
Author: LeetQuiz Editorial Team
What indicators in the Spark UI's Storage tab should a data engineer monitor to identify suboptimal performance of a cached table when using the MEMORY_ONLY storage level?
- **A**: On Heap Memory Usage is within 75% of Off Heap Memory Usage
- **B**: The RDD Block Name includes the “*” annotation signaling a failure to cache
- **C**: Size on Disk is > 0
- **D**: The number of Cached Partitions > the number of Spark Partitions