
Explanation:
With the MEMORY_ONLY storage level, Spark stores RDD partitions as deserialized objects on the JVM heap. Partitions that do not fit in memory are not cached at all; they are recomputed from their lineage each time they are needed, which slows down any job that reuses the data.
What indicators in the Spark UI's Storage tab should a data engineer monitor to identify suboptimal performance of a cached table when using the MEMORY_ONLY storage level?
A. On Heap Memory Usage is within 75% of Off Heap Memory Usage
B. The RDD Block Name includes the “*” annotation signaling a failure to cache
C. Size on Disk is > 0
D. The number of Cached Partitions > the number of Spark Partitions
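The Storage tab columns that the options refer to (Cached Partitions, Fraction Cached, Size in Memory, Size on Disk) can also be checked programmatically. The following is a minimal sketch, not an official tool: `cache_warnings` is a hypothetical helper, the `sample` entry is made-up data rather than output from a real cluster, and the field names (`numPartitions`, `numCachedPartitions`, `diskUsed`) are assumed to match the per-RDD storage entries exposed by Spark's monitoring REST API, so verify them against your Spark version.

```python
from typing import Dict, List


def cache_warnings(rdd_info: Dict) -> List[str]:
    """Return warnings for one cached RDD/table entry (hypothetical helper)."""
    warnings = []
    total = rdd_info["numPartitions"]
    cached = rdd_info["numCachedPartitions"]

    # Fewer cached partitions than total partitions means the missing
    # ones are recomputed every time they are read.
    if total > 0 and cached < total:
        pct = 100.0 * cached / total
        warnings.append(f"only {cached}/{total} partitions cached ({pct:.0f}%)")

    # Report any bytes on disk so the "Size on Disk" column is surfaced too.
    if rdd_info.get("diskUsed", 0) > 0:
        warnings.append(f"{rdd_info['diskUsed']} bytes on disk")

    return warnings


# Hypothetical sample entry, not taken from a real application.
sample = {
    "name": "In-memory table sales",
    "storageLevel": "Memory Deserialized 1x Replicated",
    "numPartitions": 200,
    "numCachedPartitions": 150,
    "memoryUsed": 1_500_000_000,
    "diskUsed": 0,
}

for warning in cache_warnings(sample):
    print(f"{sample['name']}: {warning}")
```

An empty list from `cache_warnings` means the entry is fully cached with nothing reported on disk; any warning is a prompt to look at the corresponding row in the Storage tab.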