
A data engineering team is optimizing a complex pipeline handling trillions of rows per table. They choose to persist some frequently used DataFrames to speed up query processing. After executing the persist() command on a DataFrame, a data engineer checks the Spark UI's Storage Tab but finds no information about the persisted DataFrame. What could be the reason?
A
DataFrames persisted via persist() do not appear in the Storage Tab; only those persisted with cache() are visible in the Spark UI's Storage Tab.
B
The details of the persisted DataFrame are exclusively available in Ganglia metrics.
C
The DataFrame details should appear in the Spark UI's Storage Tab immediately after the persist() command. Their absence indicates that the command failed to execute.
D
Because persist() is lazily evaluated, an action must be executed on the DataFrame before the persisted DataFrame appears in the Spark UI.
E
persist() stores the DataFrame in memory, making it invisible in the Spark UI.