
Answer-first summary for fast verification
Answer: Because `persist()` is lazily evaluated, an action must be executed on the DataFrame before the cached DataFrame appears in the Spark UI.
Both `cache()` and `persist()` are lazy in Spark: calling them only marks the DataFrame for caching. An action such as `df.count()` or `df.show()` must run to actually materialize the cache and make the DataFrame visible in the Spark UI's Storage tab. The DataFrame's absence from the UI immediately after `persist()` therefore reflects lazy evaluation, not a failure of the command.
Author: LeetQuiz Editorial Team
A data engineering team is optimizing a complex pipeline handling trillions of rows per table. They choose to persist some frequently used DataFrames to speed up query processing. After executing the persist() command on a DataFrame, a data engineer checks the Spark UI's Storage Tab but finds no information about the persisted DataFrame. What could be the reason?
A
DataFrames persisted via persist() do not appear in the Storage Tab; only those persisted with cache() are visible in Spark UI's Storage Tab.
B
The details of the persisted DataFrame are exclusively available in Ganglia metrics.
C
The DataFrame details should appear in the Spark UI's Storage Tab right after the persist() command. Absence indicates the command failed to execute.
D
Because persist() is lazily evaluated, executing an action on the DataFrame is required to see the cached DataFrame in Spark UI.
E
persist() stores the DataFrame in memory, making it invisible in Spark UI.