
Explanation:
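Both cache() and persist() are lazily evaluated in Spark: calling them only marks the DataFrame for caching, and no data is stored until an action runs. An action such as df.count() must therefore be executed to materialize the cached data and make the DataFrame appear in the Spark UI's Storage Tab. An action like df.show() also triggers caching, but it may compute only the partitions needed for the first rows, so the Storage Tab can report the DataFrame as only partially cached. The DataFrame's absence from the UI immediately after persist() is a consequence of this lazy evaluation, not a sign that the command failed.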
A data engineering team is optimizing a complex pipeline handling trillions of rows per table. They choose to persist some frequently used DataFrames to speed up query processing. After executing the persist() command on a DataFrame, a data engineer checks the Spark UI's Storage Tab but finds no information about the persisted DataFrame. What could be the reason?
A
DataFrames persisted via persist() do not appear in the Storage Tab; only those persisted with cache() are visible in the Spark UI's Storage Tab.
B
The details of the persisted DataFrame are exclusively available in Ganglia metrics.
C
The DataFrame details should appear in the Spark UI's Storage Tab right after the persist() command. Absence indicates the command failed to execute.
D
Because persist() is lazily evaluated, executing an action on the DataFrame is required to see the cached DataFrame in Spark UI.
E
persist() stores the DataFrame in memory, making it invisible in Spark UI.