In the context of Pandas UDFs, explain the concept of caching and its benefits when working with intermediate datasets in Spark. Provide an example of how you would use caching in a Pandas UDF.
A. Caching is a technique where intermediate datasets are stored in memory to enable faster access and processing.
B. Caching is a technique where intermediate datasets are stored on disk to save storage space.
C. Caching is a technique where intermediate datasets are partitioned across multiple nodes in a cluster to enable parallel processing.
D. Caching is not applicable when working with Pandas UDFs in Spark.
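
For the example the question asks for, here is a minimal sketch of one way caching might be combined with a Pandas UDF. It assumes Spark 3.x with PyArrow installed. The function names (`to_fahrenheit`, `run_demo`), the sample data, and the local session settings are hypothetical, chosen only for illustration. The idea is that a DataFrame produced by a Pandas UDF is an intermediate dataset like any other: if several actions reuse it, calling `cache()` keeps Spark from re-running the UDF for each one.

```python
import pandas as pd


def to_fahrenheit(celsius: pd.Series) -> pd.Series:
    """Vectorized conversion; inside Spark it receives one Arrow batch at a time."""
    return celsius * 9.0 / 5.0 + 32.0


def run_demo() -> None:
    # PySpark is imported here so the pandas logic above can be
    # exercised on its own, without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, pandas_udf

    # Hypothetical local session, for illustration only.
    spark = (
        SparkSession.builder.master("local[2]").appName("cache-demo").getOrCreate()
    )

    # Wrap the plain pandas function as a Series-to-Series Pandas UDF.
    to_fahrenheit_udf = pandas_udf(to_fahrenheit, returnType="double")

    # Made-up sample readings.
    readings = spark.createDataFrame(
        [("a", 20.0), ("a", 25.0), ("b", 30.0)], ["sensor", "celsius"]
    )

    # Intermediate dataset: the UDF output is reused by the actions below,
    # so it is marked for caching.
    converted = readings.withColumn(
        "fahrenheit", to_fahrenheit_udf(col("celsius"))
    ).cache()

    converted.count()  # first action materializes the cache
    converted.groupBy("sensor").avg("fahrenheit").show()  # reuses cached data
    converted.filter(col("fahrenheit") > 80).show()  # reuses cached data

    converted.unpersist()  # release the cached blocks when done
    spark.stop()
```

Calling `run_demo()` in an environment with PySpark and PyArrow runs the full example. Note that `cache()` is lazy: nothing is stored until the first action runs. For DataFrames the default storage level keeps data in memory and spills to disk when memory runs short.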