Consider a scenario where you have a large dataset stored in a distributed file system, and you need to perform some data preprocessing and analysis using Pandas-like operations. How would you approach this task using Pandas API on Spark?
A. Read the entire dataset into a Pandas DataFrame and perform the preprocessing and analysis locally.
B. Read the dataset in chunks, perform the preprocessing and analysis on each chunk using Pandas, and then combine the results.
C. Read the dataset into a Spark DataFrame, convert it to a Pandas on Spark DataFrame, and perform the preprocessing and analysis using the Pandas on Spark APIs.
D. Use native Spark operations to perform the preprocessing and analysis, as it is more efficient for large datasets.