
Ultimate access to all questions.
You are tasked with optimizing the performance of a data processing pipeline that involves converting a Spark DataFrame to a pandas DataFrame for further analysis. What strategy would you employ to minimize the overhead associated with this conversion?
A
Reduce the number of columns in the Spark DataFrame before conversion.
B
Use Apache Arrow to facilitate zero-copy data transfers between Spark and pandas.
C
Increase the number of partitions in the Spark DataFrame.
D
Convert the Spark DataFrame to a pandas DataFrame in batches.