
Explanation:
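To perform a distributed groupby on a PySpark DataFrame with the Pandas API on Spark, first convert the Spark DataFrame to a pandas-on-Spark DataFrame, then call 'groupby()' on the converted DataFrame. Option B is the only choice that follows this order: it converts with 'toPandasAPI()', groups by 'category', and then gathers the result. Option A never leaves the Spark DataFrame API, option C groups before converting, and option D converts back to Spark after grouping. Note that the method names in the options are the quiz's own shorthand: in the PySpark API itself the conversion method is 'pandas_api()', a pandas-on-Spark groupby needs an aggregation such as 'sum()' or 'count()' to produce a result, and the grouped object has no 'collect()' method.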
Given a PySpark DataFrame named 'spark_df', write a code snippet that demonstrates how to perform a distributed groupby operation on a column named 'category' using Pandas API on Spark.
A
grouped = spark_df.groupby('category').collect()
B
grouped = spark_df.toPandasAPI().groupby('category').collect()
C
grouped = spark_df.groupby('category').toPandasAPI().collect()
D
grouped = spark_df.toPandasAPI().groupby('category').toSpark().collect()