
Answer-first summary for fast verification
Answer: grouped = spark_df.pandas_api().groupby('category').count()
To run a distributed groupby through the Pandas API on Spark, first convert the Spark DataFrame to a pandas-on-Spark DataFrame with the 'pandas_api()' method (available since Spark 3.2), then call 'groupby()' followed by an aggregation such as 'count()'. The computation remains distributed across the Spark cluster; only the API surface changes to match pandas. Option B shows the correct sequence: convert first, then group and aggregate.
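Because the Pandas API on Spark is designed to match plain pandas semantics, the shape of the result can be sketched locally with ordinary pandas. This is a minimal stand-in, not the distributed version; the pyspark.pandas lines in the comments assume a running SparkSession, an existing 'spark_df', and PySpark 3.2 or later:

```python
import pandas as pd

# With PySpark installed, the distributed equivalent would read:
#   psdf = spark_df.pandas_api()          # Spark DataFrame -> pandas-on-Spark
#   grouped = psdf.groupby('category').count()
#
# Here we use a small local pandas DataFrame as a stand-in.
df = pd.DataFrame({'category': ['a', 'b', 'a', 'b', 'a'],
                   'value': [1, 2, 3, 4, 5]})

# groupby + count: one row per category, counting non-null values per column
grouped = df.groupby('category').count()
print(grouped.loc['a', 'value'])  # 3 rows fall in category 'a'
```

The key point option B tests is ordering: the conversion to the pandas-on-Spark API must happen before 'groupby()', because 'groupby' on a plain Spark DataFrame returns a GroupedData object that does not expose the pandas-style API.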
Author: LeetQuiz Editorial Team
Given a PySpark DataFrame named 'spark_df', write a code snippet that demonstrates how to perform a distributed groupby operation on a column named 'category' using Pandas API on Spark.
A
grouped = spark_df.groupby('category').collect()
B
grouped = spark_df.pandas_api().groupby('category').count()
C
grouped = spark_df.groupby('category').pandas_api().count()
D
grouped = spark_df.pandas_api().groupby('category').to_spark().count()