
Answer-first summary for fast verification
Answer: DataFrame.join()
A shuffle in Apache Spark is the mechanism for redistributing data across partitions, and it is required whenever an operation needs related rows to end up together. Among the options, DataFrame.join() is the operation most likely to trigger a shuffle, because rows with matching keys from the two DataFrames must be co-located on the same partition, which generally means moving data across the cluster. This is especially true when the DataFrames are not already partitioned by the join keys. In contrast, DataFrame.filter(), DataFrame.where(), and DataFrame.drop() are narrow transformations: each partition is processed independently, so no data moves between partitions. DataFrame.union() also does not typically shuffle, since it simply concatenates the partitions of the two DataFrames without redistributing data. Therefore, the correct answer is DataFrame.join().
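To see why a join forces co-location, here is a minimal plain-Python sketch (no Spark required) of the hash-partitioning step a shuffle performs: both sides are re-bucketed by key so that matching keys land in the same partition, after which each partition can be joined locally. The function and variable names are illustrative, not part of any Spark API.

```python
# Illustrative sketch of shuffle-for-join: hash-partition both inputs by key,
# then join each pair of partitions locally. In Spark, the re-bucketing step
# is the shuffle, moving rows across the network between executors.

def hash_partition(rows, num_partitions):
    """Assign each (key, value) row to a bucket by hashing its key.
    This mirrors what a shuffle does across the cluster."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[0]) % num_partitions].append(row)
    return parts

left = [("a", 1), ("b", 2), ("c", 3)]
right = [("a", 10), ("c", 30)]

# Both sides use the same hash function, so matching keys always
# end up in the same partition index.
n = 2
left_parts = hash_partition(left, n)
right_parts = hash_partition(right, n)

# Each partition pair can now be joined with no further data movement.
joined = []
for lp, rp in zip(left_parts, right_parts):
    lookup = dict(rp)
    for k, v in lp:
        if k in lookup:
            joined.append((k, v, lookup[k]))

print(sorted(joined))  # → [('a', 1, 10), ('c', 3, 30)]
```

A narrow transformation like filter needs no such re-bucketing: it can inspect each partition's rows in place, which is why it never shuffles.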
Author: LeetQuiz Editorial Team