
Answer-first summary for fast verification
Answer: B — DataFrame.filter()
A shuffle operation in Apache Spark redistributes data across partitions, which is expensive because it involves serialization, disk I/O, and network transfer. Operations that need data grouped or sorted across partition boundaries, such as joins (A), orderBy (C), distinct (D), and intersect (E), typically trigger a shuffle. In contrast, DataFrame.filter() (B) is a narrow transformation: each output partition depends on exactly one input partition, so rows are evaluated in place with no data redistribution, making it the least likely to cause a shuffle.
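The narrow-versus-wide distinction can be sketched with a toy model of partitions in plain Python. This is an illustrative sketch only, not Spark's actual implementation: `narrow_filter` and `shuffle_distinct` are hypothetical names, and the hash-based redistribution stands in for the network transfer Spark performs during a real shuffle.

```python
# Toy model: a "dataset" is a list of partitions (each partition a list of rows).
# Illustrative sketch only -- not Spark internals.

def narrow_filter(partitions, predicate):
    """Narrow transformation: each output partition depends on exactly one
    input partition, so no row ever moves between partitions."""
    return [[row for row in part if predicate(row)] for part in partitions]

def shuffle_distinct(partitions, num_partitions=2):
    """Wide transformation: rows are redistributed (shuffled) by hash so that
    all copies of a value land in the same partition before deduplication."""
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for row in part:
            # In real Spark this append is a cross-executor network move.
            shuffled[hash(row) % num_partitions].append(row)
    return [sorted(set(part)) for part in shuffled]

data = [[1, 2, 2, 5], [3, 5, 6]]

# filter keeps partition boundaries intact -- no data movement:
print(narrow_filter(data, lambda x: x > 2))

# distinct must first regroup rows across partitions (the shuffle),
# otherwise the duplicate 5s in different partitions would both survive:
print(shuffle_distinct(data))
```

Note that without the regrouping step, the two copies of `5` sit in different partitions and per-partition deduplication alone could not remove both, which is precisely why `distinct` forces a shuffle while `filter` does not.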
Author: LeetQuiz Editorial Team