
Answer-first summary for fast verification
Answer: DataFrame.join()
A shuffle in Apache Spark is the mechanism for redistributing data across partitions, and it is required whenever an operation needs related rows to end up together. Among the options, DataFrame.join() is the operation most likely to trigger a shuffle, because rows with matching keys from the two DataFrames must be co-located on the same partition, which generally means moving data across the cluster. This is especially true when the DataFrames are not already partitioned by the join keys. In contrast, DataFrame.filter(), DataFrame.where(), and DataFrame.drop() are narrow transformations: each partition is processed independently, so no data moves between partitions. DataFrame.union() also does not typically shuffle, since it simply concatenates the partitions of the two DataFrames without redistributing data. Therefore, the correct answer is DataFrame.join().
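To see why a join forces co-location, here is a minimal plain-Python sketch (no Spark required) of the hash-partitioning step a shuffle performs: both sides are re-bucketed by key so that matching keys land in the same partition, after which each partition can be joined locally. The function and variable names are illustrative, not part of any Spark API.

```python
# Illustrative sketch of shuffle-for-join: hash-partition both inputs by key,
# then join each pair of partitions locally. In Spark, the re-bucketing step
# is the shuffle, moving rows across the network between executors.

def hash_partition(rows, num_partitions):
    """Assign each (key, value) row to a bucket by hashing its key.
    This mirrors what a shuffle does across the cluster."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[0]) % num_partitions].append(row)
    return parts

left = [("a", 1), ("b", 2), ("c", 3)]
right = [("a", 10), ("c", 30)]

# Both sides use the same hash function, so matching keys always
# end up in the same partition index.
n = 2
left_parts = hash_partition(left, n)
right_parts = hash_partition(right, n)

# Each partition pair can now be joined with no further data movement.
joined = []
for lp, rp in zip(left_parts, right_parts):
    lookup = dict(rp)
    for k, v in lp:
        if k in lookup:
            joined.append((k, v, lookup[k]))

print(sorted(joined))  # → [('a', 1, 10), ('c', 3, 30)]
```

A narrow transformation like filter needs no such re-bucketing: it can inspect each partition's rows in place, which is why it never shuffles.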
Author: LeetQuiz Editorial Team