
Answer-first summary for fast verification
Answer: B — DataFrame.filter()
A shuffle operation in Apache Spark redistributes data across partitions, which is expensive because it involves serialization, disk I/O, and network transfer. Operations that need data grouped or sorted across partition boundaries, such as joins (A), orderBy (C), distinct (D), and intersect (E), typically trigger a shuffle. In contrast, DataFrame.filter() (B) is a narrow transformation: each output partition depends on exactly one input partition, so rows are evaluated in place with no data redistribution, making it the least likely to cause a shuffle.
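The narrow-versus-wide distinction can be sketched with a toy model of partitions in plain Python. This is an illustrative sketch only, not Spark's actual implementation: `narrow_filter` and `shuffle_distinct` are hypothetical names, and the hash-based redistribution stands in for the network transfer Spark performs during a real shuffle.

```python
# Toy model: a "dataset" is a list of partitions (each partition a list of rows).
# Illustrative sketch only -- not Spark internals.

def narrow_filter(partitions, predicate):
    """Narrow transformation: each output partition depends on exactly one
    input partition, so no row ever moves between partitions."""
    return [[row for row in part if predicate(row)] for part in partitions]

def shuffle_distinct(partitions, num_partitions=2):
    """Wide transformation: rows are redistributed (shuffled) by hash so that
    all copies of a value land in the same partition before deduplication."""
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for row in part:
            # In real Spark this append is a cross-executor network move.
            shuffled[hash(row) % num_partitions].append(row)
    return [sorted(set(part)) for part in shuffled]

data = [[1, 2, 2, 5], [3, 5, 6]]

# filter keeps partition boundaries intact -- no data movement:
print(narrow_filter(data, lambda x: x > 2))

# distinct must first regroup rows across partitions (the shuffle),
# otherwise the duplicate 5s in different partitions would both survive:
print(shuffle_distinct(data))
```

Note that without the regrouping step, the two copies of `5` sit in different partitions and per-partition deduplication alone could not remove both, which is precisely why `distinct` forces a shuffle while `filter` does not.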
Author: LeetQuiz Editorial Team