Databricks Certified Associate Developer for Apache Spark


A Spark application has a 128 GB DataFrame A and a 1 GB DataFrame B. If a broadcast join were to be performed on these two DataFrames, which of the following describes which DataFrame should be broadcast and why?


Explanation:

In a broadcast join, Spark sends a full copy of the smaller DataFrame (B, 1 GB) to every executor, so each partition of the larger DataFrame (A, 128 GB) can be joined locally against that copy. Neither side is shuffled: B is distributed by broadcast rather than by shuffle, and A stays in its existing partitions. Avoiding the shuffle of A matters most, because moving 128 GB across the network costs far more than copying 1 GB to each executor. Answer options B and D each describe one part of this reasoning: option B focuses on avoiding the shuffle of DataFrame B, while option D emphasizes avoiding the shuffle of DataFrame A. Thus, both options B and D are correct, with option D stating the more important reason.
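The mechanism described above can be sketched as a toy model in plain Python. This is not Spark code and not how Spark implements the join internally; the function name `broadcast_hash_join` and the sample rows are hypothetical, chosen only to show that the large side is joined partition by partition without any of its rows changing partitions.

```python
def broadcast_hash_join(large_partitions, small_rows):
    """Toy model of a broadcast hash join (illustrative only, not Spark's API).

    large_partitions: list of partitions, each a list of (key, value) rows.
    small_rows: list of (key, value) rows from the small side.
    """
    # Build one lookup table from the small side. In Spark, a copy of this
    # table would be shipped to every executor.
    lookup = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)

    joined_partitions = []
    for partition in large_partitions:
        # Each large-side partition is probed in place; none of its rows
        # are moved to another partition.
        joined = [
            (key, large_value, small_value)
            for key, large_value in partition
            for small_value in lookup.get(key, [])
        ]
        joined_partitions.append(joined)
    return joined_partitions


# Hypothetical sample data: A is split into two partitions, B is small.
a_partitions = [[(1, "a1"), (2, "a2")], [(2, "a3"), (3, "a4")]]
b_rows = [(1, "x"), (2, "y")]

result = broadcast_hash_join(a_partitions, b_rows)
```

The output keeps the same two-partition layout as the input, which is the point of the technique. In PySpark, the corresponding hint is usually written as `A.join(broadcast(B), "key")` with `broadcast` imported from `pyspark.sql.functions`, where `"key"` stands in for whatever join column the DataFrames share. An explicit hint would likely be needed here, since 1 GB is well above the default `spark.sql.autoBroadcastJoinThreshold` of 10 MB.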
