
A Spark application has a 128 GB DataFrame A and a 1 GB DataFrame B. If a broadcast join were to be performed on these two DataFrames, which of the following describes which DataFrame should be broadcast and why?
A
Either DataFrame can be broadcasted. The two choices will be identical in result and efficiency.
B
DataFrame B should be broadcasted because it is smaller and will eliminate the need for the shuffling of itself.
C
DataFrame A should be broadcasted because it is larger and will eliminate the need for the shuffling of DataFrame B.
D
DataFrame B should be broadcasted because it is smaller and will eliminate the need for the shuffling of DataFrame A.
E
DataFrame A should be broadcasted because it is smaller and will eliminate the need for the shuffling of itself.
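
To reason about the options, it can help to estimate how much data each strategy moves over the network. The sketch below is a toy cost model in plain Python, not Spark itself. It assumes a hypothetical cluster of 10 executors (`NUM_EXECUTORS` is not given in the question) and treats a shuffle join as moving both DataFrames in full, which is a rough upper bound.

```python
# Toy cost model (not Spark itself): estimate GB sent over the network
# for each join strategy. NUM_EXECUTORS is an assumed, hypothetical value.
NUM_EXECUTORS = 10
SIZE_A_GB = 128
SIZE_B_GB = 1


def shuffle_join_cost(size_a, size_b):
    # Rough upper bound: both sides are repartitioned by the join key.
    return size_a + size_b


def broadcast_join_cost(broadcast_size, num_executors):
    # The broadcast side is copied in full to every executor;
    # the other side stays in place and is not shuffled.
    return broadcast_size * num_executors


costs = {
    "shuffle both": shuffle_join_cost(SIZE_A_GB, SIZE_B_GB),
    "broadcast B": broadcast_join_cost(SIZE_B_GB, NUM_EXECUTORS),
    "broadcast A": broadcast_join_cost(SIZE_A_GB, NUM_EXECUTORS),
}

for strategy, gb in costs.items():
    print(f"{strategy}: ~{gb} GB moved")
```

Under these assumptions, broadcasting the 1 GB DataFrame moves far less data than either alternative, and the broadcast side must also fit in each executor's memory. In PySpark the hint itself is written with `broadcast` from `pyspark.sql.functions`, for example `a.join(broadcast(b), "key")`, where `a`, `b`, and the `"key"` column are placeholder names.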