
Identify the logical error in the following code block, which is intended to efficiently perform a broadcast join between DataFrame storesDF and the much larger DataFrame employeesDF on the key column storeId.
Code block:
storesDF.join(broadcast(employeesDF), "storeId")
A
The larger DataFrame employeesDF is being broadcasted rather than the smaller DataFrame storesDF.
B
There is never a need to call the broadcast() operation in Apache Spark 3.
C
The entire line of code should be wrapped in broadcast() rather than just DataFrame employeesDF.
D
The broadcast() operation will only perform a broadcast join if the Spark property spark.sql.autoBroadcastJoinThreshold is manually set.
E
Only one of the DataFrames is being broadcasted rather than both of the DataFrames.
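
For reference, a broadcast join ships a full copy of one side to every executor, so the side passed to broadcast() should be the small one. With the names from the question, that would read employeesDF.join(broadcast(storesDF), "storeId"). The plain-Python sketch below is not Spark code; it only illustrates the build-and-probe idea behind a broadcast hash join under the assumption of an inner join. The function name broadcast_hash_join and the sample rows are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts on `key`, hashing the small side."""
    # Build phase: index the small side once. In a real broadcast join,
    # this is the table that gets copied to every executor, which is
    # why it should be the smaller of the two inputs.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: stream the large side and match against the local copy,
    # so the large side never has to be shuffled across the network.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data: a few stores, more employees.
stores = [
    {"storeId": 1, "city": "Austin"},
    {"storeId": 2, "city": "Boston"},
]
employees = [
    {"employeeId": 10, "storeId": 1},
    {"employeeId": 11, "storeId": 2},
    {"employeeId": 12, "storeId": 1},
    {"employeeId": 13, "storeId": 3},  # no matching store, dropped by the inner join
]

result = broadcast_hash_join(employees, stores, "storeId")
```

The memory cost of the build phase grows with the size of the hashed side, which is the reason broadcasting the much larger employeesDF defeats the purpose of the optimization.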