Identify the logical error in the following code block, which is intended to efficiently perform a broadcast join between DataFrame storesDF and the much larger DataFrame employeesDF using the key column storeId.

Code block:

storesDF.join(broadcast(employeesDF), "storeId")

The larger DataFrame employeesDF is being broadcasted rather than the smaller DataFrame storesDF.

75.7%

There is never a need to call the broadcast() operation in Apache Spark 3.

5.4%

The entire line of code should be wrapped in broadcast() rather than just DataFrame employeesDF.

5.4%

The broadcast() operation will only perform a broadcast join if the Spark property spark.sql.autoBroadcastJoinThreshold is manually set.

8.1%

Only one of the DataFrames is being broadcasted rather than both of the DataFrames.

5.4%
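To see why the choice of broadcast side matters, here is a minimal pure-Python sketch of what a broadcast hash join does conceptually. It is not Spark code: the function broadcast_hash_join, the sample rows, and the column names other than storeId are hypothetical, and the partitioning is simulated with plain lists. The small side is turned into a lookup table that every partition of the large side reads, so only the small side is ever copied.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a full copy of the small side."""
    # Build the lookup table once from the SMALL side. In Spark, this is the
    # copy that would be shipped to every executor, so its size is the cost.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    # The large side is only scanned in place, partition by partition.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical sample data: two stores, four employees spread over two partitions.
stores = [
    {"storeId": 1, "city": "Austin"},
    {"storeId": 2, "city": "Boston"},
]
employee_partitions = [
    [{"storeId": 1, "employeeId": 10}, {"storeId": 2, "employeeId": 11}],
    [{"storeId": 1, "employeeId": 12}, {"storeId": 3, "employeeId": 13}],
]

result = broadcast_hash_join(employee_partitions, stores, "storeId")

# The equivalent fix in Spark is to wrap the smaller DataFrame instead:
#   employeesDF.join(broadcast(storesDF), "storeId")
```

Swapping the arguments would mean copying the employee rows to every worker, which is the inefficiency the question is pointing at.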

Databricks Certified Associate Developer for Apache Spark
