A data engineer is optimizing a join operation between two DataFrames, df1 and df2, using the following query: joined_df = df1.join(broadcast(df2), 'id', 'inner'). Which statement accurately describes how this join operation works?
A
The join operation will fail because 'inner' should be replaced with 'broadcast'.
B
A copy of df2 will be sent to all worker nodes to facilitate the join.
C
The join operation will fail because 'broadcast_df' should be used instead of 'broadcast'.
D
Only the first 10 MB of data from df2 will be used in the join.
E
The result of the join, joined_df, will be broadcast to all worker nodes because the broadcast function is used.
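
For reference, the broadcast in the query is the hint function pyspark.sql.functions.broadcast, which marks df2 as small enough to be copied to the workers so that df1 does not have to be shuffled. The plain-Python sketch below imitates that mechanism without Spark. It is only an illustration under stated assumptions, not Spark's implementation: the helper broadcast_hash_join, the list-of-partitions layout, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(df1_partitions, df2_rows, key="id"):
    """Inner-join partitioned df1 rows against a small df2 copied to every partition.

    Hypothetical helper: each inner list of df1_partitions stands in for the
    slice of df1 held by one worker.
    """
    joined = []
    for partition in df1_partitions:
        # Every "worker" receives its own full copy of the small table.
        local_copy = [dict(row) for row in df2_rows]

        # Build a hash lookup on the join key from the local copy.
        lookup = defaultdict(list)
        for row in local_copy:
            lookup[row[key]].append(row)

        # df1 rows are matched where they already live: no shuffle of df1.
        for left in partition:
            for right in lookup.get(left[key], []):
                joined.append({**right, **left})
    return joined


# Made-up sample data: df1 is split across two partitions, df2 is small.
df1_partitions = [
    [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}],
    [{"id": 3, "amount": 30}, {"id": 1, "amount": 40}],
]
df2_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

joined_df = broadcast_hash_join(df1_partitions, df2_rows)
for row in joined_df:
    print(row)
```

In this sketch the row with id 3 is dropped because the join is an inner join and df2 has no matching key. All of df2 takes part in the join, and only df2 is copied, never the joined result.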