To optimize performance in Apache Spark when joining a DataFrame df with another DataFrame lookupDf on a common key, which method should you use to leverage broadcast variables effectively?
A
Partitioning both DataFrames by 'key' before joining
B
Broadcasting both DataFrames before joining
C
Using df.join(broadcast(lookupDf), 'key')
D
Applying lookupDf.join(df, 'key') without broadcasting
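
For context, the idea behind a broadcast join, which is what the broadcast(...) call in option C requests, can be sketched in plain Python. This is only an illustration of the concept, not Spark's implementation: the function name broadcast_hash_join, the sample rows, and the two-partition layout are all hypothetical. A full copy of the small side is turned into a hash table, and each partition of the large side is joined against it locally, so the large side is never shuffled.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a copy of the small side."""
    # Build the hash table once from the small side. In Spark, this is the
    # data that would be shipped to every executor.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined_partitions = []
    for partition in large_partitions:
        # Each partition is joined locally; rows of the large side stay put.
        joined = []
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
        joined_partitions.append(joined)
    return joined_partitions


# Hypothetical sample data standing in for df (two partitions) and lookupDf.
df_partitions = [
    [{"key": 1, "amount": 10}, {"key": 2, "amount": 20}],
    [{"key": 1, "amount": 30}, {"key": 3, "amount": 40}],
]
lookup_rows = [{"key": 1, "label": "a"}, {"key": 2, "label": "b"}]

result = broadcast_hash_join(df_partitions, lookup_rows, "key")
```

In PySpark, the broadcast function used in option C is imported from pyspark.sql.functions. The approach only pays off when the broadcast side is small enough to fit in memory on each executor, which is why broadcasting both DataFrames is generally not helpful.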