When joining a large DataFrame df1 with a small DataFrame df2 in Spark, what is the most efficient method to optimize the operation?
A. Repartition df1 to match the number of partitions in df2 before joining.
B. Convert df2 to an RDD and manually broadcast it for the join operation with df1.
C. Cache both DataFrames in memory before performing the join to speed up access.
D. Apply the broadcast hint to df2 in the join operation so Spark distributes the small DataFrame to all executors.
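Broadcasting the small side is the standard optimization here: Spark ships df2 to every executor and performs a broadcast hash join, avoiding the shuffle that a sort-merge join would require on the large df1. Below is a minimal PySpark sketch of the hint in option D; the DataFrames, the join key "key", and the sizes are illustrative assumptions, not part of the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

# Hypothetical data: df1 stands in for the large side, df2 for the small side.
df1 = spark.range(10_000_000).withColumnRenamed("id", "key")
df2 = spark.createDataFrame(
    [(i, f"label_{i}") for i in range(100)], ["key", "label"]
)

# The broadcast() hint asks Spark to replicate df2 to every executor,
# replacing the shuffle-heavy sort-merge join with a broadcast hash join.
joined = df1.join(broadcast(df2), on="key", how="inner")

joined.explain()  # the physical plan should show BroadcastHashJoin
```

Note that Spark may broadcast automatically when the small table's estimated size falls below spark.sql.autoBroadcastJoinThreshold (10 MB by default); the explicit hint is useful when Spark's size estimate is wrong or the threshold is disabled.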