
Answer: Broadcast the lookup table using the broadcast method.
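In Spark, broadcasting the smaller table or DataFrame sends a full copy of it to every executor in the cluster. Each executor can then join its own partitions of the large table locally, so the large table does not have to be shuffled across the network by join key. This is especially effective when a very small table is joined with a much larger one. The other options do not help: rewriting the code in Python does not change how Spark executes a DataFrame join, union appends rows rather than adding columns, a UDF adds per-row overhead and hides the logic from the optimizer, and a full outer join is not inherently faster than other join types. More info: Broadcast joins in Spark.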
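To make the mechanism concrete, here is a minimal pure-Python sketch of a broadcast hash join. It is not Spark code: the table contents, the partition layout, and the helper names `join_partition` and `broadcast_hash_join` are hypothetical. It only shows the idea that the small table is copied to every partition while the rows of the large table never move. In PySpark itself, the hint is the `broadcast()` function from `pyspark.sql.functions`, used along the lines of `large_df.join(broadcast(lookup_df), on="key", how="left")`, where `large_df`, `lookup_df`, and `"key"` are placeholder names.

```python
from typing import Dict, List, Optional, Tuple

# A row of the large table: (join key, payload). Hypothetical schema for illustration.
Row = Tuple[int, str]
JoinedRow = Tuple[int, str, Optional[str]]


def join_partition(partition: List[Row], lookup: Dict[int, str]) -> List[JoinedRow]:
    """Left-join one partition against a local copy of the small table.

    Each row is matched with a hash lookup; no row leaves its partition.
    Keys with no match get None, as in a left join.
    """
    return [(key, payload, lookup.get(key)) for key, payload in partition]


def broadcast_hash_join(
    partitions: List[List[Row]], lookup: Dict[int, str]
) -> List[List[JoinedRow]]:
    """Simulate a broadcast hash join.

    The small lookup table is copied ("broadcast") once per partition, so the
    large table is never repartitioned (shuffled) by join key.
    """
    results = []
    for partition in partitions:
        local_copy = dict(lookup)  # stands in for the copy shipped to each executor
        results.append(join_partition(partition, local_copy))
    return results


if __name__ == "__main__":
    # Hypothetical data: a "large" table already split into two partitions.
    large = [
        [(1, "order-a"), (2, "order-b")],
        [(3, "order-c"), (1, "order-d")],
    ]
    countries = {1: "US", 2: "DE"}  # small lookup table; key 3 has no match
    for part in broadcast_hash_join(large, countries):
        print(part)
```

Each output partition contains exactly the rows that were in the corresponding input partition, now with the extra column attached. That locality is why broadcasting pays off only when the lookup table is small enough to copy to every executor cheaply.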
Author: LeetQuiz Editorial Team
A data engineer needs to add a column to a Delta table by joining it with a very small lookup table. Which technique can the data engineer use to speed up this operation?
A
Convert the code to Python to increase the join speed.
B
Use the union method instead of join to add the column.
C
Broadcast the lookup table using the broadcast method.
D
Create a UDF to add the column.
E
Use a full outer join to speed up the process, as outer joins are always optimized.