
Explanation:
When joining a massive table with a very small lookup table (e.g., 5,000 rows), using a Broadcast Join (Broadcast Hash Join) is highly optimal. Spark broadcasts the smaller table to all worker nodes, avoiding a massive data shuffle of the large table over the network.
Ultimate access to all questions.
No comments yet.