
Answer-first summary for fast verification
Answer: pyspark.sql.functions.broadcast
The `pyspark.sql.functions.broadcast` function marks a DataFrame as small enough to be used in a broadcast join. Spark then sends a full copy of that smaller DataFrame to every executor node in the cluster, so the larger DataFrame can be joined in place rather than shuffled across the network, which typically improves join performance. Reference: [Apache Spark Documentation](https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.broadcast.html)
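As a minimal sketch of how the hint is applied, the example below joins a small lookup table to a larger one. The function name `broadcast_join_demo`, the table contents, the column names, and the local `SparkSession` settings are all hypothetical and chosen only for illustration; on Databricks a session is normally already available as `spark`.

```python
def broadcast_join_demo():
    """Join a fact table to a small lookup table using a broadcast hint.

    All data, column names, and session settings here are illustrative
    assumptions, not part of the original question.
    """
    # Imported inside the function so the definition loads even where
    # PySpark is not installed; calling it still requires PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (
        SparkSession.builder
        .master("local[1]")          # assumption: a local test session
        .appName("broadcast-demo")
        .getOrCreate()
    )

    # Stand-in for the large side of the join.
    orders = spark.createDataFrame(
        [(1, "US", 20.0), (2, "DE", 35.5), (3, "US", 12.0)],
        ["order_id", "country_code", "amount"],
    )

    # Small lookup table that comfortably fits in executor memory.
    countries = spark.createDataFrame(
        [("US", "United States"), ("DE", "Germany")],
        ["country_code", "country_name"],
    )

    # broadcast() marks `countries` as the side to copy to every executor.
    joined = orders.join(broadcast(countries), on="country_code", how="inner")

    # Collect a small, sorted result so the output is easy to inspect.
    rows = sorted(
        (row["order_id"], row["country_name"])
        for row in joined.select("order_id", "country_name").collect()
    )

    spark.stop()
    return rows
```

Calling `joined.explain()` before collecting is a common way to check the physical plan; with the hint in place it would usually be expected to show a broadcast hash join, though the exact plan depends on the Spark version and configuration.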
Author: LeetQuiz Editorial Team
To optimize a join operation in Databricks by ensuring the smaller DataFrame is sent to all executor nodes in the cluster, which function should a data engineer use to mark the DataFrame as small enough to fit in memory on all executors?
A. pyspark.sql.functions.explode
B. pyspark.sql.functions.distribute
C. pyspark.sql.functions.broadcast
D. pyspark.sql.functions.diffuse
E. pyspark.sql.functions.shuffle