
Answer-first summary for fast verification
Answer: spark.sql.autoBroadcastJoinThreshold
The Spark property that configures the maximum size of a DataFrame that is automatically broadcast during a join is `spark.sql.autoBroadcastJoinThreshold`. It sets the size limit, in bytes, below which a table is considered for broadcasting. If Spark's estimated size for one side of the join is at or below this threshold, Spark sends a copy of that side to every executor, which avoids shuffling the larger side. The default is 10 MB (10485760 bytes), and setting the property to `-1` disables automatic broadcasting.

The other options control different behavior:
- **A** (`spark.sql.broadcastTimeout`) sets how long, in seconds, Spark waits for a broadcast to complete in a broadcast join. It does not limit the size.
- **C** (`spark.sql.shuffle.partitions`) controls the number of partitions used when shuffling data for joins and aggregations.
- **D** (`spark.sql.inMemoryColumnarStorage.batchSize`) configures the batch size for columnar caching.
- **E** (`spark.sql.adaptive.skewedJoin.enabled`) refers to skew-join handling in adaptive query execution (the actual Spark property is named `spark.sql.adaptive.skewJoin.enabled`).
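The size check described above can be sketched in plain Python. This is a simplified model under stated assumptions, not Spark's planner code: the function `is_broadcast_candidate` and the constant `DEFAULT_THRESHOLD_BYTES` are hypothetical names introduced for illustration, and the real decision also depends on join type, hints, and the accuracy of table statistics.

```python
# Simplified model of the automatic-broadcast size check.
# Hypothetical helper for illustration; not part of the Spark API.

# Spark's documented default for spark.sql.autoBroadcastJoinThreshold: 10 MB.
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024


def is_broadcast_candidate(estimated_size_bytes: int,
                           threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> bool:
    """Return True if a table of this estimated size would pass the size check."""
    # A threshold of -1 disables automatic broadcasting entirely.
    if threshold_bytes < 0:
        return False
    # The table qualifies when its estimated size does not exceed the threshold.
    return 0 <= estimated_size_bytes <= threshold_bytes


# In a real PySpark session the threshold is set on the session config, e.g.:
#   spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(50 * 1024 * 1024))
# and disabled with:
#   spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```

With the default threshold, a 5 MB table passes the check and a 20 MB table does not. Raising the threshold makes more tables eligible, at the cost of more memory used on the driver and on each executor.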
Author: LeetQuiz Editorial Team
Which of the following Spark properties configures the maximum size for automatic DataFrame broadcasting during join operations?
A. `spark.sql.broadcastTimeout`
B. `spark.sql.autoBroadcastJoinThreshold`
C. `spark.sql.shuffle.partitions`
D. `spark.sql.inMemoryColumnarStorage.batchSize`
E. `spark.sql.adaptive.skewedJoin.enabled`