
Answer-first summary for fast verification
Answer: By default, DataFrames will be split into 200 unique partitions when data is being shuffled.
The configuration `spark.sql.shuffle.partitions` determines the number of partitions created after a shuffle operation (e.g., joins, aggregations). The default value of 200 means Spark splits shuffled data into 200 partitions to enable parallel processing.

- **A, B, D** are incorrect because they tie partitions to executors or imply static partitioning of existing DataFrames. Partitions are not sized to executors' memory or count, and existing DataFrames are only repartitioned when a shuffle occurs.
- **C** is incorrect because it conflates shuffle partitions with reading input data, which this setting does not control.
- **E** is correct because it directly describes the purpose of `spark.sql.shuffle.partitions`: splitting data into 200 partitions during shuffles.
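To make the mechanism concrete, here is a minimal pure-Python sketch of how a hash-partitioned shuffle assigns keyed rows to partitions. It assumes Spark's default hash partitioning scheme; the `shuffle_partition` helper and the sample `rows` data are illustrative, not Spark API.

```python
# Default value of spark.sql.shuffle.partitions in Spark.
SHUFFLE_PARTITIONS = 200

def shuffle_partition(key, num_partitions=SHUFFLE_PARTITIONS):
    """Map a join/aggregation key to one of the shuffle partitions
    (sketch of hash partitioning: hash(key) mod num_partitions)."""
    return hash(key) % num_partitions

# Rows keyed for an aggregation. Every row with the same key lands in
# the same partition, so per-key aggregation can run locally there.
rows = [("user_%d" % i, i) for i in range(1000)]
partitions = {shuffle_partition(key) for key, _ in rows}

# After the shuffle there are at most 200 distinct partitions,
# regardless of how many rows or distinct keys exist.
assert len(partitions) <= SHUFFLE_PARTITIONS
```

In actual Spark code the partition count is changed with `spark.conf.set("spark.sql.shuffle.partitions", n)`; the 200 default applies only when a shuffle occurs, not when data is first read.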
Author: LeetQuiz Editorial Team
What does the default value of 200 for spark.sql.shuffle.partitions signify?
A
By default, all DataFrames in Spark will be split to perfectly fill the memory of 200 executors.
B
By default, new DataFrames created by Spark will be split to perfectly fill the memory of 200 executors.
C
By default, Spark will only read the first 200 partitions of DataFrames to improve speed.
D
By default, all DataFrames in Spark, including existing DataFrames, will be split into 200 unique segments for parallelization.
E
By default, DataFrames will be split into 200 unique partitions when data is being shuffled.