
Explanation:
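The configuration spark.sql.shuffle.partitions sets the number of partitions Spark creates when it shuffles data for DataFrame and SQL operations such as joins and aggregations. The default value of 200 means that shuffled data is split into 200 partitions, which can then be processed in parallel. The setting says nothing about executor counts or executor memory, and it does not limit how many partitions are read, so only option E describes it correctly.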
spark.sql.shuffle.partitions: splits data into 200 partitions during shuffles.
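To make the idea of "200 partitions during a shuffle" concrete, here is a minimal sketch in plain Python. It is a toy simulation, not Spark itself: the function names, the sample data, and the use of zlib.crc32 as the hash are all assumptions made for illustration (Spark uses its own hash function internally). The only point is that each row's key decides which of the 200 partitions the row lands in.

```python
import zlib
from collections import Counter

# Assumed to mirror the default of spark.sql.shuffle.partitions.
DEFAULT_SHUFFLE_PARTITIONS = 200


def shuffle_partition(key: str, num_partitions: int = DEFAULT_SHUFFLE_PARTITIONS) -> int:
    """Return the index of the partition a key is routed to.

    zlib.crc32 is a stand-in hash chosen because it is deterministic
    across runs; it is not the hash Spark uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def simulate_shuffle(keys, num_partitions: int = DEFAULT_SHUFFLE_PARTITIONS) -> Counter:
    """Count how many rows land in each partition after the simulated shuffle."""
    return Counter(shuffle_partition(k, num_partitions) for k in keys)


# Hypothetical data: 10,000 rows spread over 1,000 distinct customer ids.
rows = [f"customer-{i % 1000}" for i in range(10_000)]
sizes = simulate_shuffle(rows)

# Every row is assigned to exactly one partition index in [0, 200).
print(f"non-empty partitions: {len(sizes)} of {DEFAULT_SHUFFLE_PARTITIONS}")
print(f"total rows after shuffle: {sum(sizes.values())}")
```

Rows with the same key always map to the same partition, which is what lets a join or aggregation process each key in one place. In a real Spark session the partition count comes from the session configuration, for example spark.conf.set("spark.sql.shuffle.partitions", "64"), rather than from a Python constant.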
What does the default value of 200 for spark.sql.shuffle.partitions signify?
A
By default, all DataFrames in Spark will be split to perfectly fill the memory of 200 executors.
B
By default, new DataFrames created by Spark will be split to perfectly fill the memory of 200 executors.
C
By default, Spark will only read the first 200 partitions of DataFrames to improve speed.
D
By default, all DataFrames in Spark, including existing DataFrames, will be split into 200 unique segments for parallelization.
E
By default, DataFrames will be split into 200 unique partitions when data is being shuffled.