
Answer-first summary for fast verification
Answer: A – spark.default.parallelism is not the right Spark configuration parameter; spark.sql.shuffle.partitions should be used instead.
The code block uses `spark.default.parallelism`, which sets the default number of partitions for RDD-based operations (e.g. `parallelize()`). The number of shuffle partitions used by wide transformations on DataFrames/Datasets, such as `join()`, is controlled by `spark.sql.shuffle.partitions` instead. Passing "32" as a string is not a problem: Spark accepts string values for configuration parameters and converts them internally to the required type, so option E is incorrect. Options B, C, and D misstate Spark's configuration capabilities – the shuffle partition count is adjustable, and `spark.conf.set()` is precisely the way to change such settings at runtime.
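For reference, a minimal PySpark sketch of the corrected configuration (it assumes an already active `SparkSession` named `spark`, so it is a fragment rather than a standalone script):

```python
# Assumes an active SparkSession bound to the name `spark`.
# spark.sql.shuffle.partitions controls how many partitions Spark SQL
# uses when shuffling data for wide transformations like join()/groupBy().
spark.conf.set("spark.sql.shuffle.partitions", "32")  # string value is accepted
spark.conf.set("spark.sql.shuffle.partitions", 32)    # an integer works as well

# spark.conf.get returns the current value as a string.
print(spark.conf.get("spark.sql.shuffle.partitions"))
```

By contrast, `spark.default.parallelism` applies to RDD operations, not to the shuffles produced by DataFrame wide transformations, which is why option A identifies the error.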
Author: LeetQuiz Editorial Team
Identify the error in the following code block intended to configure the number of partitions for wide transformations like join() to 32:
Code block:
spark.conf.set("spark.default.parallelism", "32")
A
spark.default.parallelism is not the right Spark configuration parameter – spark.sql.shuffle.partitions should be used instead.
B
There is no way to adjust the number of partitions used in wide transformations – it defaults to the number of total CPUs in the cluster.
C
Spark configuration parameters cannot be set in runtime.
D
Spark configuration parameters are not set with spark.conf.set().
E
The second argument should not be the string version of "32" – it should be the integer 32.