
When tuning a Spark job that processes a dataset with uneven data distribution (skewed data), which configuration setting is most effective for ensuring the workload is evenly distributed across all cluster nodes?
A. Setting spark.speculation to true to speculatively re-launch slow tasks.
B. Setting spark.sql.adaptive.skewJoin.enabled to true.
C. Adjusting spark.sql.shuffle.partitions to a lower number than the default.
D. Configuring spark.default.parallelism to match the number of cores in the cluster.
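
For reference, option B refers to the skew-join optimization in Spark's Adaptive Query Execution (AQE), which detects oversized shuffle partitions at join time and splits them into smaller subtasks so their work can be spread across the cluster. Below is a minimal Scala sketch of enabling it, assuming Spark 3.x (the configuration keys are standard AQE settings; the app name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enable AQE skew-join handling (assumes Spark 3.x).
val spark = SparkSession.builder()
  .appName("skew-join-demo") // hypothetical app name
  // AQE must be on for skew-join handling to take effect.
  .config("spark.sql.adaptive.enabled", "true")
  // Split skewed shuffle partitions into smaller tasks at join time.
  .config("spark.sql.adaptive.skewJoin.enabled", "true")
  // Optional knobs controlling when a partition counts as skewed
  // (values shown are the Spark defaults):
  .config("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
  .config("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
  .getOrCreate()
```

Note the contrast with option A: speculative execution only launches a duplicate copy of a slow task, and that copy still reads the same oversized partition, so the skew remains. Splitting the skewed partition itself is what actually rebalances the workload.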