When tuning a Spark job that processes a dataset with uneven data distribution (skewed data), which configuration setting is most effective for ensuring the workload is evenly distributed across all cluster nodes?
A. Setting spark.speculation to true so that slow tasks are speculatively re-launched.
B. Setting spark.sql.adaptive.skewJoin.enabled to true.
C. Adjusting spark.sql.shuffle.partitions to a lower number than the default.
D. Configuring spark.default.parallelism to match the number of cores in the cluster.
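For reference, the skew-join setting in option B is part of Spark's Adaptive Query Execution (AQE, available since Spark 3.0), and AQE itself must be enabled for the skew-join optimization to take effect. A minimal PySpark sketch of turning both on, assuming Spark 3.x (the application name is illustrative):

    from pyspark.sql import SparkSession

    # Enable AQE and its skew-join optimization so Spark can detect
    # oversized shuffle partitions at runtime and split them during joins.
    spark = (
        SparkSession.builder
        .appName("skew-demo")  # illustrative name, not from the question
        .config("spark.sql.adaptive.enabled", "true")           # AQE itself must be on
        .config("spark.sql.adaptive.skewJoin.enabled", "true")  # split skewed join partitions
        .getOrCreate()
    )

With these set, Spark inspects shuffle statistics at runtime and breaks a skewed partition into smaller sub-partitions, so no single task is left processing a disproportionate share of the data.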