
Explanation:
Increasing the spark.sql.shuffle.partitions parameter is the least effective of these techniques because it does not address the root cause of the skew: rows that share a hot key still hash to the same shuffle partition, so that partition stays oversized however many partitions exist, and the extra small tasks add scheduling overhead. Salting keys before the shuffle or applying a custom partitioner that considers data distribution targets the uneven keys directly, and the coalesce function can then merge the small partitions left over, which makes these options more suitable for managing data skew in a Spark job processing data from Azure Blob Storage.
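The effect of partition count versus salting can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not Spark code: `partition_of` uses CRC32 as a stand-in for a hash partitioner (it is not Spark's actual hash function), and the dataset, the helper names, and the salt count of 50 are all hypothetical.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for hash partitioning (assumption: not Spark's real hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def largest_partition(keys, num_partitions):
    # Count rows per partition and return the size of the biggest one.
    sizes = Counter(partition_of(k, num_partitions) for k in keys)
    return max(sizes.values())


def salt(keys, num_salts):
    # Append a round-robin salt so a single hot key becomes num_salts distinct keys.
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# Hypothetical skewed dataset: one hot key with 9,000 rows, 1,000 keys with one row each.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

unsalted_200 = largest_partition(keys, 200)
unsalted_2000 = largest_partition(keys, 2000)
salted_200 = largest_partition(salt(keys, 50), 200)

print("largest partition, 200 partitions, unsalted:", unsalted_200)
print("largest partition, 2000 partitions, unsalted:", unsalted_2000)
print("largest partition, 200 partitions, salted:", salted_200)
```

In this sketch, all 9,000 rows of the hot key land in one partition whether there are 200 or 2,000 partitions, while salting spreads them over many partitions. In a real Spark job, salting typically also needs a second aggregation step to combine the per-salt results, or, for joins, replication of the other side across every salt value.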
When dealing with data skew in a Spark job that processes data from Azure Blob Storage, which of the following techniques is least effective for managing skew?
A. Applying a custom partitioner that considers data distribution
B. Employing salting techniques before shuffling
C. Using the coalesce function to reduce the number of partitions
D. Increasing the spark.sql.shuffle.partitions parameter