You are processing a large dataset with Spark Structured Streaming and notice that some tasks take significantly longer to complete than others, making your data processing pipeline inefficient. On investigation, you find that the cause is data skew: some partitions contain significantly more data than others. Given the need to optimize query performance while staying within cost constraints and remaining scalable, which of the following approaches would be the MOST effective way to address the data skew? Choose the best of the four options provided.