Which strategy is most effective for reducing shuffling and enhancing query performance when optimizing a Spark job that processes a large Delta Lake table?
A. Using broadcast variables to minimize data transfer
B. Maintaining default partitioning to let Spark decide
C. Decreasing the number of partitions using coalesce()
D. Increasing the number of partitions to maximize parallelism