
Answer-first summary for fast verification
Answer: Utilizing the repartition method to reduce the number of connections
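The correct answer is **D. Utilizing the repartition method to reduce the number of connections**. Spark opens one JDBC connection per partition when writing, so the partition count determines how many connections reach the database. `repartition` always triggers a full shuffle, and if it is called with more partitions than the DataFrame already has, it increases the number of connections instead of reducing it, which can create a bottleneck on the database side. The other options address the write path more directly. The `batchsize` parameter (A) groups many row inserts into a single batch, reducing per-statement overhead; in Spark it is normally passed as a JDBC writer option rather than embedded in the URL. Caching the DataFrame (B) keeps intermediate results in memory so they are not recomputed when the DataFrame is used more than once. A broadcast join (C) avoids shuffling the larger table in an upstream join, which shortens the work done before the write. All four strategies aim to improve performance, but repartitioning is the least likely to help in this context.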
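The batching and connection-count points in the explanation can be illustrated with a minimal PySpark-style sketch. It assumes `df` is a Spark DataFrame and that the SQL Server JDBC driver is on the classpath. The function names `jdbc_write_options` and `write_to_azure_sql` are hypothetical helpers invented for this example, and the default values are arbitrary. `batchsize` and `numPartitions` are documented Spark JDBC options; when the DataFrame has more partitions than `numPartitions`, Spark coalesces down to that limit before writing.

```python
def jdbc_write_options(url, table, user, password,
                       batch_size=10_000, max_connections=8):
    """Build Spark JDBC writer options (hypothetical helper).

    batchsize is passed as a writer option here, not inside the URL.
    numPartitions caps the number of concurrent JDBC connections,
    because Spark opens one connection per partition when writing.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "batchsize": str(batch_size),
        "numPartitions": str(max_connections),
    }


def write_to_azure_sql(df, options, mode="append"):
    """Write a Spark DataFrame over JDBC using the given options.

    No explicit repartition() call is made: the numPartitions option
    lets Spark coalesce only when the partition count exceeds the cap.
    """
    df.write.format("jdbc").options(**options).mode(mode).save()
```

A call might look like `write_to_azure_sql(df, jdbc_write_options(url, "dbo.events", user, password))`, where `url`, the table name, and the credentials are placeholders to be replaced with real values. Reasonable values for `batch_size` and `max_connections` depend on the database tier and row width, so they are best found by measurement.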
Author: LeetQuiz Editorial Team
When optimizing a Spark job that writes data to Azure SQL Database using JDBC, which strategy is considered the least effective?
A. Employing the batchsize parameter in the JDBC URL to batch writes
B. Caching the DataFrame before writing to reduce recomputation
C. Leveraging a broadcast join before writing to minimize shuffling
D. Utilizing the repartition method to reduce the number of connections