In a scenario where you're dynamically loading varying volumes of data into Spark DataFrames, what is the best approach to optimize partitioning for enhanced performance across different loads?
A. Leverage Spark’s adaptive query execution feature to adjust partitions automatically.
B. Use repartitionByRange dynamically based on the DataFrame’s actual size after loading.
C. Always use coalesce to minimize shuffling, regardless of the data volume.
D. Hard-code the number of partitions to match the highest anticipated data volume.
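As background for the options above, here is a minimal PySpark sketch (not a definitive implementation) that assumes a Parquet dataset at a hypothetical path `/data/events` with an `event_time` column. It shows how adaptive query execution (option A) is enabled through session configuration, and how a partition count could be derived from the loaded DataFrame's actual size for `repartitionByRange` (option B).

```python
from pyspark.sql import SparkSession

# Option A: enable adaptive query execution so Spark can coalesce
# shuffle partitions at runtime based on actual data sizes.
spark = (
    SparkSession.builder
    .appName("partition-tuning-sketch")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

# Hypothetical input path; dataset size varies from load to load.
df = spark.read.parquet("/data/events")

# Option B: size the partition count from the DataFrame's actual row count
# after loading, then range-partition on a column (here an assumed
# "event_time" column; the target rows-per-partition is illustrative).
row_count = df.count()
target_rows_per_partition = 1_000_000
num_partitions = max(1, row_count // target_rows_per_partition)
df_ranged = df.repartitionByRange(num_partitions, "event_time")

df_ranged.write.mode("overwrite").parquet("/data/events_partitioned")
```

Note the trade-off the question is probing: the AQE approach adjusts partitioning without an extra pass over the data, while the `repartitionByRange` approach requires a `count()` (a full scan) and an explicit shuffle on every load.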