When tasked with performing complex transformations on a multi-terabyte dataset in Apache Spark, which practice ensures scalability and performance?
A. Increase the number of Spark workers arbitrarily to improve processing speed.
B. Optimize data partitioning based on the transformation logic to enhance parallelism.
C. Broadcast large reference datasets to all Spark nodes to minimize data shuffling during joins.
D. Cache all intermediate transformations in memory, regardless of cluster size.
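The partition-tuning idea in option B rests on how Spark distributes rows: by default, `repartition(n, col)` applies hash partitioning so that all rows sharing a key land in the same partition, letting a subsequent group-by or join on that key run in parallel without a full shuffle. A minimal pure-Python sketch of that hash-partitioning assignment (not the Spark implementation itself; the helper names here are illustrative):

```python
from collections import defaultdict

def hash_partition(rows, key_fn, num_partitions):
    """Assign each row to a partition by hashing its key,
    mirroring the scheme Spark uses for repartition(n, col)."""
    partitions = defaultdict(list)
    for row in rows:
        # Rows with equal keys always hash to the same partition,
        # so per-key work never needs to cross partition boundaries.
        idx = hash(key_fn(row)) % num_partitions
        partitions[idx].append(row)
    return partitions

rows = [("us", 1), ("eu", 2), ("us", 3), ("ap", 4), ("eu", 5)]
parts = hash_partition(rows, key_fn=lambda r: r[0], num_partitions=4)

# Every row with the same country code sits in exactly one partition.
for idx, chunk in parts.items():
    print(idx, chunk)
```

In PySpark itself the equivalent call would be along the lines of `df.repartition(200, "country")` before a country-keyed aggregation; the right partition count depends on data volume and cluster cores, which is why option A's "arbitrarily add workers" does not guarantee a speedup.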