When optimizing Spark SQL window functions over a large dataset, which technique significantly enhances performance?
A. Increasing spark.sql.shuffle.partitions to a very high number to ensure data is evenly distributed.
B. Partitioning the DataFrame by the window’s partitionBy column before applying the window function.
C. Leveraging a broadcast join before applying the window function to reduce shuffle.
D. Caching the DataFrame before applying the window function.