
As a Microsoft Fabric Analytics Engineer Associate, you are optimizing a Spark notebook in Azure Databricks that includes a SQL query with performance issues. The query's prolonged execution is delaying the entire data processing pipeline. Considering the need for cost efficiency, compliance with data governance policies, and scalability, which of the following steps would you prioritize to enhance the SQL query's performance? (Choose one option)
A
Increase the memory allocation for the Spark cluster to handle larger datasets more efficiently.
B
Repartition the DataFrame to increase the number of partitions, aiming to improve parallelism and reduce query execution time.
C
Implement a broadcast join for the query if it involves joining a small DataFrame with a larger one, to optimize join performance.
D
Apply data filtering as early as possible in the query to minimize the volume of data processed, thereby reducing execution time and resource consumption.
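
The effect described in option D can be illustrated with a minimal sketch. It is plain Python rather than Spark: the `join` helper, the `orders` and `customers` data, and the `year` filter are all hypothetical and stand in for a real join. The sketch only counts how many rows the join has to touch when the filter runs after the join versus before it. In Spark, the optimizer can often push such a filter down on its own, so the gain in a real query depends on how the query is written.

```python
# Illustrative only: a plain-Python stand-in for a join, using made-up data.

def join(left, right):
    """Hash join on 'key'. Returns (joined rows, number of left rows probed)."""
    index = {}
    for r in right:
        index.setdefault(r["key"], []).append(r)
    joined = []
    probed = 0
    for l in left:
        probed += 1  # every left row that reaches the join costs work
        for r in index.get(l["key"], []):
            joined.append({**l, **r})
    return joined, probed

# Hypothetical inputs: 1000 orders spread over five years, 10 customers.
orders = [{"key": i % 10, "year": 2020 + i % 5, "amount": i} for i in range(1000)]
customers = [{"key": k, "name": f"c{k}"} for k in range(10)]

# Filter late: join everything, then discard most of the result.
joined_late, probed_late = join(orders, customers)
result_late = [row for row in joined_late if row["year"] == 2024]

# Filter early: discard rows first, so the join sees far fewer of them.
recent_orders = [o for o in orders if o["year"] == 2024]
result_early, probed_early = join(recent_orders, customers)

print(probed_late, probed_early)    # rows fed into the join in each case
print(result_late == result_early)  # both orderings give the same rows
```

Both orderings return the same rows, but the early filter feeds only a fifth of the orders into the join. On a cluster, that reduction also applies to the data that is shuffled between nodes.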