
As a Microsoft Fabric Analytics Engineer Associate, you are tasked with optimizing the performance of a complex SQL query in a Spark notebook within Azure Databricks. The query involves multiple joins and aggregations across large datasets. Your goal is to ensure the query executes as efficiently as possible, considering factors such as cost, scalability, and the need to minimize recomputation. Which of the following approaches would BEST improve the performance of the query under these constraints? (Choose one option)
A
Implement additional indexing on the tables involved in the query to speed up data retrieval.
B
Leverage a distributed cache mechanism to store and reuse intermediate results of the query across the cluster.
C
Restructure the query to utilize subqueries and temporary tables for breaking down the complexity.
D
Apply the 'cache' command to persist the tables involved in the query in the memory of the worker nodes.