
As a Microsoft Fabric Analytics Engineer Associate, you are tasked with optimizing the performance of a complex SQL query in a Spark notebook within Azure Databricks. The query involves multiple joins on large tables, and you need to ensure the solution is cost-effective, scalable, and complies with data governance policies. Considering these constraints, which of the following approaches would you choose to significantly improve the query performance? (Choose one option)
A
Rewrite the query to utilize subqueries and temporary tables, ensuring that the temporary tables are persisted in a cost-effective storage layer.
B
Implement the 'cache' command to store the tables involved in the query in memory, taking into account the memory constraints and the size of the tables.
C
Add more indexes to the tables involved in the query, considering the overhead of maintaining additional indexes on large tables.
D
Apply the 'broadcast' join optimization to minimize data shuffling during the join operation, especially when one of the tables is small enough to be broadcast.
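
To make the idea behind option D concrete, here is a minimal pure-Python sketch of a broadcast hash join. It is a conceptual illustration only, not Spark's implementation: the function name `broadcast_hash_join`, the sample tables, and the column names are all hypothetical. The point it shows is that the small table is copied in full to wherever each partition of the large table already sits, so the large table never has to be repartitioned by the join key.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of a large table against a full copy of a small table."""
    # Build a hash lookup from the small table once. In a cluster, this
    # copy is what would be shipped ("broadcast") to every worker.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    # Probe the lookup partition by partition. Rows of the large table stay
    # in their original partition, so no shuffle by join key is needed.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


# Hypothetical data: a large "orders" table split into two partitions,
# and a small "regions" dimension table.
orders_partitions = [
    [{"order_id": 1, "region_id": 10}, {"order_id": 2, "region_id": 20}],
    [{"order_id": 3, "region_id": 10}, {"order_id": 4, "region_id": 99}],
]
regions = [
    {"region_id": 10, "region": "EMEA"},
    {"region_id": 20, "region": "APAC"},
]

result = broadcast_hash_join(orders_partitions, regions, "region_id")
for row in result:
    print(row)
```

In a Spark notebook the comparable step would typically be a join hint, for example `large_df.join(broadcast(small_df), "region_id")` with `broadcast` imported from `pyspark.sql.functions` (the DataFrame names here are assumptions). The trade-off is that the small table must fit comfortably in each executor's memory, which is why the option is qualified with "small enough to be broadcast".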