
You are tasked with optimizing the performance of a Spark application running on Azure Databricks. You must identify and resolve performance bottlenecks while minimizing cost and remaining compliant with data governance policies. The application processes large datasets and is expected to scale efficiently. Which of the following is the MOST effective first step for identifying performance bottlenecks, and why? Choose one option.
A. Review the Spark UI to identify stages with long execution times, as these are likely the bottlenecks affecting overall job performance.
B. Increase the number of executors in the Spark cluster to improve parallelism without analyzing the current performance metrics.
C. Cache all datasets in memory to reduce I/O operations, regardless of dataset size or access patterns.
D. Switch to a higher-priced Azure Databricks tier to automatically optimize performance without manual intervention.
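The Spark UI review described in option A can also be scripted: Spark exposes stage-level metrics through its monitoring REST API (`GET /api/v1/applications/{app-id}/stages`), and sorting those records by executor run time surfaces the slowest stages. The sketch below uses fabricated stage records shaped like that API's response; in practice they would be fetched with an HTTP client from the cluster's Spark UI endpoint.

```python
# Sketch: rank Spark stages by total executor run time to surface
# likely bottlenecks. The records below are fabricated examples
# shaped like the Spark monitoring REST API response from
# GET /api/v1/applications/{app-id}/stages (fields: stageId, name,
# executorRunTime in milliseconds).

stages = [
    {"stageId": 0, "name": "load parquet",  "executorRunTime": 4200},
    {"stageId": 1, "name": "shuffle join",  "executorRunTime": 95000},
    {"stageId": 2, "name": "write output",  "executorRunTime": 8700},
]

def slowest_stages(stage_records, top_n=3):
    """Return stages sorted by executor run time, longest first."""
    ranked = sorted(stage_records,
                    key=lambda s: s["executorRunTime"],
                    reverse=True)
    return ranked[:top_n]

for s in slowest_stages(stages):
    print(f'stage {s["stageId"]} ({s["name"]}): {s["executorRunTime"]} ms')
```

Starting from the longest-running stage and drilling into its tasks (skew, spill, shuffle volume) is what makes option A a measurement-first approach, in contrast to the speculative resource changes in options B through D.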