
A Spark application running on a Databricks cluster is experiencing performance issues, and you are tasked with identifying and resolving the bottlenecks. The application processes large datasets and must comply with strict SLAs on execution time. Considering the need for cost efficiency and scalability, which of the following steps would you prioritize when using the Spark UI to accurately identify the root cause of the performance issues? Choose the best option.
A
Focus solely on the 'Jobs' tab to review the number of jobs submitted, without analyzing the stages or tasks for any anomalies.
B
Analyze the 'Stages' tab to identify stages with unusually high execution times or failures, and then examine the 'Tasks' tab within those stages to look for skewed task execution or outliers.
C
Limit your investigation to the 'Environment' tab to check the cluster's configuration settings, ignoring the actual performance metrics and logs.
D
Only review the 'Storage' tab to assess the data distribution, without considering the potential impact of data skew or partition sizes on performance.
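If the Stages/Tasks analysis described in option B surfaces a stage whose tasks are dominated by a few long-running outliers, a common follow-up is to enable Adaptive Query Execution and redistribute the data more evenly. The sketch below is a minimal, illustrative example; the table path, column names, and partition count are assumptions, not part of the question.

```python
# Minimal PySpark sketch of one common skew mitigation, assuming the Spark UI
# showed a stage with a handful of straggler tasks. All names/paths are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("skew-mitigation-sketch")
    # AQE can split skewed shuffle partitions automatically (Spark 3.x).
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)

# Hypothetical large fact table; "customer_id" stands in for the skewed key
# observed as long-running outlier tasks in the Tasks view.
events = spark.read.parquet("/mnt/data/events")

# Repartitioning on a more evenly distributed composite key spreads the work
# across tasks instead of concentrating it on a few stragglers.
balanced = events.repartition(200, "customer_id", "event_date")

balanced.write.mode("overwrite").parquet("/mnt/data/events_balanced")
```

Whether repartitioning, salting the join key, or relying on AQE alone is appropriate depends on what the Stages and Tasks tabs actually show; the point of option B is that the evidence is gathered there first.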