
A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved. Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?
A
They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."
B
They can turn on the Auto Stop feature for the SQL endpoint.
C
They can increase the cluster size of the SQL endpoint.
D
They can turn on the Serverless feature for the SQL endpoint.
E
They can increase the maximum bound of the SQL endpoint's scaling range.
Explanation:
Queries submitted to a non-running SQL endpoint are slow primarily because of cold start time: the endpoint must be started before it can execute anything, and that startup adds significant latency before the first result is returned.
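As a rough illustration of why startup dominates here, the sketch below models time to result as startup time plus execution time. The function name and every number in it are made-up placeholders for illustration, not measurements of any real endpoint.

```python
def time_to_result(startup_s, execution_s, endpoint_running):
    """Seconds until results return; startup is paid only on a cold start."""
    if endpoint_running:
        return execution_s
    return startup_s + execution_s


# Hypothetical values: a 180 s cold start and a 20 s query.
baseline = time_to_result(180, 20, endpoint_running=False)        # 200

# Assume a larger cluster halves execution time; startup is unchanged.
bigger_cluster = time_to_result(180, 10, endpoint_running=False)  # 190

# Assume a faster-starting endpoint with the same execution time.
faster_start = time_to_result(10, 20, endpoint_running=False)     # 30
```

Under these assumed numbers, speeding up execution barely moves the total, while shortening startup removes most of the wait.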
Let's analyze each option:
A. Turn on the Serverless feature and change the Spot Instance Policy to "Reliability Optimized" - Incorrect. The Serverless part does address cold starts, but the Spot Instance Policy governs the cost/reliability trade-off of instance selection. Changing it does nothing to shorten startup, so this option adds an unnecessary step.
B. Turn on the Auto Stop feature - Incorrect. Auto Stop shuts the SQL endpoint down after a period of inactivity, so queries will find it in a non-running state more often. It does not shorten startup; it makes cold starts more frequent.
C. Increase the cluster size of the SQL endpoint - Incorrect. A larger cluster size gives each query more resources and can speed up execution once the endpoint is running, but it does not shorten the time needed to start a stopped endpoint. This is the remedy when individual queries are slow on an endpoint that is already running.
D. Turn on the Serverless feature - CORRECT. Serverless SQL endpoints run on compute managed by Databricks and are designed to start quickly, typically within seconds rather than the minutes a classic endpoint can need to provision instances. This directly reduces the delay for queries submitted to a non-running endpoint.
E. Increase the maximum bound of the scaling range - Incorrect. This lets the endpoint scale out to serve more concurrent queries, which helps when many queries are queued at once, but it does not affect startup time.
Key Insight: The phrase "non-running" identifies cold start time as the bottleneck. Increasing the cluster size (option C) or the scaling range (option E) only improves performance once the endpoint is already running, and Auto Stop (option B) makes cold starts more frequent. Only the Serverless feature (option D) shortens the startup itself, so D is the correct answer.
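For orientation, the answer options correspond roughly to individual SQL endpoint (warehouse) settings. The sketch below builds a settings payload in the shape used by the Databricks SQL Warehouses REST API. The helper name build_endpoint_settings and the endpoint name are hypothetical, and the field names are written from memory of that API, so check them against the current API reference before relying on them. No request is sent.

```python
import json


def build_endpoint_settings(name, serverless=True, cluster_size="Small",
                            max_num_clusters=1, auto_stop_mins=10):
    """Build a settings payload for a SQL endpoint (hypothetical helper).

    Field names are assumed to match the SQL Warehouses REST API.
    """
    return {
        "name": name,
        "enable_serverless_compute": serverless,  # option D: faster startup
        "cluster_size": cluster_size,             # option C: per-query resources
        "min_num_clusters": 1,
        "max_num_clusters": max_num_clusters,     # option E: concurrency scaling
        "auto_stop_mins": auto_stop_mins,         # option B: idle shutdown
    }


settings = build_endpoint_settings("analytics-endpoint")
payload = json.dumps(settings, indent=2)
```

Only enable_serverless_compute bears on how long a stopped endpoint takes to start; the other fields affect behavior once it is running or how often it stops.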
Best Practice: To truly minimize cold start times for SQL endpoints: