Analysis of the Question
The requirement is to retain an Azure Databricks interactive cluster's configuration indefinitely after termination while minimizing costs. The cluster is used infrequently and has auto-termination enabled.
Evaluation of Options
A: Pin the cluster ✅
- Optimal Choice: Pinning a cluster in Azure Databricks exempts it from the 30-day cleanup of terminated clusters, so its configuration is kept until the cluster is unpinned or deleted.
- Cost Efficiency: Pinning doesn't keep the cluster running; it only retains the configuration metadata. The cluster still terminates when inactive, avoiding compute costs.
- Best Practice: This is the standard Azure Databricks feature designed specifically for preserving cluster configurations long-term without ongoing expenses.
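Pinning is normally done from the cluster list in the workspace UI, but it can also be scripted. The following is a minimal sketch, assuming the Clusters REST API exposes a `clusters/pin` endpoint under `/api/2.0` (check the API version your workspace documents) and that the caller has a personal access token with workspace admin rights. The function names, workspace URL, token, and cluster ID are illustrative placeholders, not values from the question.

```python
import json
import urllib.request


def build_pin_request(workspace_url: str, token: str, cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that pins the given cluster.

    Assumption: the endpoint path is /api/2.0/clusters/pin and the body
    carries only the cluster ID.
    """
    url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/pin"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def pin_cluster(workspace_url: str, token: str, cluster_id: str) -> int:
    """Send the pin request and return the HTTP status code."""
    request = build_pin_request(workspace_url, token, cluster_id)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `pin_cluster("https://<workspace>.azuredatabricks.net", "<token>", "<cluster-id>")` would send the request. Pinning this way does not start the cluster, so it adds no compute cost.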
B: Create an Azure runbook that starts the cluster every 90 days ❌
- Inefficient: This approach would incur unnecessary compute costs by periodically starting the cluster.
- Complexity: Requires additional Azure Automation resources and maintenance.
- Doesn't Meet Requirement: An unpinned cluster's configuration is deleted 30 days after termination, so with a 90-day interval the runbook would find nothing left to start.
C: Terminate the cluster manually when processing completes ❌
- Configuration Loss: By default, Azure Databricks deletes the configuration of an all-purpose cluster 30 days after it is terminated, and terminating it manually rather than automatically does nothing to change that.
- Doesn't Meet Requirement: This approach doesn't ensure indefinite retention of the configuration.
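The 30-day window can be made concrete with a little date arithmetic. This is only an illustrative sketch: the constant and function names are hypothetical, and it assumes the default 30-day retention for terminated, unpinned clusters described above.

```python
from datetime import datetime, timedelta

# Assumption: default retention for terminated, unpinned all-purpose clusters.
RETENTION = timedelta(days=30)


def deletion_time(terminated_at: datetime) -> datetime:
    """Return when an unpinned cluster's configuration would be removed."""
    return terminated_at + RETENTION


def config_still_exists(terminated_at: datetime, checked_at: datetime, pinned: bool = False) -> bool:
    """Return True if the configuration is still available at checked_at."""
    if pinned:
        # Pinned clusters are exempt from the retention cleanup.
        return True
    return checked_at < deletion_time(terminated_at)
```

For a cluster terminated on 1 January, `config_still_exists` is true two weeks later but false 90 days later unless `pinned=True` is passed.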
D: Clone the cluster after it is terminated ❌
- Temporary Solution: Cloning creates a new cluster with a copy of the configuration, but it does nothing to extend the retention of the original.
- Manual Process: Requires intervention each time the configuration needs to be reused.
- Not Permanent: Once terminated, the cloned cluster is subject to the same 30-day retention limit.
Why Pinning is the Optimal Solution
Pinning is specifically designed for this scenario:
- Configuration Preservation: Pinned clusters retain their configuration metadata permanently in Azure Databricks.
- Cost Optimization: No compute costs are incurred while the cluster is terminated; only its configuration metadata is retained.
- Ease of Use: When needed again, the pinned cluster can be easily restarted with the exact same configuration.
- Azure Databricks Best Practice: This is the recommended approach for preserving cluster configurations for infrequently used clusters.