Databricks Certified Data Engineer - Associate



A team of data engineers has developed a Databricks Job in production with multiple tasks. Their goal is to minimize costs related to compute resources and optimize the execution time for each task. What is the best strategy to achieve these objectives?





Explanation:

The best strategy is to configure a shared job cluster for the tasks in the job. With cluster reuse, multiple tasks run on the same cluster: it starts when the first task that uses it begins and stays active until the last task that uses it completes. This avoids a separate cluster startup for each task, which reduces both the startup time and the cost of idle or underutilized clusters. Cluster configuration stays fine-grained, and utilization improves, especially when tasks run in parallel. For more details, refer to: Cluster Reuse | Databricks.
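
As a rough illustration of the shared-cluster setup described above, the sketch below builds a job settings dictionary in which one job cluster is defined once and every task refers to it by key. The field names (`job_clusters`, `job_cluster_key`, `new_cluster`, `tasks`, `depends_on`) follow the Databricks Jobs API 2.1 request format as commonly documented; the job name, cluster key, notebook paths, runtime version, and node type are hypothetical placeholders, and the helper `build_job_settings` is invented for this example. Nothing here calls a Databricks workspace.

```python
import json

# Hypothetical key; any string works as long as tasks reference the same one.
SHARED_CLUSTER_KEY = "shared_job_cluster"


def build_job_settings(task_names):
    """Build a Jobs API 2.1-style settings dict where all tasks share one job cluster."""
    tasks = []
    for i, name in enumerate(task_names):
        task = {
            "task_key": name,
            # Pointing every task at the same key is what enables cluster reuse.
            "job_cluster_key": SHARED_CLUSTER_KEY,
            "notebook_task": {"notebook_path": f"/Jobs/{name}"},  # placeholder path
        }
        if i > 0:
            # Chain the tasks sequentially; parallel tasks would omit this.
            task["depends_on"] = [{"task_key": task_names[i - 1]}]
        tasks.append(task)

    return {
        "name": "example_multi_task_job",  # placeholder job name
        "job_clusters": [
            {
                "job_cluster_key": SHARED_CLUSTER_KEY,
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                    "node_type_id": "i3.xlarge",  # placeholder node type
                    "num_workers": 2,
                },
            }
        ],
        "tasks": tasks,
    }


settings = build_job_settings(["ingest", "transform", "publish"])
print(json.dumps(settings, indent=2))
```

The cluster is declared once under `job_clusters`, so the tasks do not each carry their own `new_cluster` block, which is the per-task startup cost the explanation refers to.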