
Databricks Certified Data Engineer - Professional
Given a long-running job that must complete successfully even if one or more VMs fail, which of the following cluster configurations (each with identical total resources: 400 GB RAM, 160 cores, and one Executor per VM) will ensure job completion?
Explanation:
To complete a long-running job despite one or more VM failures, the cluster must tolerate losing executors. With one executor per VM and fixed total resources, that resilience comes from using more, smaller executors: when a VM fails, Spark reschedules the lost tasks on the surviving executors, and each failure removes only a small share of the cluster. Option B, with 16 VMs at 25 GB and 10 cores per executor (16 × 25 GB = 400 GB, 16 × 10 = 160 cores), provides the highest fault tolerance among the options. Losing one VM removes only 1/16 (6.25%) of total capacity, so the job can continue even after several failures, though more slowly. Options A, C, and D use fewer, larger executors, so each VM failure removes a larger share of memory and cores and is more likely to leave the cluster unable to finish the job. Option B is therefore the most suitable configuration for completing the job despite VM failures.
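
The arithmetic behind this reasoning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the totals (400 GB, 160 cores) and the 16-VM layout come from the question and explanation, while the other VM counts in the loop and the helper names `per_vm_share` and `remaining_capacity` are hypothetical, chosen to show the trend rather than to reproduce the actual answer options.

```python
# Totals stated in the question; every configuration splits them evenly.
TOTAL_RAM_GB = 400
TOTAL_CORES = 160


def per_vm_share(num_vms):
    """Return (ram_gb, cores, fraction_of_cluster) held by a single VM."""
    return TOTAL_RAM_GB / num_vms, TOTAL_CORES / num_vms, 1 / num_vms


def remaining_capacity(num_vms, failed_vms):
    """Return (ram_gb, cores) still available after `failed_vms` VMs are lost."""
    surviving = max(num_vms - failed_vms, 0)
    ram_per_vm, cores_per_vm, _ = per_vm_share(num_vms)
    return surviving * ram_per_vm, surviving * cores_per_vm


if __name__ == "__main__":
    # Hypothetical VM counts for comparison; only 16 is taken from the explanation.
    for num_vms in (1, 2, 4, 8, 16):
        ram, cores, lost = per_vm_share(num_vms)
        left_ram, left_cores = remaining_capacity(num_vms, failed_vms=1)
        print(
            f"{num_vms:>2} VMs: {ram:.0f} GB, {cores:.0f} cores each; "
            f"one failure loses {lost:.2%}, leaving {left_ram:.0f} GB / {left_cores:.0f} cores"
        )
```

With a single VM, one failure leaves nothing to run on, whereas with 16 VMs the same failure still leaves 375 GB and 150 cores for Spark to reschedule the lost tasks.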