
Explanation:
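Databricks clusters (running Apache Spark) are designed to survive individual worker VM failures: Spark re-executes lost tasks on healthy nodes and recomputes lost partitions from their lineage. In-memory blocks are replicated across executors only when a replicated storage level (for example, MEMORY_ONLY_2) is requested.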
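Why 16 VMs (option B) is the best choice: All five options provide the same total capacity (400 GB of RAM and 160 cores, assuming one executor per VM), so they differ only in how that capacity is split. Spreading the workload across many smaller workers (16 nodes) maximizes resilience. Losing one machine in this configuration removes only 1/16 = 6.25% of the cluster's capacity, so plenty of healthy executors remain to absorb the re-executed tasks, and the job can continue without exhausting its task retry limit.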
Analysis of other configurations:
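The comparison comes down to simple arithmetic, which the following minimal sketch works through. It assumes one executor per VM (the question does not state this explicitly) and treats "capacity lost" as the share of RAM and cores that disappears when a single VM fails. The names OPTIONS, capacity_lost_percent, and summarize are illustrative, not part of any Databricks or Spark API.

```python
# Options from the question: (label, total VMs, RAM per executor in GB, cores per executor).
# Assumption: one executor per VM, so losing a VM loses exactly one executor.
OPTIONS = [
    ("A", 4, 100, 40),
    ("B", 16, 25, 10),
    ("C", 1, 400, 160),
    ("D", 8, 50, 20),
    ("E", 2, 200, 80),
]


def capacity_lost_percent(total_vms: int) -> float:
    """Percentage of cluster capacity removed when a single VM fails."""
    return 100.0 / total_vms


def summarize(options):
    """Return one row per option, sorted from smallest to largest capacity loss."""
    rows = []
    for label, vms, ram_gb, cores in options:
        rows.append({
            "option": label,
            "vms": vms,
            "total_ram_gb": vms * ram_gb,
            "total_cores": vms * cores,
            "lost_percent": capacity_lost_percent(vms),
        })
    return sorted(rows, key=lambda row: row["lost_percent"])


for row in summarize(OPTIONS):
    print(
        f"{row['option']}: {row['vms']:>2} VMs, "
        f"{row['total_ram_gb']} GB / {row['total_cores']} cores total, "
        f"one VM failure removes {row['lost_percent']:.2f}% of capacity"
    )
```

Every option totals 400 GB and 160 cores, so the only thing that varies is the size of the failure: the loss grows from 6.25% with 16 VMs to 100% with a single VM, where one failure takes down every executor at once.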
A data engineer is designing a Databricks cluster for a mission-critical, long-running job. The configuration must prioritize job completion even in the event of individual Virtual Machine (VM) failures. Which of the following cluster setups provides the highest level of fault tolerance to ensure the job finishes successfully?
A. Total VMs: 4, RAM per Executor: 100 GB, Cores per Executor: 40
B. Total VMs: 16, RAM per Executor: 25 GB, Cores per Executor: 10
C. Total VMs: 1, RAM per Executor: 400 GB, Cores per Executor: 160
D. Total VMs: 8, RAM per Executor: 50 GB, Cores per Executor: 20
E. Total VMs: 2, RAM per Executor: 200 GB, Cores per Executor: 80