Databricks Certified Data Engineer - Professional

Get started today

Ultimate access to all questions.

Explanation:

Databricks clusters (running Apache Spark) are designed to survive individual VM failures by re-executing lost tasks on healthy nodes and replicating in-memory data blocks across executors.

Why 16 VMs is the best choice: Spreading the workload across many smaller workers (16 nodes) maximizes resilience. Losing one machine in this configuration only removes ~6.25% of the cluster's capacity. This ensures that plenty of healthy executors and data replicas remain available, allowing the job to continue without hitting retry limits.

Analysis of other configurations:

1 huge VM (Option C): This is a single point of failure (SPOF). If the VM fails, the job terminates immediately because there are no other nodes to take over.
2 or 4 VMs (Options E and A): These provide minimal redundancy. Losing one node removes 25%-50% of the compute power and a large portion of replicated data, making the job highly vulnerable to failure during the recovery phase.
8 VMs (Option D): While better than 2 or 4, each failure still removes ~12.5% of capacity, posing a higher risk to job stability compared to the 16-node setup.

Explanation:

Databricks clusters (running Apache Spark) are designed to survive individual VM failures by re-executing lost tasks on healthy nodes and replicating in-memory data blocks across executors.

Analysis of other configurations:

1 huge VM (Option C): This is a single point of failure (SPOF). If the VM fails, the job terminates immediately because there are no other nodes to take over.
2 or 4 VMs (Options E and A): These provide minimal redundancy. Losing one node removes 25%-50% of the compute power and a large portion of replicated data, making the job highly vulnerable to failure during the recovery phase.
8 VMs (Option D): While better than 2 or 4, each failure still removes ~12.5% of capacity, posing a higher risk to job stability compared to the 16-node setup.

Comments (0)

No comments yet.

A data engineer is designing a Databricks cluster for a mission-critical, long-running job. The configuration must prioritize job completion even in the event of individual Virtual Machine (VM) failures. Which of the following cluster setups provides the highest level of fault tolerance to ensure the job finishes successfully?

Real Exam

Last updated: January 6, 2026 at 15:41

Total VMs: 4, RAM per Executor: 100 GB, Cores per Executor: 40

6.1%

Total VMs: 16, RAM per Executor: 25 GB, Cores per Executor: 10

63.6%

Total VMs: 1, RAM per Executor: 400 GB, Cores per Executor: 160

6.1%

Total VMs: 8, RAM per Executor: 50 GB, Cores per Executor: 20

12.1%

Total VMs: 2, RAM per Executor: 200 GB, Cores per Executor: 80