
## Answer

**D.** They can institute a retry policy for the task that periodically fails
## Explanation

**Correct Answer: D** - They can institute a retry policy for the task that periodically fails

**Why this is correct:**

1. **Targeted approach**: The problem is specific to one task that fails only 10% of the time. Implementing a retry policy for just that task is the most efficient solution.
2. **Cost-effective**: Retrying only the failing task minimizes compute costs compared to retrying the entire Job or running multiple Job instances.
3. **High success probability**: With a 90% per-attempt success rate, a few retries should ensure completion without excessive resource consumption.
4. **Standard practice**: Task-level retry policies are a common pattern for handling intermittent failures in data pipelines.

**Why the other options are incorrect:**

**A. They can institute a retry policy for the entire Job**
- This would retry all tasks, not just the failing one, wasting compute on tasks that already succeeded.
- It increases costs unnecessarily by re-running successful tasks.

**B. They can observe the task as it runs to try and determine why it is failing**
- While investigation is important for root-cause analysis, it does not ensure the Job completes each night.
- It is a reactive approach rather than a proactive solution for ensuring completion.

**C. They can set up the Job to run multiple times ensuring that at least one will complete**
- Running multiple instances of the entire Job would significantly increase compute costs.
- With a 90% success rate, running multiple full instances is wasteful overkill.

**E. They can utilize a Jobs cluster for each of the tasks in the Job**
- Provisioning a separate cluster for each task increases costs.
- It does not address the intermittent failure; it only changes the execution environment.
**Best Practice Recommendation:**

In Databricks Jobs, you can configure a task-level retry policy with:

- A maximum number of retries (e.g., 3-5)
- A minimum interval between retry attempts
- Whether to retry when the task times out

This approach ensures the Job completes while maintaining cost efficiency for intermittent failures.
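As a rough sketch, the retry settings above map onto fields of the Databricks Jobs API task specification (`max_retries`, `min_retry_interval_millis`, `retry_on_timeout`); the task key and notebook path here are hypothetical, and the snippet also works out why a small retry budget is enough given a 90% per-attempt success rate:

```python
# Sketch of a task spec with a task-level retry policy, following the
# Databricks Jobs API field names. Task name and notebook path are
# hypothetical placeholders.
task_spec = {
    "task_key": "flaky_task",            # hypothetical task name
    "notebook_task": {"notebook_path": "/Jobs/flaky_step"},
    "max_retries": 3,                    # retry up to 3 additional attempts
    "min_retry_interval_millis": 60_000, # wait 1 minute between attempts
    "retry_on_timeout": False,
}

# With a 10% failure rate per attempt, the chance that at least one of
# the (1 + max_retries) attempts succeeds:
p_fail = 0.10
attempts = 1 + task_spec["max_retries"]
p_success = 1 - p_fail ** attempts
print(f"{p_success:.4%}")  # 99.9900%
```

With three retries, the task fails only if all four attempts fail independently, so the nightly Job completes in effectively every run while only the failing task is ever re-executed.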
Author: Keng Suppaseth
A data engineer has a Job with multiple tasks that runs nightly. One of the tasks unexpectedly fails during 10 percent of the runs.
Which of the following actions can the data engineer perform to ensure the Job completes each night while minimizing compute costs?
A. They can institute a retry policy for the entire Job
B. They can observe the task as it runs to try and determine why it is failing
C. They can set up the Job to run multiple times ensuring that at least one will complete
D. They can institute a retry policy for the task that periodically fails
E. They can utilize a Jobs cluster for each of the tasks in the Job