
A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task. Which of the following approaches can the data engineer use to set up the new task?
A
They can clone the existing task in the existing Job and update it to run the new notebook.
B
They can create a new task in the existing Job and then add it as a dependency of the original task.
C
They can create a new task in the existing Job and then add the original task as a dependency of the new task.
D
They can create a new job from scratch and add both tasks to run concurrently.
E
They can clone the existing task to a new Job and then edit it to run the new notebook.
Explanation:
The correct answer is B because:
The requirement is to run a new notebook BEFORE the original task, so the new task must execute first and the original task second.
In Databricks Jobs, execution order is controlled through task dependencies: when task X is added as a dependency of task Y, task Y waits for task X to complete successfully before it starts.
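To see why the direction of the dependency matters, here is a minimal sketch that models each task as a node whose listed dependencies must run first. It is a simplified model only, not the Databricks scheduler: it ignores failures and "Run if" conditions, and the task keys `new_task` and `original_task` are hypothetical. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to derive one valid run order.

```python
from graphlib import TopologicalSorter


def run_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return a valid execution order given each task's prerequisites.

    `depends_on` maps a task key to the task keys it must wait for.
    Simplified model: failures and conditional runs are ignored.
    """
    sorter = TopologicalSorter(depends_on)
    return list(sorter.static_order())


# Option B: the original task depends on the new task, so the new task runs first.
option_b = {"new_task": [], "original_task": ["new_task"]}

# Option C: the new task depends on the original task, which reverses the order.
option_c = {"original_task": [], "new_task": ["original_task"]}

print(run_order(option_b))  # ['new_task', 'original_task']
print(run_order(option_c))  # ['original_task', 'new_task']
```

Because each graph has exactly one valid ordering, the printed orders are deterministic: only the option B wiring puts the new notebook ahead of the original task.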
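Option B correctly implements this: the new task is created in the existing Job and then added as a dependency of the original task, so the new task runs first and the original task starts only after the new task succeeds.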
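For reference, the option B setup might look roughly like this when expressed as Jobs API 2.1 job settings. This is a sketch under stated assumptions: the job name, task keys, and notebook paths are hypothetical, and cluster/compute configuration is omitted for brevity. Only the field names (`tasks`, `task_key`, `notebook_task`, `notebook_path`, `depends_on`) follow the Jobs API 2.1 task structure.

```python
import json

# Hypothetical names and paths; compute settings are intentionally omitted.
job_settings = {
    "name": "morning-job",  # hypothetical job name
    "tasks": [
        {
            # The new task: runs the notebook that handles the upstream issue.
            "task_key": "fix_upstream_data",
            "notebook_task": {"notebook_path": "/Repos/example/fix_upstream_data"},
        },
        {
            "task_key": "original_task",
            "notebook_task": {"notebook_path": "/Repos/example/original_notebook"},
            # The original task waits for the new task before starting.
            "depends_on": [{"task_key": "fix_upstream_data"}],
        },
    ],
}

print(json.dumps(job_settings, indent=2))
```

The dependency is declared on the original task, pointing at the new task; declaring it the other way around would produce option C.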
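Why other options are incorrect:
A: Cloning the task adds a second task to the Job, but cloning alone does not make the original task depend on the new one, so nothing guarantees the new notebook runs first.
C: This reverses the order. The new task would wait for the original task, so the original task would still run first.
D: Running both tasks concurrently gives no ordering guarantee, and rebuilding the Job from scratch is unnecessary.
E: A task in a separate Job is not a dependency of the original Job's task, so nothing ensures the new notebook runs before it.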
The reference explanation confirms this: "Once the new task is created, designate it as a dependency of the original task in the Job configuration. This ensures that the new task is executed before the original task."
This approach maintains the existing job structure while adding the necessary preprocessing step to handle the upstream data issue.