A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task. Which of the following approaches can the data engineer use to set up the new task?
A
They can clone the existing task in the existing Job and update it to run the new notebook.
B
They can create a new task in the existing Job and then add it as a dependency of the original task.
C
They can create a new task in the existing Job and then add the original task as a dependency of the new task.
D
They can create a new job from scratch and add both tasks to run concurrently.
E
They can clone the existing task to a new Job and then edit it to run the new notebook.
Explanation:
The correct answer is B because:
Requirement: The data engineer needs to run a new notebook prior to the original task. This means the new task must execute before the original task.
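Task Dependencies in Databricks Jobs: A Databricks Job can contain multiple tasks, and each task can declare which other tasks it depends on. A task does not start until the tasks it depends on have finished, and by default it runs only if all of them completed successfully. Adding the new task as a dependency of the original task therefore makes the new task the upstream (prerequisite) step and the original task the downstream (dependent) step.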
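The ordering rule can be illustrated with a small local model. This is a minimal sketch, not Databricks code: it never calls Databricks, and it only computes the order that the declared dependencies imply. The dictionaries borrow the `task_key`, `notebook_task`, and `depends_on` field names from the Databricks Jobs API 2.1 task format. The task keys and notebook paths are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task definitions for Option B, shaped like the "tasks" array
# of a Databricks Jobs API 2.1 job. Task keys and notebook paths are made up.
job_tasks = [
    {
        "task_key": "morning_report",  # the original task
        "notebook_task": {"notebook_path": "/Jobs/morning_report"},
        # The original task waits for the new task to finish.
        "depends_on": [{"task_key": "fix_upstream_data"}],
    },
    {
        "task_key": "fix_upstream_data",  # the new task
        "notebook_task": {"notebook_path": "/Jobs/fix_upstream_data"},
    },
]


def run_order(tasks):
    """Return task keys in an order that respects every depends_on entry."""
    # Map each task to the set of tasks it depends on (its predecessors).
    graph = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in tasks
    }
    return list(TopologicalSorter(graph).static_order())


print(run_order(job_tasks))  # ['fix_upstream_data', 'morning_report']
```

Because the original task lists the new task under `depends_on`, the only valid order puts the new notebook first.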
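Option B Analysis: Creating the new task inside the existing Job keeps both tasks in the same run. Making the new task a dependency of the original task means the original task waits for the new notebook to finish. This produces exactly the required order: new notebook first, original task second.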
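Why other options are incorrect:
A: Cloning the existing task and pointing the clone at the new notebook adds a second task, but cloning alone does not create a dependency. Nothing guarantees that the new notebook runs before the original task.
C: This reverses the order. If the original task is a dependency of the new task, the original task runs first and the new notebook runs afterward.
D: Running both tasks concurrently provides no ordering, so the original task can start before the new notebook finishes. It also rebuilds the Job from scratch instead of extending the existing one.
E: A cloned task in a separate Job has no dependency relationship with the original task, so the two Jobs run independently and their relative order is not guaranteed.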
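The difference between options B, C, and D can be checked by grouping tasks into "waves", where tasks in the same wave have all their dependencies satisfied and could run at the same time. This is a minimal sketch that models ordering only. It assumes every task succeeds and does not reproduce Databricks scheduling. The task keys are hypothetical, and `depends_on` borrows the Jobs API 2.1 field name.

```python
from graphlib import TopologicalSorter


def run_waves(tasks):
    """Group task keys into waves; tasks in the same wave may run concurrently."""
    graph = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in tasks
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # assume the whole wave succeeded
    return waves


NEW, ORIGINAL = "fix_upstream_data", "morning_report"  # hypothetical task keys

options = {
    # B: the new task is a dependency of the original task.
    "B": [{"task_key": NEW}, {"task_key": ORIGINAL, "depends_on": [{"task_key": NEW}]}],
    # C: the original task is a dependency of the new task (order reversed).
    "C": [{"task_key": ORIGINAL}, {"task_key": NEW, "depends_on": [{"task_key": ORIGINAL}]}],
    # D: no dependencies, so both tasks are ready at the same time.
    "D": [{"task_key": NEW}, {"task_key": ORIGINAL}],
}

for name, tasks in options.items():
    print(name, run_waves(tasks))
# B [['fix_upstream_data'], ['morning_report']]
# C [['morning_report'], ['fix_upstream_data']]
# D [['fix_upstream_data', 'morning_report']]
```

Only option B places the new notebook in an earlier wave than the original task. Option C inverts the order, and option D puts both tasks in the same wave.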
Best Practice: This approach keeps the existing Job and its morning schedule intact while adding the required preprocessing step, so the upstream data issue is handled before the original task runs.