
A Databricks workflow consists of multiple tasks, including 'DataIngestion' and 'DataValidation'. The 'DataValidation' task should only execute if 'DataIngestion' completes successfully, ensuring that no validation occurs on incomplete or missing data. As a Databricks Data Engineer, you are configuring this workflow using the Jobs UI or Jobs API. Which of the following is the most appropriate way to enforce this dependency within the job configuration?
A. Define the 'dependsOn' attribute for the 'DataValidation' task, referencing 'DataIngestion' as its predecessor task.
B. Implement a manual process to monitor the status of 'DataIngestion' and trigger 'DataValidation' upon its completion.
C. Set the start time of 'DataValidation' to immediately follow the scheduled time of 'DataIngestion'.
D. Write a custom script that continuously checks for the completion of 'DataIngestion' and then initiates 'DataValidation'.
Explanation:
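The correct answer is A. Databricks Jobs lets you declare task dependencies directly in the job configuration: the "Depends on" field in the Jobs UI, which the Jobs API exposes as the 'depends_on' list on each task (the question's 'dependsOn'). Listing 'DataIngestion' as the predecessor of 'DataValidation' means 'DataValidation' starts only after 'DataIngestion' has completed successfully, because by default a task runs only when all of its dependencies succeed. The other options are weaker: a manual process (B) and a custom polling script (D) rebuild by hand what the scheduler already provides, and a time-based start (C) cannot guarantee that ingestion has finished, let alone succeeded.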
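As a rough illustration of option A, the sketch below builds a job settings payload of the kind accepted by the Jobs API, with 'DataValidation' depending on 'DataIngestion'. The job name, the notebook paths, and the helper build_job_settings are hypothetical, and compute settings are omitted for brevity, so treat this as a minimal sketch rather than a complete job definition.

```python
import json


def build_job_settings(ingest_notebook: str, validate_notebook: str) -> dict:
    """Return a Jobs API style payload with a two-task dependency chain.

    The notebook paths and job name are placeholders (assumptions);
    cluster/compute configuration is intentionally left out.
    """
    return {
        "name": "ingest-then-validate",  # hypothetical job name
        "tasks": [
            {
                # Upstream task: has no depends_on entry, so it starts first.
                "task_key": "DataIngestion",
                "notebook_task": {"notebook_path": ingest_notebook},
            },
            {
                # Downstream task: runs only after DataIngestion.
                "task_key": "DataValidation",
                "depends_on": [{"task_key": "DataIngestion"}],
                # ALL_SUCCESS is the default; it is spelled out for clarity.
                "run_if": "ALL_SUCCESS",
                "notebook_task": {"notebook_path": validate_notebook},
            },
        ],
    }


# Hypothetical workspace paths, used only to show the shape of the payload.
settings = build_job_settings("/Workspace/etl/ingest", "/Workspace/etl/validate")
print(json.dumps(settings, indent=2))
```

A payload shaped like this could be sent as the body of a job create request or pasted into the JSON view of the Jobs UI. If 'DataIngestion' fails, 'DataValidation' is not run.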