
In a data engineering project, you have two Databricks notebooks: one for data preprocessing and another for data analysis. The project requires that the analysis notebook run only after the preprocessing notebook has completed successfully. Considering automation, reliability, and scalability, which of the following approaches is the BEST way to achieve this workflow? Choose one option.
A
Manually execute the analysis notebook after verifying the preprocessing notebook's completion, which is simple but lacks automation and scalability.
B
Utilize the Databricks Jobs feature to create a job that runs the preprocessing notebook first and then the analysis notebook, ensuring automation and dependency management.
C
Embed the analysis notebook's code directly into the preprocessing notebook, combining both processes into a single notebook, which may complicate maintenance and scalability.
D
Use the %run magic command within the preprocessing notebook to call the analysis notebook, which is straightforward but does not provide robust dependency management or scheduling capabilities.
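The dependency described in option B can be expressed declaratively with the Databricks Jobs API (version 2.1 or later), which supports multiple tasks per job with explicit `depends_on` relationships. The sketch below is illustrative only; the job name, notebook paths, and cluster key are hypothetical placeholders, and real deployments would also specify cluster settings, retries, and schedules.

```json
{
  "name": "preprocess-then-analyze",
  "tasks": [
    {
      "task_key": "preprocess",
      "notebook_task": {
        "notebook_path": "/Workspace/pipelines/preprocessing"
      },
      "job_cluster_key": "shared_cluster"
    },
    {
      "task_key": "analyze",
      "depends_on": [
        { "task_key": "preprocess" }
      ],
      "notebook_task": {
        "notebook_path": "/Workspace/pipelines/analysis"
      },
      "job_cluster_key": "shared_cluster"
    }
  ]
}
```

Because `analyze` declares `depends_on: preprocess`, the Jobs scheduler runs the analysis notebook only after the preprocessing task finishes successfully; if preprocessing fails, the downstream task is skipped. This is the automated dependency management that options A and D lack.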