A team of machine learning engineers is given three notebooks (Notebook A, Notebook B, and Notebook C) by a data scientist to establish a machine learning pipeline. Notebook A is used for exploratory data analysis, while Notebooks B and C are for feature engineering. Notebook A must be completed before Notebooks B and C can start, but Notebooks B and C can run independently of each other. What is the most efficient and reliable way to orchestrate this pipeline in Databricks? Choose the ONE best answer.
Explanation:
The correct answer is D. This approach respects the dependency of Notebooks B and C on Notebook A's completion while still letting Notebooks B and C run in parallel. Because B and C do not wait on each other, the total runtime is roughly that of Notebook A plus the slower of B and C, rather than the sum of all three. It relies on Databricks' job orchestration, which supports both sequential and parallel task execution. Options A, B, and C either run Notebooks B and C one after the other when they could run in parallel, which wastes time, or start them without waiting for Notebook A to complete, which risks errors.
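The dependency structure described above can be sketched as a multi-task job definition. The snippet below is a minimal illustration, not the exact configuration the answer refers to. It builds a payload in the shape used by the Databricks Jobs API (`tasks`, `task_key`, `depends_on`, `notebook_task`), then uses Python's standard `graphlib` to show which tasks become runnable together. The job name, task keys, and notebook paths are hypothetical, and compute settings are omitted, so the payload is not a complete job definition.

```python
from graphlib import TopologicalSorter

# Hypothetical job definition. The name, task keys, and notebook paths are
# placeholders, and cluster/compute settings are deliberately left out.
job_spec = {
    "name": "ml-pipeline",
    "tasks": [
        {
            "task_key": "notebook_a",
            "notebook_task": {"notebook_path": "/Pipelines/NotebookA"},
        },
        {
            "task_key": "notebook_b",
            # B starts only after A has completed.
            "depends_on": [{"task_key": "notebook_a"}],
            "notebook_task": {"notebook_path": "/Pipelines/NotebookB"},
        },
        {
            "task_key": "notebook_c",
            # C also depends only on A, so it can run alongside B.
            "depends_on": [{"task_key": "notebook_a"}],
            "notebook_task": {"notebook_path": "/Pipelines/NotebookC"},
        },
    ],
}


def execution_waves(spec):
    """Group task keys into waves; tasks in one wave have no mutual dependency."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in spec["tasks"]
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(job_spec))
# [['notebook_a'], ['notebook_b', 'notebook_c']]
```

The second wave contains both feature-engineering notebooks, which is the parallelism the explanation describes: neither lists the other in `depends_on`, so a scheduler is free to start them together once Notebook A finishes.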