
Answer-first summary for fast verification
Answer: Configure a three-task job where each task runs a specific notebook. The last two tasks run simultaneously, each depending on the completion of the first task.
The correct answer is D. A single multi-task job lets Databricks enforce the dependency of Notebooks B and C on Notebook A while running B and C in parallel once A has finished. The total runtime is then roughly the duration of A plus the longer of B and C, rather than the sum of all three. Option A respects the dependency but runs B and C one after the other, which adds unnecessary wait time. Option B starts all three notebooks at once, so B and C may run before A has completed. Option C splits the work into three separate jobs scheduled for the same time, which likewise ignores the dependency on A and gives up task-level orchestration within one job.
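As a rough illustration of option D, the sketch below builds a job definition in the shape used by the Databricks Jobs API 2.1 (`tasks`, `task_key`, `depends_on`, `notebook_task`). The job name, task keys, and notebook paths are hypothetical placeholders, and compute settings are omitted for brevity. The `execution_waves` helper is not part of Databricks; it is a small local check, assuming Python 3.9+ for `graphlib`, that groups tasks by when their dependencies are satisfied.

```python
from graphlib import TopologicalSorter

# Hypothetical job definition; names and notebook paths are placeholders.
# Compute configuration (job cluster, existing cluster, etc.) is omitted.
job_settings = {
    "name": "ml-pipeline",
    "tasks": [
        {
            "task_key": "notebook_a",
            "notebook_task": {"notebook_path": "/Pipelines/NotebookA"},
        },
        {
            "task_key": "notebook_b",
            # B waits only for A, not for C.
            "depends_on": [{"task_key": "notebook_a"}],
            "notebook_task": {"notebook_path": "/Pipelines/NotebookB"},
        },
        {
            "task_key": "notebook_c",
            # C waits only for A, not for B.
            "depends_on": [{"task_key": "notebook_a"}],
            "notebook_task": {"notebook_path": "/Pipelines/NotebookC"},
        },
    ],
}


def execution_waves(settings):
    """Group task keys into waves; every task in a wave has all dependencies met."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in settings["tasks"]
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()   # tasks whose dependencies are all done
        waves.append(sorted(ready))
        sorter.done(*ready)
    return waves


waves = execution_waves(job_settings)
print(waves)  # [['notebook_a'], ['notebook_b', 'notebook_c']]
```

The second wave contains both feature-engineering tasks, which matches the intended behavior: neither starts before Notebook A completes, and neither waits for the other.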
Author: LeetQuiz Editorial Team
A team of machine learning engineers is given three notebooks (Notebook A, Notebook B, and Notebook C) by a data scientist to establish a machine learning pipeline. Notebook A is used for exploratory data analysis, while Notebooks B and C are for feature engineering. Notebook A must be completed before Notebooks B and C can start, but Notebooks B and C can run independently of each other. What is the most efficient and reliable way to orchestrate this pipeline in Databricks? Choose the ONE best answer.
A
Set up a three-task job where each task runs a specific notebook, with each task depending on the completion of the previous one.
B
Create a three-task job where each task runs a distinct notebook, and all three tasks are executed in parallel.
C
Establish three single-task jobs, each running a unique notebook, all scheduled to run at the same time.
D
Configure a three-task job where each task runs a specific notebook. The last two tasks run simultaneously, each depending on the completion of the first task.