
Answer-first summary for fast verification
Answer: C. In each notebook (extract_notebook, transform_notebook, load_notebook), use %run to include utils_notebook at the top, then use dbutils.notebook.run() in the master orchestrator to call each notebook with parameters and error handling.
Explanation:
Reusability: Including utils_notebook with %run at the top of each notebook makes the utility functions available in that notebook's context without code duplication.
Parameterization: dbutils.notebook.run() passes parameters to each notebook at runtime.
Isolation: Each notebook called with dbutils.notebook.run() runs in its own execution context, preventing variable conflicts and supporting independent error handling.
Error Handling: The master orchestrator can wrap each dbutils.notebook.run() call in a try/except block, log the failure using the logging function (available in each notebook via %run utils_notebook), and halt further execution.
Scalability: This modular approach remains maintainable and scalable for a collaborative team of data engineers.
Why the other options are less suitable:
A: Utility functions included in the master notebook via %run are not visible inside the child notebooks' isolated contexts created by dbutils.notebook.run().
B: Calling utils_notebook with dbutils.notebook.run() executes it in its own isolated context, so its functions never become available to the other notebooks.
D: %run provides no context isolation or parameter passing, and relying on global variables is neither robust nor scalable.
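A minimal sketch of the master orchestrator under option C. Note that dbutils exists only inside a Databricks runtime, and log_error is a hypothetical helper assumed to come from utils_notebook; the runner and logger are injected as arguments so the halt-on-failure logic can be exercised outside Databricks.

```python
# Sketch of the master orchestrator (option C). `dbutils` is available only
# in Databricks; `log_error` is a hypothetical utility from utils_notebook.

def run_pipeline(run_notebook, log_error, run_date, timeout=3600):
    """Run the stage notebooks in sequence; halt on the first failure.

    `run_notebook` is injected so the logic is testable locally;
    in a real orchestrator pass `dbutils.notebook.run`, whose signature
    is run(path, timeout_seconds, arguments).
    """
    # Stage paths and parameters shown here are illustrative placeholders.
    stages = [
        ("extract_notebook", {"source_path": "/mnt/raw", "run_date": run_date}),
        ("transform_notebook", {"run_date": run_date}),
        ("load_notebook", {"target_table": "sales_delta", "run_date": run_date}),
    ]
    results = []
    for name, params in stages:
        try:
            results.append(run_notebook(name, timeout, params))
        except Exception as exc:
            log_error(f"{name} failed: {exc}")  # logging utility from utils_notebook
            raise  # halt further execution, per the error-handling requirement
    return results

# In Databricks the call would be:
# run_pipeline(dbutils.notebook.run, log_error, "2024-01-01")
```

Because each child notebook is launched with dbutils.notebook.run(), each one must itself begin with %run ./utils_notebook to bring the utilities into its own context.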
Author: LeetQuiz Editorial Team
You are tasked with building a robust, production-grade data pipeline in Databricks. The pipeline consists of several modular notebooks:
utils_notebook: Contains utility functions for logging, error notification, and data quality checks.
extract_notebook: Extracts data from multiple sources and performs initial validation.
transform_notebook: Applies complex business logic and transformations.
load_notebook: Loads the processed data into a Delta table.
Your requirements are:
Reusability: Utility functions from utils_notebook must be accessible in all other notebooks without code duplication.
Parameterization: Each notebook must accept parameters (e.g., source paths, table names, run dates) at runtime.
Isolation: Each notebook should run in its own execution context to avoid variable conflicts and ensure independent error handling.
Error Handling: If any notebook fails, the pipeline should log the error using a function from utils_notebook and halt further execution.
Scalability: The solution should be maintainable and scalable for a team of data engineers.
Which of the following approaches best satisfies all requirements? Select the best answer and explain why the other options are less suitable.
A
In a master orchestrator notebook, use %run to include utils_notebook at the top, then use dbutils.notebook.run() to call extract_notebook, transform_notebook, and load_notebook in sequence, passing parameters and handling errors with try/except blocks that call the logging function from utils_notebook.
B
In a master orchestrator notebook, use dbutils.notebook.run() to call utils_notebook, extract_notebook, transform_notebook, and load_notebook in sequence, passing parameters and handling errors with try/except blocks.
C
In each notebook (extract_notebook, transform_notebook, load_notebook), use %run to include utils_notebook at the top, and then use dbutils.notebook.run() in the master orchestrator to call each notebook with parameters and error handling.
D
In a master orchestrator notebook, use %run to include all notebooks (utils_notebook, extract_notebook, transform_notebook, load_notebook) in sequence, setting parameters as global variables and handling errors with try/except blocks.
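To illustrate the winning pattern (option C) from the child-notebook side: each stage notebook starts with %run to pull in the utilities, then reads its runtime parameters via widgets. The %run magic and dbutils.widgets exist only in Databricks, so they appear as comments here; get_param is a hypothetical helper that falls back to a default so the notebook also runs interactively.

```python
# Sketch of the first cells of extract_notebook (option C).
#
# Cell 1 (Databricks magic, shown as a comment outside the runtime):
# %run ./utils_notebook        # makes log_error, check_quality, etc. available

def get_param(widgets, name, default):
    """Read a runtime parameter, falling back to a default so the
    notebook also works when run interactively. `widgets` mimics the
    dbutils.widgets interface (a .get(name) method)."""
    try:
        return widgets.get(name)
    except Exception:
        return default

# In Databricks:
# dbutils.widgets.text("source_path", "/mnt/raw")
# source_path = get_param(dbutils.widgets, "source_path", "/mnt/raw")
```

Parameters passed by the orchestrator through dbutils.notebook.run() arrive as widget values, which is what makes option C's parameterization work end to end.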