
You are tasked with building a robust, production-grade data pipeline in Databricks. The pipeline consists of several modular notebooks:
utils_notebook: Contains utility functions for logging, error notification, and data quality checks (a sketch of these helpers appears after this list).
extract_notebook: Extracts data from multiple sources and performs initial validation.
transform_notebook: Applies complex business logic and transformations.
load_notebook: Loads the processed data into a Delta table.
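For context, here is a minimal sketch of what utils_notebook might contain, assuming Python notebooks; the helper names (log_error, check_not_empty) are illustrative and not given in the scenario:

```python
# Hypothetical contents of utils_notebook (names are illustrative).
from datetime import datetime, timezone

def log_error(step: str, error: Exception) -> None:
    # Minimal logging; a real pipeline might also write to an audit table or notify on-call.
    print(f"[{datetime.now(timezone.utc).isoformat()}] step={step} error={error!r}")

def check_not_empty(df, name: str) -> None:
    # Basic data quality check: fail fast if the DataFrame has no rows.
    if df.limit(1).count() == 0:
        raise ValueError(f"Data quality check failed: {name} is empty")
```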
Your requirements are:
Reusability: Utility functions from utils_notebook must be accessible in all other notebooks without code duplication.
Parameterization: Each notebook must accept parameters (e.g., source paths, table names, run dates) at runtime; see the sketch after this list.
Isolation: Each notebook should run in its own execution context to avoid variable conflicts and ensure independent error handling.
Error Handling: If any notebook fails, the pipeline should log the error using a function from utils_notebook and halt further execution.
Scalability: The solution should be maintainable and scalable for a team of data engineers.
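To make the Reusability and Parameterization requirements concrete, a child notebook such as extract_notebook could read its runtime parameters through Databricks widgets and pick up shared helpers via %run. The paths, widget names, and the check_not_empty helper below are assumptions carried over from the utils sketch above, not part of the scenario:

```python
# Hypothetical first cells of extract_notebook.
# The %run magic must sit alone in its own cell; shown here as a comment:
# %run ./utils_notebook

# Declaring widgets lets the notebook run interactively with defaults;
# a caller using dbutils.notebook.run() overrides them with the arguments it passes.
dbutils.widgets.text("source_path", "/mnt/raw/events")
dbutils.widgets.text("run_date", "2024-01-01")

source_path = dbutils.widgets.get("source_path")
run_date = dbutils.widgets.get("run_date")

df = spark.read.json(source_path)                    # source format is a placeholder
check_not_empty(df, f"raw extract for {run_date}")   # helper from the utils sketch above
dbutils.notebook.exit("OK")                          # return value visible to the caller
```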
Which of the following approaches best satisfies all requirements? Select the best answer and explain why the other options are less suitable.
A
In a master orchestrator notebook, use %run to include utils_notebook at the top, then use dbutils.notebook.run() to call extract_notebook, transform_notebook, and load_notebook in sequence, passing parameters and handling errors with try/except blocks that call the logging function from utils_notebook.
B
In a master orchestrator notebook, use dbutils.notebook.run() to call utils_notebook, extract_notebook, transform_notebook, and load_notebook in sequence, passing parameters and handling errors with try/except blocks.
C
In each notebook (extract_notebook, transform_notebook, load_notebook), use %run to include utils_notebook at the top, and then use dbutils.notebook.run() in the master orchestrator to call each notebook with parameters and error handling.
D
In a master orchestrator notebook, use %run to include all notebooks (utils_notebook, extract_notebook, transform_notebook, load_notebook) in sequence, setting parameters as global variables and handling errors with try/except blocks.
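To ground the options, the sketch below shows the two mechanisms they combine: %run pulls utils_notebook into the current execution context, while dbutils.notebook.run() launches a child notebook in its own isolated context, passes parameters as widget values, and surfaces failures as exceptions. Paths, parameter values, and the log_error helper are assumptions carried over from the earlier sketches:

```python
# Hypothetical master orchestrator cells (paths and parameters are illustrative).
# A preceding cell would contain only the magic command:
# %run ./utils_notebook    # makes log_error available in this notebook's context

params = {
    "source_path": "/mnt/raw/events",
    "table_name": "silver.events",
    "run_date": "2024-01-01",
}

for nb in ["./extract_notebook", "./transform_notebook", "./load_notebook"]:
    try:
        # Runs the child notebook as a separate job in its own context,
        # passing params as widget values and returning its exit string.
        result = dbutils.notebook.run(nb, 3600, params)
        print(f"{nb} finished with: {result}")
    except Exception as exc:
        log_error(nb, exc)  # logging helper from the hypothetical utils sketch
        raise               # halt the pipeline on the first failure
```

When weighing the options, keep in mind that anything brought in with %run is visible only in the notebook that executed it, whereas notebooks started with dbutils.notebook.run() begin with a fresh context.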