Databricks Certified Data Engineer - Associate


You are tasked with building a robust, production-grade data pipeline in Databricks. The pipeline consists of several modular notebooks:

utils_notebook: Contains utility functions for logging, error notification, and data quality checks.
extract_notebook: Extracts data from multiple sources and performs initial validation.
transform_notebook: Applies complex business logic and transformations.
load_notebook: Loads the processed data into a Delta table.

Your requirements are:

Reusability: Utility functions from utils_notebook must be accessible in all other notebooks without code duplication.
Parameterization: Each notebook must accept parameters (e.g., source paths, table names, run dates) at runtime.
Isolation: Each notebook should run in its own execution context to avoid variable conflicts and ensure independent error handling.
Error Handling: If any notebook fails, the pipeline should log the error using a function from utils_notebook and halt further execution.
Scalability: The solution should be maintainable and scalable for a team of data engineers.

Which of the following approaches best satisfies all requirements? Select the best answer and explain why the other options are less suitable.




Correct Answer:

In each child notebook (extract_notebook, transform_notebook, load_notebook), use %run to include utils_notebook at the top; then, in a master orchestrator notebook, use dbutils.notebook.run() to call each child notebook with runtime parameters and error handling.
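As a sketch, the child-notebook side of this pattern looks like the cells below. The widget names (source_path, run_date) and utility-function availability are illustrative, and `dbutils` is only defined inside a Databricks runtime, so this is not runnable as plain Python:

```python
# ---- extract_notebook, cell 1: pull in shared utilities ----
# %run ./utils_notebook
# (Databricks magic, shown here as a comment; it executes utils_notebook
#  in THIS notebook's context, so its functions become available directly)

# ---- cell 2: read runtime parameters passed via dbutils.notebook.run() ----
source_path = dbutils.widgets.get("source_path")  # raises if the widget is absent
run_date = dbutils.widgets.get("run_date")

# ... extraction and initial validation logic ...

# ---- last cell: return a status string to the orchestrator ----
dbutils.notebook.exit("OK")
```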

Reference Explanation:

Reusability: Including utils_notebook with %run at the top of each notebook ensures utility functions are available in each notebook’s context without code duplication.
Parameterization: dbutils.notebook.run() allows you to pass parameters to each notebook at runtime.
Isolation: Each notebook runs in its own context when called with dbutils.notebook.run(), preventing variable conflicts and supporting independent error handling.
Error Handling: The master orchestrator can use try/except blocks to catch errors from each dbutils.notebook.run() call and invoke the logging function (available in each notebook via %run utils_notebook).
Scalability: This modular approach is maintainable and scalable for collaborative teams.
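The orchestrator's halt-on-first-failure control flow can be sketched in plain Python. The `run_notebook()` and `log_error()` functions below are hypothetical stand-ins for `dbutils.notebook.run(path, timeout_seconds, arguments)` and the utils_notebook logger, so the pattern runs outside Databricks; the stand-in deliberately fails the transform step to make the halting behavior visible:

```python
ERROR_LOG = []  # stand-in for the logging sink that utils_notebook would provide

def log_error(message):
    """Hypothetical stand-in for the utils_notebook error logger."""
    ERROR_LOG.append(message)

def run_notebook(path, timeout_seconds, arguments):
    """Hypothetical stand-in for dbutils.notebook.run(); the transform
    step is made to fail so the halt behavior can be observed."""
    if path == "transform_notebook":
        raise RuntimeError(f"{path} failed")
    return "OK"

def run_pipeline(run_date):
    """Run each stage in order; log and halt at the first failure."""
    completed = []
    arguments = {"run_date": run_date}
    for path in ["extract_notebook", "transform_notebook", "load_notebook"]:
        try:
            result = run_notebook(path, timeout_seconds=3600, arguments=arguments)
            completed.append((path, result))
        except Exception as exc:
            log_error(f"Pipeline halted at {path}: {exc}")
            break  # do not run downstream notebooks after a failure
    return completed
```

Because each dbutils.notebook.run() call is wrapped in its own try/except, a failure in any stage is logged once and downstream notebooks are never launched.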

Why other options are less suitable:

A: Using %run for utils_notebook only in the master notebook defines the utility functions in the master's context; child notebooks launched with dbutils.notebook.run() execute in separate, isolated contexts and cannot see them.
B: Calling utils_notebook with dbutils.notebook.run() executes it as a separate ephemeral job; its function definitions do not persist into the caller's context, and only a string exit value is returned.
D: %run alone provides neither context isolation nor runtime parameterization, and coordinating state through global variables is neither robust nor scalable for a team.