
Databricks Certified Data Engineer - Associate
You are tasked with building a robust, production-grade data pipeline in Databricks. The pipeline consists of several modular notebooks:
utils_notebook: Contains utility functions for logging, error notification, and data quality checks.
extract_notebook: Extracts data from multiple sources and performs initial validation.
transform_notebook: Applies complex business logic and transformations.
load_notebook: Loads the processed data into a Delta table.
Your requirements are:
Reusability: Utility functions from utils_notebook must be accessible in all other notebooks without code duplication.
Parameterization: Each notebook must accept parameters (e.g., source paths, table names, run dates) at runtime.
Isolation: Each notebook should run in its own execution context to avoid variable conflicts and ensure independent error handling.
Error Handling: If any notebook fails, the pipeline should log the error using a function from utils_notebook and halt further execution.
Scalability: The solution should be maintainable and scalable for a team of data engineers.
Which of the following approaches best satisfies all requirements? Select the best answer and explain why the other options are less suitable.
Correct Answer:
In each child notebook (extract_notebook, transform_notebook, load_notebook), use %run at the top to include utils_notebook; then, from a master orchestrator notebook, call each child with dbutils.notebook.run(), passing parameters and handling errors.
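As a minimal sketch of the child-notebook half of this pattern (the notebook path, widget names, and the validate_schema helper are illustrative assumptions, not part of the question), each child notebook might begin like this:

# Cell 1 of extract_notebook -- %run must sit in a cell by itself:
# %run ./utils_notebook

# Cell 2: read the runtime parameters passed in by dbutils.notebook.run().
dbutils.widgets.text("source_path", "")
dbutils.widgets.text("run_date", "")
source_path = dbutils.widgets.get("source_path")
run_date = dbutils.widgets.get("run_date")

df = spark.read.format("json").load(source_path)
validate_schema(df)  # assumed helper defined in utils_notebook

# Return a status string to the orchestrator.
dbutils.notebook.exit("extract OK")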
Reference Explanation:
Reusability: Including utils_notebook with %run at the top of each child notebook (as in the sketch above) makes the utility functions available in that notebook's context without code duplication.
Parameterization: dbutils.notebook.run() passes parameters to each notebook at runtime; the child notebook reads them through widgets.
Isolation: Each notebook invoked with dbutils.notebook.run() executes in its own context, preventing variable conflicts and supporting independent error handling.
Error Handling: The master orchestrator wraps each dbutils.notebook.run() call in try/except, logs the failure with the utility function (available via %run utils_notebook), and re-raises to halt further execution, as sketched below.
Scalability: This modular approach keeps each stage small and independently testable, which stays maintainable as the pipeline and the team grow.
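The orchestrator side might look like the following sketch (the notebook paths, parameter values, and the log_error helper are illustrative assumptions):

# Master orchestrator notebook.
# Cell 1 -- make the shared helpers (e.g., log_error) available here too:
# %run ./utils_notebook

run_date = "2024-01-01"  # in practice, supplied by a job parameter or widget

pipeline = [
    ("./extract_notebook", {"source_path": "/mnt/raw/events", "run_date": run_date}),
    ("./transform_notebook", {"run_date": run_date}),
    ("./load_notebook", {"target_table": "prod.events", "run_date": run_date}),
]

for path, params in pipeline:
    try:
        # Each call runs the child notebook in its own isolated context;
        # 3600 is the timeout in seconds, and params arrive as string widgets.
        result = dbutils.notebook.run(path, 3600, params)
        print(f"{path} finished: {result}")
    except Exception as e:
        log_error(f"{path} failed: {e}")  # assumed helper from utils_notebook
        raise  # halt the pipeline on the first failure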
Why other options are less suitable:
A: Utility functions brought into the master notebook with %run are not visible inside child notebooks launched with dbutils.notebook.run(), because each child runs in its own isolated context (see the sketch below).
B: Running utils_notebook itself with dbutils.notebook.run() executes it in a separate context; its function definitions do not carry over to the other notebooks.
D: %run executes a notebook in the caller's context, so it provides neither isolation nor a parameter-passing mechanism, and coordinating stages through global variables is fragile and does not scale.
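To make the pitfall in option A concrete, consider this sketch (the paths and the log_error helper are again illustrative assumptions):

# Orchestrator with utils_notebook included ONLY here:
# %run ./utils_notebook   # defines log_error in the ORCHESTRATOR's context

# The child notebook runs in its own isolated context, so any call to
# log_error inside extract_notebook raises NameError. The %run include
# must appear in each child notebook, not just in the orchestrator.
dbutils.notebook.run("./extract_notebook", 3600, {"run_date": "2024-01-01"})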