You are tasked with building a robust, production-grade data pipeline in Databricks. The pipeline consists of several modular notebooks:
utils_notebook: Contains utility functions for logging, error notification, and data quality checks (a sketch of these helpers follows this list).
extract_notebook: Extracts data from multiple sources and performs initial validation.
transform_notebook: Applies complex business logic and transformations.
load_notebook: Loads the processed data into a Delta table.
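For concreteness, utils_notebook might expose helpers along these lines. This is a minimal Python sketch; the names log_error, notify_on_failure, and run_quality_checks are illustrative and not given in the scenario:

```python
# utils_notebook -- shared helpers (illustrative names, not a fixed API)
import logging
from datetime import datetime, timezone

logger = logging.getLogger("pipeline")
logging.basicConfig(level=logging.INFO)

def log_error(stage, exc):
    """Record a failure with a UTC timestamp and the stage that raised it."""
    logger.error("[%s] %s failed: %s",
                 datetime.now(timezone.utc).isoformat(), stage, exc)

def notify_on_failure(stage, exc):
    """Placeholder alerting hook (email, Slack, PagerDuty, ...)."""
    log_error(stage, exc)  # a real pipeline would call an alerting service here

def run_quality_checks(df, required_columns):
    """Fail fast if expected columns are missing or the DataFrame is empty."""
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not df.head(1):  # head(1) returns an empty list for an empty DataFrame
        raise ValueError("DataFrame is empty")
```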
Your requirements are:
Reusability: Utility functions from utils_notebook must be accessible in all other notebooks without code duplication.
Parameterization: Each notebook must accept parameters (e.g., source paths, table names, run dates) at runtime; see the widget sketch after this list.
Isolation: Each notebook should run in its own execution context to avoid variable conflicts and ensure independent error handling.
Error Handling: If any notebook fails, the pipeline should log the error using a function from utils_notebook and halt further execution; see the driver sketch after this list.
Scalability: The solution should remain maintainable and easy to extend as the team of data engineers and the number of pipelines grow.
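The Parameterization requirement maps onto Databricks notebook widgets: the caller supplies values at runtime, and the child notebook reads them with dbutils.widgets. A minimal sketch of the top of extract_notebook, assuming illustrative widget names source_path and run_date (dbutils and spark are globals provided by the Databricks runtime):

```python
# Top of extract_notebook -- declare and read runtime parameters.
# Declaring a widget also sets a default for interactive runs;
# values passed via dbutils.notebook.run(...) override the default.
dbutils.widgets.text("source_path", "/mnt/raw")
dbutils.widgets.text("run_date", "")

source_path = dbutils.widgets.get("source_path")
run_date = dbutils.widgets.get("run_date")

df = spark.read.format("json").load(f"{source_path}/{run_date}")
```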
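The Reusability, Isolation, and Error Handling requirements hinge on the two distinct notebook-invocation mechanisms in Databricks: %run executes another notebook in the current context, so its definitions become available without duplication, whereas dbutils.notebook.run() launches a child notebook in its own context and returns control (and a string result) to the caller. A sketch of a driver notebook combining the two; the notebook paths, parameter values, and 3600-second timeout are assumptions:

```python
# Driver notebook -- cell 1 must contain, on its own:
#   %run ./utils_notebook
# That executes utils_notebook in the driver's context, making
# log_error / notify_on_failure available here (Reusability).

run_date = "2024-01-01"  # in practice this would itself be a job parameter

stages = [
    ("extract_notebook",   {"source_path": "/mnt/raw", "run_date": run_date}),
    ("transform_notebook", {"run_date": run_date}),
    ("load_notebook",      {"target_table": "sales_gold", "run_date": run_date}),
]

for name, params in stages:
    try:
        # The child runs in its OWN execution context (Isolation);
        # params arrive in the child as widget values (Parameterization).
        # result is whatever the child passes to dbutils.notebook.exit().
        result = dbutils.notebook.run(f"./{name}", 3600, params)
        print(f"{name} finished: {result}")
    except Exception as exc:
        # Shared error handling from utils_notebook (Error Handling),
        # then re-raise to halt further execution.
        log_error(name, exc)
        notify_on_failure(name, exc)
        raise
```

Each child notebook would likewise start with its own %run ./utils_notebook cell so that helpers such as run_quality_checks are available there without duplication; dbutils.notebook.run deliberately does not share the caller's variables, which is precisely what provides the isolation.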
Which of the following approaches best satisfies all requirements? Select the best answer and explain why the other options are less suitable.