
You are developing a modular data pipeline in Databricks and have separated reusable utility functions (such as data validation, logging, and custom transformations) into dedicated notebooks. Your main ETL notebook needs to use these utilities, and any change to a utility notebook should be reflected automatically in the ETL workflow without code duplication.
Given the following requirements:
The utility code should be executed in the same context as the main notebook, allowing variables and Spark sessions to be shared.
The approach should support parameterization, so the utility notebook can accept arguments from the main notebook.
The solution should be maintainable and scalable for a team of data engineers working collaboratively.
Which of the following approaches best satisfies all requirements? Select the best answer and explain why the other options are less suitable.
A
Use the %run magic command to include utility notebooks at the top of the main ETL notebook, and pass parameters via global variables.
B
Use the dbutils.notebook.run() method to call utility notebooks, passing parameters as arguments, and handle outputs via return values.
C
Copy and paste the utility code directly into the main ETL notebook to ensure all code is present in one place.
D
Use external Python scripts stored in DBFS and import them as modules in the main notebook.
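For reference, minimal sketches of the mechanisms named in options A, B, and D follow. All notebook paths, table names, and helper functions (such as validate_schema) are hypothetical and only illustrate how each mechanism is typically invoked.

Option A relies on the %run magic, which executes the target notebook inline in the current context; it must be the only command in its cell and does not take named arguments, so parameters are shared as ordinary variables.

```python
%run ./utils/data_validation
```

```python
# Later cell in the main ETL notebook: anything the utility notebook
# defined (e.g. a hypothetical validate_schema function) is now in
# scope, and both notebooks share the same variables and SparkSession.
raw_df = spark.table("raw_sales")        # hypothetical source table
validated_df = validate_schema(raw_df)   # helper defined in the utility notebook
```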
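Option B uses dbutils.notebook.run(), which launches the utility notebook as a separate run with its own context: parameters are passed as a string-to-string map, and the only value returned is the string handed to dbutils.notebook.exit().

```python
# Main ETL notebook -- a sketch with a hypothetical path, timeout, and parameters.
result = dbutils.notebook.run(
    "./utils/data_validation",                              # utility notebook path
    600,                                                    # timeout in seconds
    {"table_name": "raw_sales", "run_date": "2024-01-01"},  # parameters passed as strings
)
print(f"Utility notebook returned: {result}")
```

```python
# Inside the utility notebook: read the parameters via widgets and
# return a status string to the caller.
table_name = dbutils.widgets.get("table_name")
run_date = dbutils.widgets.get("run_date")
# ... validation logic ...
dbutils.notebook.exit("validation_passed")
```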
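Option D treats the utilities as a plain Python module. A rough sketch, assuming a file has been uploaded to dbfs:/pipeline/utils.py and that the cluster exposes DBFS on the driver through the /dbfs FUSE mount:

```python
# Main ETL notebook -- module directory and helper names are hypothetical.
import sys

sys.path.append("/dbfs/pipeline")            # directory containing utils.py
import utils                                 # hypothetical utility module

raw_df = spark.table("raw_sales")            # hypothetical source table
clean_df = utils.validate_schema(raw_df)     # hypothetical helper in the module
```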