
Explanation:
The intended answer is D: install etl_utils from the cluster's initialization script. An init script runs on every node each time the cluster starts, so the library is available to every notebook attached to the cluster, giving the team one centralized, consistent installation. The other options fall short. Switching to a different Databricks Runtime (A) does not add a third-party package. %pip install (B) is notebook-scoped, so it affects only the notebook session that runs it. Adjusting PYTHONPATH (C) only changes where Python looks for modules and installs nothing. E is simply false. One caveat about the wording of D: dbutils.library.installPyPI is a legacy notebook utility that has been removed from recent Databricks Runtime versions, and init scripts are shell scripts, so in practice the install step in an init script is a pip command rather than a dbutils call.
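As a rough illustration of the init-script approach, the sketch below renders the text of a small bash init script that installs the library with pip. It deliberately uses pip instead of dbutils.library.installPyPI, since init scripts run as shell scripts outside any notebook. The helper name render_init_script and the pip path /databricks/python/bin/pip are assumptions for illustration; check the path against your own runtime.

```python
import shlex

# Assumption: pip for the cluster's Python environment lives at this path.
# Verify it for the Databricks Runtime version you are using.
DATABRICKS_PIP = "/databricks/python/bin/pip"


def render_init_script(packages, pip_path=DATABRICKS_PIP):
    """Return the text of a bash init script that pip-installs `packages`.

    Hypothetical helper, not part of any Databricks API.
    """
    if not packages:
        raise ValueError("at least one package is required")
    # Quote each package spec so version pins or extras survive the shell.
    quoted = " ".join(shlex.quote(p) for p in packages)
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",  # stop cluster startup early if the install fails
        f"{shlex.quote(pip_path)} install {quoted}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_init_script(["etl_utils"]))
```

The rendered script would then be saved somewhere the cluster can read and referenced in the cluster's init script settings, so every node runs it at startup.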
A data engineering team is working on ETL pipelines using a shared Databricks cluster and needs to utilize a third-party Python library, etl_utils, in their notebooks. What is the best method to ensure this library is accessible to all team members?
A. Modify the cluster to utilize the Databricks Runtime for Data Engineering.
B. Execute %pip install etl_utils in any notebook connected to the cluster.
C. Adjust the PYTHONPATH variable in the cluster settings to include the path to etl_utils.
D. Incorporate the dbutils.library.installPyPI('etl_utils') command into the cluster's initialization script.
E. It's impossible to make the etl_utils library available on a cluster.