
Databricks Certified Machine Learning - Associate
A data engineering team is working on ETL pipelines using a shared Databricks cluster and needs to use a third-party Python library, etl_utils, in their notebooks. What is the best method to ensure this library is accessible to all team members?
Explanation:
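The most effective way to make the etl_utils library available to all team members on a shared Databricks cluster is to install it once at the cluster level. You can either add it as a PyPI cluster library in the cluster's configuration, or run pip install etl_utils from a cluster-scoped initialization script. Either way, the library is installed on every node when the cluster starts and can be imported from every notebook attached to the cluster, so the whole team works against a single, consistent version. The dbutils.library.installPyPI('etl_utils') command does not achieve this. It installs the library only for the current notebook session, it cannot run inside an init script (init scripts are shell scripts), and it is not available in Databricks Runtime 11.0 and above. Other methods, such as modifying the Databricks Runtime, adjusting PYTHONPATH, or installing the library manually in individual notebooks, are less efficient and can leave team members on different library versions or cause compatibility issues.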
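One way to install a library at the cluster level without clicking through the UI is the Databricks Libraries REST API, which accepts a POST to /api/2.0/libraries/install with a cluster ID and a list of library specs. The sketch below assumes that route and uses only the Python standard library. The helper names build_install_payload and install_cluster_libraries are hypothetical, and the workspace URL, token, and cluster ID are placeholders you would replace with your own values.

```python
import json
import urllib.request


def build_install_payload(cluster_id: str, packages: list) -> dict:
    """Build the JSON body for the Libraries API install endpoint.

    Each package name becomes a PyPI library spec such as
    {"pypi": {"package": "etl_utils"}}.
    """
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": pkg}} for pkg in packages],
    }


def install_cluster_libraries(host: str, token: str, cluster_id: str, packages: list) -> None:
    """Send the install request (hypothetical helper; needs network access and a valid token).

    Libraries installed this way are cluster-scoped, so every notebook
    attached to the cluster can import them.
    """
    body = json.dumps(build_install_payload(cluster_id, packages)).encode("utf-8")
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # a successful call returns an empty JSON object


if __name__ == "__main__":
    # Placeholder cluster ID; this only prints the request body, it sends nothing.
    payload = build_install_payload("0123-456789-abcdef", ["etl_utils"])
    print(json.dumps(payload))
```

Pinning a version in the package string (for example "etl_utils==1.2.0", a hypothetical version number) keeps every team member on exactly the same release.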