Databricks Certified Machine Learning - Associate


A data engineering team is working on ETL pipelines using a shared Databricks cluster and needs to utilize a third-party Python library, etl_utils, in their notebooks. What is the best method to ensure this library is accessible to all team members?





Explanation:

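The most effective way to make the etl_utils library available to all team members on a shared Databricks cluster is to install it at the cluster level, for example as a cluster library or with a pip install command in the cluster's initialization script. This makes the library available to every notebook attached to the cluster, so the team manages a single installation in one place. Note that dbutils.library.installPyPI('etl_utils') is a notebook-scoped utility rather than an init-script command: init scripts are shell scripts, the call installs the library only for the notebook session that runs it, and it was removed in Databricks Runtime 11.0 and above. The other approaches are weaker. Modifying the Databricks Runtime is far more work than the task requires, adjusting PYTHONPATH only changes where Python looks for modules and does not install the library or its dependencies, and installing the library manually in individual notebooks must be repeated by each user and can leave team members on different versions.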
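
One way to install a library at cluster scope is the Databricks Libraries REST API (POST /api/2.0/libraries/install). The following is a minimal sketch rather than a definitive implementation. It assumes etl_utils is published on PyPI, and the helper names build_install_payload and install_on_cluster are hypothetical. The workspace URL, access token, and cluster ID are placeholders the caller must supply.

```python
import json
import urllib.request


def build_install_payload(cluster_id, package, version=None):
    """Build the JSON body for POST /api/2.0/libraries/install.

    Assumes the package is available on PyPI; pinning a version keeps
    every team member on the same release.
    """
    spec = package if version is None else f"{package}=={version}"
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": spec}}],
    }


def install_on_cluster(host, token, cluster_id, package, version=None):
    """Send the install request to the workspace and return the HTTP status.

    host, token, and cluster_id are placeholders supplied by the caller.
    """
    body = json.dumps(build_install_payload(cluster_id, package, version))
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example call (not executed here; all three values are placeholders):
# install_on_cluster("https://<workspace-url>", "<token>", "<cluster-id>", "etl_utils")
```

Once the cluster reports the library as installed, every notebook attached to that cluster can import etl_utils without a per-notebook install step. The same result can be reached through the cluster's Libraries tab in the workspace UI.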