
Answer-first summary for fast verification
Answer: (D) Incorporate the `dbutils.library.installPyPI('etl_utils')` command into the cluster's initialization script.
The most effective way to make the `etl_utils` library available to every team member on a shared Databricks cluster is to run `dbutils.library.installPyPI('etl_utils')` from the cluster's initialization script. Installing at the cluster level means the library is set up once and is accessible from every notebook attached to the cluster, giving the team a single, consistent point of library management. The alternatives fall short: switching the Databricks Runtime does not install third-party libraries, editing `PYTHONPATH` only helps if the package files are already present on every node, and installing manually in individual notebooks must be repeated per notebook and can drift out of sync across the team.
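For concreteness, a cluster-scoped initialization script is a shell script that runs on each node when the cluster starts. A minimal sketch of cluster-level installation (assuming the hypothetical `etl_utils` package is published on PyPI; version pinning is left out for brevity):

```shell
#!/bin/bash
# Cluster-scoped init script: runs on every node at cluster startup,
# so every notebook attached to the cluster sees the same library.
# `etl_utils` is the hypothetical package from the question.
pip install etl_utils
```

Because the script runs before any notebook attaches, the library is present for all users without per-notebook setup.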
Author: LeetQuiz Editorial Team
A data engineering team is working on ETL pipelines using a shared Databricks cluster and needs to utilize a third-party Python library, etl_utils, in their notebooks. What is the best method to ensure this library is accessible to all team members?
A. Modify the cluster to utilize the Databricks Runtime for Data Engineering.
B. Execute `%pip install etl_utils` in any notebook connected to the cluster.
C. Adjust the `PYTHONPATH` variable in the cluster settings to include the path to `etl_utils`.
D. Incorporate the `dbutils.library.installPyPI('etl_utils')` command into the cluster's initialization script.
E. It's impossible to make the `etl_utils` library available on a cluster.