
Answer-first summary for fast verification
Answer: Incorporate `/databricks/python/bin/pip install newpackage` into the cluster's bash init script
To make the Python library 'newpackage' available in every notebook attached to a Databricks cluster, the most effective approach is to add `/databricks/python/bin/pip install newpackage` to the cluster's bash init script. The script runs each time the cluster starts, so the package is installed on every node and is available cluster-wide before any notebook attaches.

- **Option A** (Databricks Runtime for Machine Learning) ships with many pre-installed ML libraries, but there is no guarantee it includes 'newpackage'.
- **Option B**, running `%pip install newpackage` in a notebook, installs the package only for that notebook's Python environment, not for other notebooks on the cluster.
- **Option C**, setting a runtime-version variable to 'ml' in the Spark session, selects a runtime; it does not install any library.
- **Option E** incorrectly claims cluster-wide availability is impossible; init scripts (and cluster-installed libraries) exist precisely for this purpose.

Thus, adding the pip command to the cluster's bash init script is the optimal way to make 'newpackage' universally accessible.
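As a sketch, such an init script could look like the following. The file name and the `set -e` guard are illustrative assumptions; the pip path is the one named in the answer:

```shell
#!/bin/bash
# Hypothetical cluster-scoped init script (e.g. install-newpackage.sh).
# Databricks runs this on every node at cluster startup, so the package
# becomes available to all notebooks attached to the cluster.
set -e

# Install into the cluster's Python environment (not the system Python)
# by using the pip binary that belongs to that environment.
/databricks/python/bin/pip install newpackage
```

The script is then registered as a cluster-scoped init script in the cluster configuration; where the script file itself is stored (workspace files, volumes, etc.) depends on the workspace setup.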
Author: LeetQuiz Editorial Team
A machine learning team aims to utilize the Python library 'newpackage' across all their projects, sharing a common cluster. What is the most effective method to ensure 'newpackage' is accessible in all notebooks on this cluster?
A
Configure the cluster to utilize the Databricks Runtime for Machine Learning
B
Execute `%pip install newpackage` in any notebook connected to the cluster
C
Set the runtime-version variable in their Spark session to 'ml'
D
Incorporate `/databricks/python/bin/pip install newpackage` into the cluster's bash init script
E
It's impossible to make 'newpackage' available across the entire cluster