
Ultimate access to all questions.
The following code has been migrated to a Databricks notebook from a legacy workload:
%sh
git clone https://github.com/foo/data_loader;
python ./data_loader/run.py;
mv ./output /dbfs/mnt/new_data
%sh
git clone https://github.com/foo/data_loader;
python ./data_loader/run.py;
mv ./output /dbfs/mnt/new_data
Which statement provides the most likely explanation for the high latency observed during the execution of this cell?
A
Python execution is inherently slower than Scala on Databricks; the run.py script should be refactored into Scala to leverage the JVM's performance.
B
The %sh magic command triggers a cluster restart to initialize Git, meaning the majority of the latency is caused by the cluster startup time.
C
Instead of cloning the repository, the code should use %sh pip install to ensure the Python logic is automatically distributed across all nodes in the cluster.
D
The %sh command does not support distributed file movement; the final line must be updated to use %fs to enable parallelized I/O across the cluster.
E
The %sh magic command executes shell code exclusively on the driver node, failing to utilize worker nodes or Databricks' optimized Spark engine for distributed processing.