Databricks Certified Data Engineer - Professional

Explanation:

The %sh magic command runs shell commands only on the cluster's driver node. Cloning a repository, executing a standalone Python script, and moving files with mv are therefore limited to the CPU, memory, and I/O of that single node. None of these steps uses Spark's distributed processing or the available worker nodes, so the approach scales poorly for large data tasks.
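As a rough illustration of the alternative, the sketch below expresses the load and the final write through Spark so that the work is split into tasks on the workers. It is not the author's solution: what run.py actually does is unknown, so the JSON source format, the Delta output format, and both function names are assumptions made only for this example.

```python
def fuse_path_to_dbfs_uri(path: str) -> str:
    """Map a driver-local FUSE path such as /dbfs/mnt/x to the URI dbfs:/mnt/x."""
    prefix = "/dbfs/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /dbfs FUSE path: {path!r}")
    return "dbfs:/" + path[len(prefix):]


def load_with_spark(spark, source_uri: str, target_path: str) -> None:
    """Hypothetical stand-in for run.py followed by mv, expressed as a Spark job."""
    # Assumption: the source data is JSON; the real format is not given.
    df = spark.read.format("json").load(source_uri)
    # The write runs as Spark tasks on the workers instead of one shell process on the driver.
    # Assumption: Delta is an acceptable output format for the target location.
    df.write.mode("overwrite").format("delta").save(fuse_path_to_dbfs_uri(target_path))
```

In a Databricks notebook, where spark is predefined, this could be called as load_with_spark(spark, source_uri, "/dbfs/mnt/new_data"), with source_uri pointing at wherever the raw data lives.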

The following code has been migrated to a Databricks notebook from a legacy workload:

%sh
git clone https://github.com/foo/data_loader;
python ./data_loader/run.py;
mv ./output /dbfs/mnt/new_data

Which statement provides the most likely explanation for the high latency observed during the execution of this cell?

Last updated: May 6, 2026 at 14:02

Python execution is inherently slower than Scala on Databricks; the run.py script should be refactored into Scala to leverage the JVM's performance.

5.9%

The %sh magic command triggers a cluster restart to initialize Git, meaning the majority of the latency is caused by the cluster startup time.

7.8%

Instead of cloning the repository, the code should use %sh pip install to ensure the Python logic is automatically distributed across all nodes in the cluster.

11.8%

The %sh command does not support distributed file movement; the final line must be updated to use %fs to enable parallelized I/O across the cluster.

21.6%

The %sh magic command executes shell code exclusively on the driver node, failing to utilize worker nodes or Databricks' optimized Spark engine for distributed processing.

52.9%