
Answer-first summary for fast verification
Answer: Utilize production-sized datasets and production-grade clusters while using the **Run All** execution mode to measure performance.
### Why Option B is Correct

* **Representative workload:** Performance characteristics such as task parallelism, shuffle patterns, and I/O overhead change significantly when moving from small "toy" datasets to production-scale data. Testing on realistic volumes is essential to surface bottlenecks that do not appear in small-scale development.
* **Consistent runtime mode:** Executing cells one by one interactively introduces driver-client round-trip latency and prevents Spark from optimizing the entire execution path. **Run All** executes the notebook as a contiguous batch, so Spark's internal optimizations, such as pipelining and whole-stage code generation, are applied uniformly.

### Why the Other Options are Incorrect

* **Option A:** PySpark and Spark SQL are first-class citizens on Databricks. Because they use the same Catalyst optimizer and JVM backend as Scala, restructuring them into JARs is generally unnecessary for performance evaluation.
* **Option C:** Open-source Spark lacks Databricks-specific optimizations such as the Photon engine, Delta caching, and specialized cloud I/O integrations, so local benchmarks will not accurately reflect Databricks production performance.
* **Option D:** While it is true that `display()` triggers jobs and that caching can skew repeated manual runs, this merely describes the current problem rather than offering a proactive adjustment that improves measurement accuracy.
* **Option E:** The Photon engine can be enabled on both interactive and job clusters; it is not restricted to the Jobs UI. Job clusters are recommended for production, but they are not the only way to get Photon acceleration.
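The lazy-evaluation point underlying Option D (transformations only build a logical plan; work happens when an action runs) can be illustrated outside Databricks. Below is a minimal plain-Python sketch that mimics Spark's transformation/action split with a toy class; `LazyPipeline`, its methods, and the `_plan` attribute are invented for illustration and are not Spark's API:

```python
class LazyPipeline:
    """Toy stand-in for Spark's lazy evaluation: transformations only
    record steps in a plan; nothing executes until an action is called."""

    def __init__(self, data):
        self._data = data
        self._plan = []  # recorded transformations, like a logical plan

    def map(self, fn):
        self._plan.append(("map", fn))
        return self

    def filter(self, pred):
        self._plan.append(("filter", pred))
        return self

    def collect(self):
        # The "action": only now is the whole plan executed end-to-end.
        result = iter(self._data)
        for kind, fn in self._plan:
            result = map(fn, result) if kind == "map" else filter(fn, result)
        return list(result)


pipeline = LazyPipeline(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
# No work has happened yet -- only the plan exists.
print(len(pipeline._plan))   # 2 recorded steps, nothing executed
print(pipeline.collect())    # [0, 4, 16, 36, 64]
```

This is why timing individual cells is misleading: each `display()` call forces a separate action, whereas a batch run lets the engine execute the full plan once.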
Author: LeetQuiz Editorial Team
A data engineer is troubleshooting performance bottlenecks in their pipeline logic. Currently, they develop interactively by executing notebook cells one by one and using `display()` calls to validate each step. To estimate production execution time, they manually re-run cells multiple times.
Which of the following adjustments would provide the most precise evaluation of how the code will perform once deployed to production?
A
Restructure all PySpark and Spark SQL logic into Scala JARs, as Scala is the only language that allows for accurate performance benchmarking and optimal execution in interactive notebooks.
B
Utilize production-sized datasets and production-grade clusters while using the Run All execution mode to measure performance.
C
Perform benchmarking within an Integrated Development Environment (IDE) against local builds of open-source Spark and Delta Lake to establish a performance baseline.
D
Continue using `display()` calls to trigger jobs manually, while accounting for the fact that Spark only contributes to the logical query plan until an action is called.
E
Execute the notebook via the Jobs UI to monitor timing, as the Photon acceleration engine can only be enabled on clusters launched for scheduled jobs.
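The caching skew mentioned in the discussion of Option D is easy to reproduce in any runtime. The plain-Python sketch below uses `functools.lru_cache` as a stand-in for Spark/Delta caching (the function names and the 50 ms "scan" are illustrative assumptions) to show why timing a manual re-run of already-cached work understates the cold-start cost a production job would pay:

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def expensive_read(partition: int) -> int:
    # Stand-in for an expensive data scan; the real cost is paid only once,
    # after which results are served from cache.
    time.sleep(0.05)
    return partition * 2


def timed(fn, *args):
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


cold = timed(expensive_read, 1)   # first run pays the full scan cost
warm = timed(expensive_read, 1)   # manual re-run hits the cache
print(f"cold={cold:.3f}s warm={warm:.3f}s")
# The warm run is dramatically faster, so repeatedly re-running cells
# measures cache performance, not production performance.
```

This is the same reason the recommended approach (Option B) measures a single **Run All** pass on production-scale data rather than averaging repeated interactive executions.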