
Answer-first summary for fast verification
Answer: D. Calling display() forces a job to trigger, while many transformations will only add to the logical query plan; because of caching, repeated execution of the same logic does not provide meaningful results.
Timing cells by rerunning them interactively with `display()` calls does not give an accurate picture of production performance. Option D is correct: most transformations only add to the logical query plan, and it is the `display()` call that forces a job to run; because of caching, repeated runs of the same logic finish faster than a first run would, so the measured times do not reflect true production performance. Option A is incorrect because Photon can also be enabled on interactive (all-purpose) clusters, not only on clusters launched for scheduled jobs. Option B is incorrect because production-sized data and clusters can give more accurate measurements, but they are not the only way to troubleshoot execution times. Option C is incorrect because local builds of open source Spark and Delta Lake do not include Databricks-specific optimizations, which makes them poor benchmarks for production performance.
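The two effects named in option D can be illustrated with a small toy model. This is a sketch in plain Python, not Spark code: the `LazyFrame` class, its `transform` and `display` methods, and the `work_done` counter are all hypothetical names invented for illustration. It assumes only that transformations extend a plan without executing it, and that results of an earlier action may be reused.

```python
from typing import Callable, Dict, List, Tuple


class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, not a Spark API)."""

    # Results of earlier actions, keyed by (source id, plan); mimics caching.
    _cache: Dict[Tuple[int, tuple], List[int]] = {}
    # Counts how many row-level function calls actually ran.
    work_done = 0

    def __init__(self, rows: List[int], plan: tuple = ()):
        self._rows = rows
        self._plan = plan

    def transform(self, fn: Callable[[int], int]) -> "LazyFrame":
        # Like a transformation: only extends the plan, no rows are touched.
        return LazyFrame(self._rows, self._plan + (fn,))

    def display(self) -> List[int]:
        # Like an action: executes the plan unless a cached result exists.
        key = (id(self._rows), self._plan)
        if key not in LazyFrame._cache:
            result = list(self._rows)
            for fn in self._plan:
                result = [fn(r) for r in result]
                LazyFrame.work_done += len(result)
            LazyFrame._cache[key] = result
        return LazyFrame._cache[key]


def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


source = [1, 2, 3]
frame = LazyFrame(source).transform(add_one).transform(double)
work_after_transforms = LazyFrame.work_done  # 0: nothing has executed yet

first = frame.display()
work_after_first = LazyFrame.work_done       # 6: two steps over three rows

second = frame.display()
work_after_second = LazyFrame.work_done      # still 6: served from the cache
```

In this model, timing the `transform` calls measures nothing, and timing the second `display()` measures only a cache lookup. Real Spark caching is more involved than a dictionary lookup, but the consequence for interactive timing is the same: a rerun does not repeat the work of the first run.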
Author: LeetQuiz Editorial Team
A Databricks user is troubleshooting pipeline execution times. The user currently tests transformations interactively, running cells multiple times with display() calls to verify correctness. Which of the following modifications would provide a more precise assessment of how the code will perform in a production environment?
A
The Jobs UI should be leveraged to occasionally run the notebook as a job and track execution time during incremental code development because Photon can only be enabled on clusters launched for scheduled jobs.
B
The only way to meaningfully troubleshoot code execution times in development notebooks is to use production-sized data and production-sized clusters with Run All execution.
C
Production code development should only be done using an IDE; executing code against a local build of open source Spark and Delta Lake will provide the most accurate benchmarks for how code will perform in production.
D
Calling display() forces a job to trigger, while many transformations will only add to the logical query plan; because of caching, repeated execution of the same logic does not provide meaningful results.