
Databricks Certified Data Engineer - Professional
A production-deployed Structured Streaming job is incurring higher-than-expected cloud storage costs. Currently, each microbatch processes in under 3 seconds during normal execution, and at least 12 microbatch executions per minute contain zero records. The streaming write uses default trigger settings. The job runs alongside many other Databricks jobs in a workspace with instance pools provisioned to minimize startup time for batch jobs.
Assuming all other variables remain constant and records must be processed within 10 minutes, which configuration adjustment will meet this requirement while addressing the cost issue?
Explanation:
The cost issue arises from the default trigger: Structured Streaming starts the next microbatch as soon as the previous one completes, so the source storage is polled almost continuously, and each poll issues billable API calls even when no new records have arrived. The solution must reduce those calls while keeping end-to-end latency under 10 minutes. Option A is correct: setting a 3-second trigger interval caps execution at roughly 20 microbatches per minute (instead of the potentially many per second that near-empty batches allow under the default trigger), which lowers the volume of storage API calls while keeping worst-case latency at only a few seconds, well within the 10-minute requirement. Options B, C, and D either violate the latency requirement or worsen the cost issue.
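
To make the trigger adjustment concrete, the following is a minimal PySpark sketch of a streaming write that replaces the default trigger with a fixed processing-time trigger, as described in the explanation. The Delta source, sink, and checkpoint paths are hypothetical, since the question does not specify them.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical cloud-storage-backed Delta source.
    events = (
        spark.readStream
        .format("delta")
        .load("/mnt/raw/events")
    )

    query = (
        events.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/events_silver")
        # With no .trigger() call, the next microbatch starts as soon as the
        # previous one finishes, so the source is polled almost continuously.
        # A fixed processing-time trigger bounds that polling frequency:
        .trigger(processingTime="3 seconds")  # interval named in the explanation
        .start("/mnt/silver/events")          # hypothetical sink path
    )

With this setting, at most about 20 microbatches run per minute regardless of how quickly empty batches complete, which is what reduces the storage API call volume without approaching the 10-minute latency limit.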