
A production-deployed Structured Streaming job is incurring higher-than-expected cloud storage costs. Each microbatch currently completes in under 3 seconds during normal execution, and at least 12 microbatches per minute contain zero records. The streaming write uses default trigger settings. The job runs alongside many other Databricks jobs in a workspace with instance pools provisioned to minimize startup time for batch jobs.
Assuming all other variables remain constant and records must be processed within 10 minutes, which configuration change meets this latency requirement while also addressing the cost issue?
A
Set the trigger interval to 3 seconds; the default trigger processes too many records per batch, resulting in spill to disk that can increase volume costs.
B
Use the trigger-once option and configure a Databricks job to execute the query every 10 minutes; this approach minimizes costs for both compute and storage.
C
Set the trigger interval to 10 minutes; each batch calls APIs in the source storage account, so decreasing the trigger frequency to the maximum allowable threshold should minimize this cost.
D
Set the trigger interval to 500 milliseconds; setting a small but non-zero trigger interval ensures that the source is not queried too frequently.
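For context, here is a minimal PySpark sketch of the trigger settings the options describe. The rate source and console sink are illustrative stand-ins (assumptions, not part of the scenario) so the snippet runs without touching a cloud storage account.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trigger-sketch").getOrCreate()

# Illustrative source: "rate" generates synthetic rows locally, so this
# sketch does not require the storage-account source from the question.
events = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

# Default trigger: the next microbatch starts as soon as the previous one
# finishes, so ~3-second batches poll the source roughly 20 times per
# minute even when no new records have arrived.

# Option C: fire one microbatch every 10 minutes, the maximum allowed
# latency, which reduces source API calls to one poll per interval.
query = (
    events.writeStream
    .format("console")
    .trigger(processingTime="10 minutes")
    .start()
)
query.awaitTermination()

# Option B would instead use .trigger(availableNow=True) (or the older
# once=True) and rely on a Databricks job schedule for the 10-minute
# cadence, accepting cluster start latency on each run.
```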