
Answer-first summary for fast verification
Answer: Use the `Trigger.Once` (or `AvailableNow`) option and configure a Databricks job to execute the query every 10 minutes.
To minimize both compute and storage costs while meeting the 10-minute SLA, the most effective strategy is **incremental batch processing**.

* **Cost Efficiency:** With `Trigger.Once` or `Trigger.AvailableNow` and a Databricks Job scheduled every 10 minutes, the cluster runs only long enough to process the data that has arrived since the previous run, then shuts down. This avoids paying for a cluster that runs 24/7.
* **Storage Savings:** Running once per schedule interval instead of every few seconds greatly reduces the number of micro-batch metadata commits and storage API calls (source listing and checkpoint writes), which can otherwise drive up costs significantly.
* **SLA Compliance:** A 10-minute schedule is the longest interval the SLA permits, so it gives the lowest run frequency that still processes new records within the required window, assuming each run finishes promptly.

**Why other options are incorrect:**

* **Always-on triggers (A, C, D):** These keep the cluster running even when no data is arriving, which raises compute costs. The very short intervals in A and D (3 seconds and 500 ms) also increase storage API costs because the source is polled constantly.
* **Shuffle Partitions (B):** Shuffle partitions affect parallelism within a batch but do nothing about idle compute time or the frequency of storage metadata operations. The option's premise is also false: the trigger interval can be changed when a query is restarted from an existing checkpoint.
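The incremental batch pattern described above can be sketched in PySpark as follows. This is a minimal illustration, not a reference implementation: the function name, the table names, and the checkpoint path are hypothetical, and the schedule dictionary only mirrors the general shape of a Databricks job schedule (a Quartz cron expression plus a time zone). It assumes a Spark version where `trigger(availableNow=True)` is supported; on recent versions `Trigger.Once` is deprecated in favor of `AvailableNow`.

```python
# Hypothetical schedule fragment for the job that wraps the query.
# Quartz cron fields: second minute hour day-of-month month day-of-week.
# "0 0/10 * * * ?" fires at second 0 of every 10th minute.
JOB_SCHEDULE = {
    "quartz_cron_expression": "0 0/10 * * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}


def run_incremental_batch(spark, source_table, target_table, checkpoint_path):
    """Process all data that arrived since the last run, then stop.

    `spark` is an existing SparkSession. All other arguments are
    illustrative names chosen for this sketch.
    """
    query = (
        spark.readStream
        .table(source_table)                            # read the source incrementally
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # remembers progress between runs
        .trigger(availableNow=True)                     # drain the backlog, then terminate
        .toTable(target_table)                          # starts the streaming query
    )
    query.awaitTermination()  # returns once the available data has been processed
    return query
```

Because the checkpoint records which data has already been processed, each scheduled run picks up where the previous one stopped, and the cluster can shut down between runs.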
Author: LeetQuiz Editorial Team
A production Structured Streaming job must process records within a 10-minute SLA. The engineering team aims to minimize cloud storage and compute costs while meeting this requirement. Which configuration change should be implemented?
A
Set the trigger interval to 3 seconds; the default trigger interval consumes too many records per batch, causing disk spills and increased storage costs.
B
Increase the number of shuffle partitions to maximize parallelism, as the trigger interval cannot be modified once the checkpoint directory is established.
C
Set the trigger interval to 10 minutes within a continuous streaming query to minimize the frequency of API calls to the source storage account.
D
Set the trigger interval to 500 milliseconds; a non-zero interval ensures the source is not queried too frequently, reducing overhead.
E
Use the Trigger.Once (or AvailableNow) option and configure a Databricks job to execute the query every 10 minutes.