
Explanation:
To minimize both compute and storage costs while meeting a 10-minute SLA, the most effective strategy is to use incremental batch processing.
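Here is a minimal PySpark sketch of a single incremental batch run. It assumes Spark 3.3 or later, where `trigger(availableNow=True)` exists. The table names and checkpoint path are hypothetical. Each scheduled run processes whatever has arrived since the last checkpoint and then stops, so `awaitTermination()` returns and the job run can finish.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; the caller supplies a live SparkSession
    from pyspark.sql import SparkSession
    from pyspark.sql.streaming import StreamingQuery


def run_incremental_batch(
    spark: SparkSession,
    source_table: str,
    target_table: str,
    checkpoint_path: str,
) -> StreamingQuery:
    """Process all data available now, then stop (one scheduled job run)."""
    query = (
        spark.readStream.table(source_table)
        .writeStream
        # The checkpoint records which input has already been processed,
        # so the next scheduled run picks up only new data.
        .option("checkpointLocation", checkpoint_path)
        # Drain the current backlog (possibly in several micro-batches),
        # then terminate; older runtimes use .trigger(once=True) instead.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
    # Block until the backlog is processed so the job run can complete.
    query.awaitTermination()
    return query


# Hypothetical usage inside a Databricks job task scheduled every 10 minutes:
# run_incremental_batch(spark, "bronze.events", "silver.events",
#                       "/mnt/checkpoints/silver_events")
```

The checkpoint location must stay the same across runs. It is what lets each run pick up only the data that arrived since the previous one.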
By using Trigger.Once or Trigger.AvailableNow together with a Databricks Job scheduled to run every 10 minutes, the cluster runs only long enough to process the data currently available. This avoids the cost of a cluster running continuously, 24/7.
Why other options are incorrect:
A production Structured Streaming job must process records within a 10-minute SLA. The engineering team aims to minimize cloud storage and compute costs while meeting this requirement. Which configuration change should be implemented?
A
Set the trigger interval to 3 seconds; the default trigger interval consumes too many records per batch, causing disk spills and increased storage costs.
B
Increase the number of shuffle partitions to maximize parallelism, as the trigger interval cannot be modified once the checkpoint directory is established.
C
Set the trigger interval to 10 minutes within a continuous streaming query to minimize the frequency of API calls to the source storage account.
D
Set the trigger interval to 500 milliseconds; a non-zero interval ensures the source is not queried too frequently, reducing overhead.
E
Use the Trigger.Once (or AvailableNow) option and configure a Databricks job to execute the query every 10 minutes.
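The following back-of-the-envelope Python model makes the cost argument for option E concrete by comparing it with an always-on cluster. The per-run duration (cluster startup plus processing) is a hypothetical input, not a measured figure. Real savings depend on startup time, data volume, and pricing.

```python
HOURS_PER_DAY = 24.0          # an always-on streaming cluster is up all day
MINUTES_PER_DAY = 24 * 60


def scheduled_cluster_hours_per_day(run_minutes: float, interval_minutes: float = 10.0) -> float:
    """Cluster-hours per day for a job that runs every `interval_minutes`,
    where each run (startup + processing) keeps a cluster up for `run_minutes`."""
    if run_minutes <= 0 or interval_minutes <= 0:
        raise ValueError("durations must be positive")
    if run_minutes > interval_minutes:
        # Runs would overlap or queue up: the schedule cannot keep pace.
        raise ValueError("each run must finish before the next one is due")
    runs_per_day = MINUTES_PER_DAY / interval_minutes
    return runs_per_day * run_minutes / 60


# Hypothetical example: each 10-minute run needs about 4 minutes of cluster time.
scheduled = scheduled_cluster_hours_per_day(run_minutes=4)
print(f"scheduled: {scheduled:.1f} h/day vs always-on: {HOURS_PER_DAY:.1f} h/day")
# prints "scheduled: 9.6 h/day vs always-on: 24.0 h/day"
```

As the run duration approaches the 10-minute interval, the savings disappear, and the schedule leaves no headroom for the SLA.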