
Answer-first summary for fast verification
Answer: Reduce the trigger interval to 5 seconds. This creates smaller, more frequent micro-batches, preventing backlog growth and reducing the risk of data spillage.
### Explanation

**Correct Answer: A**

To meet a tight latency SLA (such as < 10 seconds), the primary strategy is to reduce the trigger interval (e.g., `.trigger(processingTime='5 seconds')`). This forces the engine to fire smaller, more frequent micro-batches. Smaller batches are processed faster and are less likely to cause memory spillage, which typically occurs when a single large batch exceeds executor memory during shuffles.

**Why the other options are incorrect:**

* **Sequential execution (Option C):** Micro-batches in Structured Streaming run sequentially; the next batch starts only after the previous one completes and commits. Executors therefore cannot start a new batch while tasks from the current one are still running, so the reasoning in Option C is wrong.
* **Latency impact (Option B):** Increasing the trigger interval increases the amount of data per batch, which lengthens processing time and worsens latency.
* **Checkpointing (Option D):** Databricks allows you to change the trigger interval between runs while reusing the same checkpoint; no new checkpoint directory is required.
* **Overhead (Option E):** `Trigger.Once` (or the modern `Trigger.AvailableNow`) is designed for batch-style processing. Scheduling a new job every 10 seconds incurs significant driver-initialization overhead and is not an appropriate architectural choice for continuous, low-latency workloads.
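The latency argument above can be sketched with back-of-the-envelope arithmetic: in micro-batch streaming, a record's worst-case end-to-end latency is roughly one trigger interval (waiting for the batch to fire) plus the batch processing time. The ingest rate and per-record cost below are purely illustrative assumptions, not measured values:

```python
# Illustrative model: worst-case latency ≈ trigger interval + batch processing time.
# INGEST_RATE and PER_RECORD_COST are assumed numbers for the sketch.

INGEST_RATE = 2_000        # records/second arriving at peak (assumed)
PER_RECORD_COST = 0.0004   # seconds of processing per record (assumed)

def worst_case_latency(trigger_interval_s: float) -> float:
    """Approximate worst-case latency for one record, in seconds."""
    batch_size = INGEST_RATE * trigger_interval_s     # records per micro-batch
    processing_time = batch_size * PER_RECORD_COST    # time to process that batch
    return trigger_interval_s + processing_time       # wait + process

print(worst_case_latency(5))    # 5 s trigger: 5 + 4 = 9.0 s  (meets a 10 s SLA)
print(worst_case_latency(30))   # 30 s trigger: 30 + 24 = 54.0 s (blows the SLA)
```

Under these assumptions the 5-second trigger keeps both the wait and the batch small enough to stay inside the SLA, while the 30-second trigger (Option B) multiplies both terms.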
Author: LeetQuiz Editorial Team
A Structured Streaming job is failing to meet a 10-second latency SLA during peak hours. Which adjustment will most effectively ensure records are processed within the 10-second requirement?
A. Reduce the trigger interval to 5 seconds. This creates smaller, more frequent micro-batches, preventing backlog growth and reducing the risk of data spillage.

B. Extend the trigger interval to 30 seconds. By setting the interval closer to the maximum observed execution time, the engine ensures no records are dropped.

C. Decrease the trigger interval to 5 seconds. This allows idle executors to start processing the next micro-batch while longer tasks from the current batch finish.

D. Increase the number of shuffle partitions for maximum parallelism, as modifying the trigger interval requires the creation of a new checkpoint directory.

E. Utilize the Trigger.Once option and set up a Databricks job to execute the query every 10 seconds to process all backlogged records in a single batch.