
A production Structured Streaming job is currently processing each microbatch in under 3 seconds during normal operation, but experiences inconsistent execution times during peak hours—sometimes exceeding 30 seconds per microbatch. The streaming write is configured with a 10-second trigger interval.
Assuming all other variables remain constant and records must be processed in under 10 seconds, which configuration adjustment will satisfy this requirement?
A
Decrease the trigger interval to 5 seconds; triggering batches more frequently allows idle executors to begin processing the next batch while longer-running tasks from previous batches finish.
B
Increase the trigger interval to 30 seconds; setting the trigger interval near the maximum execution time observed for each batch is always best practice to ensure no records are dropped.
C
The trigger interval cannot be modified without modifying the checkpoint directory; to maintain the current stream state, increase the number of shuffle partitions to maximize parallelism.
D
Use the trigger once option and configure a Databricks job to execute the query every 10 seconds; this ensures all backlogged records are processed with each batch.
E
Decrease the trigger interval to 5 seconds; triggering batches more frequently may prevent records from backing up and large batches from causing spill.
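For reference, the trigger interval discussed in the question is set on the streaming write itself. The sketch below is a minimal, hypothetical PySpark example (built-in rate source, placeholder checkpoint and output paths, Delta sink assumed as in a typical Databricks environment) showing where a 10-second processing-time trigger would be lowered to 5 seconds, as option E describes.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical source: the built-in rate source stands in for the real input stream.
stream_df = (
    spark.readStream
    .format("rate")
    .option("rowsPerSecond", 100)
    .load()
)

query = (
    stream_df.writeStream
    .format("delta")                                        # assumes a Delta-capable environment
    .option("checkpointLocation", "/tmp/checkpoints/demo")  # placeholder checkpoint path
    # Current configuration: a new microbatch fires every 10 seconds.
    # Lowering this to "5 seconds" (option E) triggers smaller, more frequent
    # batches, which may keep records from backing up during peak hours.
    .trigger(processingTime="10 seconds")
    .start("/tmp/tables/demo_output")                       # placeholder output path
)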