
Answer: append
## Explanation

In Azure Databricks Structured Streaming, **append** output mode is the correct choice for this scenario.

### Why Append Mode Is Optimal

- **Append mode** writes only new rows to the output sink and never modifies rows that were already written.
- For windowed aggregations, such as counting events in five-minute intervals, append mode emits a window's result only after the watermark passes the end of that window. Once an interval's count is written, it never changes. This requires a watermark on the event-time column (`withWatermark`); Spark rejects an append-mode streaming aggregation that has no watermark.
- This matches the requirement to "report only the events that arrived during each interval": each interval's count is written exactly once, as a final value.
- Append is also a natural fit for a Delta Lake sink. Each finalized interval becomes a new row in the table, and previously written interval counts are never overwritten.

### Why Other Options Are Less Suitable

- **Update mode**: This writes every window whose count changed in the current trigger. The same interval can therefore be reported several times with partial, still-growing counts, which conflicts with reporting each interval once. Update mode suits scenarios where results are refined as more data arrives; writing such results into a Delta table typically means an upsert (for example, `foreachBatch` with `MERGE`) rather than a plain append.
- **Complete mode**: This rewrites the entire result table, including every interval seen so far, on each trigger. That is inefficient here and does not meet the requirement to report only each new interval. Complete mode also has to keep all aggregation state, so the watermark cannot be used to drop old windows. It is typically used when the full aggregated result set must be available at all times.

### Best Practice Considerations

- For time-windowed aggregations where historical interval counts must be preserved, append mode with a watermark is the standard approach.
- Windowed aggregation, a watermark, append mode, and a Delta Lake sink together provide exactly the behavior described: each interval's count is computed, written once, and preserved.
- Downstream consumers can read completed interval counts without worrying that rows will be modified retroactively. The trade-off is latency: an interval's count appears only after the watermark delay has elapsed.
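The difference between the three modes is easiest to see by tracing what each one writes on every trigger. The following is a minimal pure-Python sketch, not Spark code. It models a five-minute tumbling-window count with a hypothetical 60-second watermark delay. The names `simulate`, `window_start`, `WINDOW_SECONDS`, and `WATERMARK_DELAY` are illustrative assumptions. The sketch also simplifies Spark's timing: the new watermark is applied in the same trigger that advances it. Late data is ignored.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # five-minute tumbling windows
WATERMARK_DELAY = 60     # hypothetical watermark delay in seconds (assumption)


def window_start(event_time: int) -> int:
    """Map an event time (in seconds) to the start of its five-minute window."""
    return event_time - event_time % WINDOW_SECONDS


def simulate(batches: list[list[int]], mode: str) -> list[list[tuple[int, int]]]:
    """Return the (window_start, count) rows each trigger writes in `mode`.

    Toy model, not Spark's implementation: the watermark takes effect in the
    same trigger that advances it, and late data is not handled.
    """
    counts = defaultdict(int)  # aggregation state: window start -> count
    emitted = set()            # windows already written (append mode only)
    max_event_time = None
    outputs = []
    for batch in batches:
        changed = set()
        for t in batch:
            w = window_start(t)
            counts[w] += 1
            changed.add(w)
            max_event_time = t if max_event_time is None else max(max_event_time, t)
        watermark = None if max_event_time is None else max_event_time - WATERMARK_DELAY

        if mode == "append":
            # Write a window once, after the watermark reaches its end.
            rows = [
                (w, counts[w])
                for w in sorted(counts)
                if w not in emitted
                and watermark is not None
                and w + WINDOW_SECONDS <= watermark
            ]
            emitted.update(w for w, _ in rows)
        elif mode == "update":
            # Write every window whose count changed in this trigger.
            rows = [(w, counts[w]) for w in sorted(changed)]
        elif mode == "complete":
            # Rewrite the whole result table on every trigger.
            rows = sorted(counts.items())
        else:
            raise ValueError(f"unknown output mode: {mode}")
        outputs.append(rows)
    return outputs


if __name__ == "__main__":
    # Event times (seconds) arriving in four successive triggers.
    batches = [[10, 130, 250], [280, 320], [400, 610], [700, 950]]
    for mode in ("append", "update", "complete"):
        print(mode, simulate(batches, mode))
```

With these four triggers, append mode writes nothing on the first two. It writes `(0, 4)` on the third trigger and `(300, 2)` on the fourth, so each interval appears once with its final count. Update mode reports interval `0` twice (count 3, then 4). Complete mode rewrites every interval on every trigger. In an actual Databricks query, the corresponding pieces are a watermark on the event-time column, a `groupBy` over a five-minute `window`, and `outputMode("append")` on the stream writer targeting Delta.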
Author: LeetQuiz Editorial Team
You are building a structured streaming solution in Azure Databricks that will count new events in five-minute intervals. The solution must report only the events that arrived during each interval, and the output will be written to a Delta Lake table.
Which output mode should you use?
- A. update
- B. complete
- C. append