
Answer-first summary for fast verification
Answer: Append
## Detailed Explanation

### Understanding the Scenario

This scenario involves processing streaming sales transaction data in Azure Databricks using Structured Streaming, with the following key requirements:

- **Immutable transactions**: Sales transactions are never updated; instead, new rows are added to adjust a sale
- **Aggregation required**: Line total sales amount and line total tax amount need to be aggregated
- **Minimize duplicates**: The solution must minimize duplicate data

### Analysis of Output Modes

#### **Append Mode (Option C)**

- **Append mode** writes to the output sink only the rows added to the result table since the last trigger; a row that has been written is never written again
- This fits the requirement that "Sales transactions will never be updated. Instead, new rows will be added to adjust a sale": adjustments arrive as new input rows, so nothing that has already been written needs to be revised
- For aggregations, Structured Streaming supports Append mode only when the query defines an event-time watermark. Each aggregate row is then written once, after the watermark has passed and the row can no longer change, which prevents duplication in the sink

#### **Update Mode (Option A)**

- **Update mode** writes only the rows of the result table that changed since the last trigger
- With aggregations, an adjustment row changes an existing aggregate, so the same aggregate key is written again each time it changes
- Unless the sink upserts on that key, it accumulates several versions of the same aggregate, which works against the requirement to minimize duplicates

#### **Complete Mode (Option B)**

- **Complete mode** writes the entire result table to the output sink after each trigger
- Every aggregate row is rewritten on every trigger, including rows that did not change, so a sink that does not overwrite its contents sees significant duplication
- Complete mode is typically used when the full result must be available after each trigger, which is unnecessary here

### Why Append is the Optimal Choice

1. **Matches the business logic**: The requirement that "new rows will be added to adjust a sale" directly corresponds to Append mode's add-only behavior
2. **Minimizes duplicates**: Append mode writes each result row once, avoiding the repeated rows that Update and Complete modes produce
3. **Efficient for streaming**: Append mode is designed for continuous addition of new records without modifying existing ones
4. **Works with aggregations**: With an event-time watermark, Append mode emits each aggregate only after it is final

### Conclusion

Given that sales transactions are immutable and adjustments are made by adding new rows rather than updating existing ones, **Append mode** is the most appropriate output mode. It directly supports the business requirement while minimizing duplicate data and maintaining efficient streaming processing.
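To make the duplicate-data argument concrete, here is a small pure-Python toy model of what each of the three output modes would hand to the sink for a windowed running sum. It is only a sketch, not Spark code: `simulate`, `rows_written`, `BATCHES`, `WINDOW`, and `LATENESS` are hypothetical names, the window length and watermark delay are made-up values, and the watermark timing is simplified compared with how Structured Streaming advances it between triggers. In a real query the equivalent knobs would be `withWatermark(...)` on the streaming DataFrame and `outputMode(...)` on the writer.

```python
WINDOW = 10    # window length in minutes (illustrative value)
LATENESS = 5   # watermark delay in minutes (illustrative value)

# Each micro-batch is a list of (event_minute, line_total) rows.
# The negative amount is an adjustment row: a new row, not an update.
BATCHES = [
    [(1, 100.0), (3, 50.0)],
    [(7, -20.0), (12, 30.0)],
    [(16, 10.0), (26, 5.0)],
]


def simulate(batches, mode):
    """Return, per micro-batch, the (window_start, total) rows sent to the sink."""
    totals = {}         # result table: window start -> running sum
    finalized = set()   # windows already written in append mode
    watermark = 0
    output = []
    for batch in batches:
        changed = set()
        for minute, amount in batch:
            start = minute - minute % WINDOW
            if start + WINDOW <= watermark:
                continue  # row is too late: its window is already closed
            totals[start] = totals.get(start, 0.0) + amount
            changed.add(start)
        if batch:
            # Simplification: advance the watermark right after the batch.
            latest = max(minute for minute, _ in batch)
            watermark = max(watermark, latest - LATENESS)

        if mode == "complete":
            # Whole result table on every trigger.
            rows = sorted(totals.items())
        elif mode == "update":
            # Only the windows whose total changed in this batch.
            rows = sorted((w, totals[w]) for w in changed)
        elif mode == "append":
            # Only windows the watermark has closed, each written once.
            ready = sorted(
                w for w in totals
                if w + WINDOW <= watermark and w not in finalized
            )
            finalized.update(ready)
            rows = [(w, totals[w]) for w in ready]
        else:
            raise ValueError(f"unknown mode: {mode}")
        output.append(rows)
    return output


def rows_written(output):
    """Total number of rows the sink received across all micro-batches."""
    return sum(len(rows) for rows in output)


if __name__ == "__main__":
    for mode in ("append", "update", "complete"):
        result = simulate(BATCHES, mode)
        print(mode, rows_written(result), result)
```

In this toy run, append writes each closed window exactly once (2 rows in total), update writes 5 rows because the first two windows are re-emitted when their totals change, and complete writes 6 rows because it repeats the whole table on every trigger. The last window is still open when the input ends, so append has not written it yet; that delay is the price of writing each aggregate only once.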
Author: LeetQuiz Editorial Team
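You are designing a streaming data solution using Azure Databricks to process sales transaction data from an online store. The solution has the following requirements:

- Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.
- Line total sales amount and line total tax amount will be aggregated.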
You need to recommend a Structured Streaming output mode for the processed dataset. The solution must minimize the presence of duplicate data.
What output mode should you recommend?

A. Update
B. Complete
C. Append