
Answer-first summary for fast verification
**Answer:** B. Increase the number of streaming units. D. Scale out the query by using PARTITION BY.
## Explanation

To reduce latency in an Azure Stream Analytics job whose query returns 10,000 distinct values for the `clusterID` column, the best approach combines resource scaling with query parallelization.

### Selected Options

**B. Increase the number of streaming units:**
- Streaming Units (SUs) represent the compute resources (CPU and memory) allocated to a Stream Analytics job.
- Increasing SUs gives the job more processing capacity to handle higher data volumes and more complex computations.
- This directly addresses the resource constraints that contribute to latency.
- For jobs processing large result sets with many distinct values, adequate SU allocation is essential for keeping latency low.

**D. Scale out the query by using PARTITION BY:**
- The `PARTITION BY` clause enables parallel processing by distributing the workload across multiple compute nodes.
- With 10,000 distinct `clusterID` values, partitioning on this column lets the system process different clusters in parallel.
- This horizontal scaling significantly reduces processing bottlenecks.
- Partitioning is particularly effective for high-cardinality columns such as `clusterID`.

### Why the Other Options Are Less Suitable

**A. Add a pass-through query:**
- A pass-through query simply forwards data without transformation.
- It does not address the underlying performance issues causing the latency, so no computational optimization is achieved.

**C. Add a temporal analytic function:**
- Temporal analytic functions exist for time-based analysis, not performance optimization.
- Adding more functions to the query typically increases, rather than decreases, computational overhead and latency.

**E. Convert the query to a reference query:**
- Reference data is for joining streaming data with slowly changing lookup data.
- Such a conversion does not inherently improve latency; the fundamental problem here is resource allocation and parallelization, not query type.

### Combined Approach

Increasing streaming units (resource scaling) together with `PARTITION BY` (query parallelization) provides both the compute capacity and the parallelism needed to handle 10,000 distinct `clusterID` values with low latency.
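As a rough sketch of the scaled-out query, the example below shows where `PARTITION BY` fits in a Stream Analytics query. The input alias `telemetry`, output alias `clusterAgg`, timestamp column `eventTime`, and window size are all illustrative assumptions, not from the question, and the sketch assumes the input source is itself partitioned so that events with the same `clusterID` land in the same input partition:

```sql
-- Illustrative only: input/output names, timestamp column, and window size
-- are assumptions. PARTITION BY lets each partition be processed in parallel.
SELECT
    clusterID,
    COUNT(*) AS eventCount
INTO
    clusterAgg
FROM
    telemetry TIMESTAMP BY eventTime
PARTITION BY clusterID
GROUP BY
    clusterID,
    TumblingWindow(minute, 1)
```

Pairing a partitioned query like this with enough streaming units allows each partition's data to be processed independently, which is what removes the single-node bottleneck behind the observed latency.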
Author: LeetQuiz Editorial Team
**Question:** You have an Azure Stream Analytics job that uses a query returning a result set with 10,000 distinct values for a column named `clusterID`. You observe high latency in the job. Which two actions should you take to reduce the latency? Each correct answer presents a complete solution.

A. Add a pass-through query.
B. Increase the number of streaming units.
C. Add a temporal analytic function.
D. Scale out the query by using PARTITION BY.
E. Convert the query to a reference query.