
Answer-first summary for fast verification
Answer: D (60 million)
## Explanation

In an Azure Synapse Analytics dedicated SQL pool, the partitioning guidance for clustered columnstore tables follows from how the pool distributes and compresses data.

### Key Architectural Considerations:

1. **Distributions**: A dedicated SQL pool divides each table into **60 distributions**, whether or not the table is partitioned. This is a fixed characteristic of the MPP (Massively Parallel Processing) architecture.
2. **Optimal Row Count per Distribution and Partition**: For clustered columnstore tables to achieve optimal compression and query performance, Microsoft recommends having **at least 1 million rows per distribution and partition**.
3. **Partitioning Strategy**: Every partition is itself spread across the 60 distributions, so each added partition subdivides the rows further. The table must be large enough that every distribution-and-partition combination still holds enough rows.

### Calculation:

- 60 distributions × 1 million rows per distribution = **60 million rows**

### Why 60 Million is Correct:

- The 60 distributions exist before any partitions are created, and partitioning subdivides the data within each distribution.
- To keep at least 1 million rows in every distribution-and-partition combination, the table must contain at least 60 million rows, and a further 60 million rows for each additional partition.
- Below this threshold, partitioning would leave combinations with fewer than 1 million rows, which can degrade compression and query performance.

### Why Other Options Are Less Suitable:

- **A (100,000)**: Far too small. It would leave only ~1,667 rows per distribution, severely reducing columnstore compression efficiency.
- **B (600,000)**: Still insufficient. It would average only 10,000 rows per distribution, well below the recommended threshold.
- **C (1 million)**: This is the minimum per distribution and partition, not for the whole table. Spread across 60 distributions, it would leave only ~16,667 rows per distribution.

This guidance aligns with Microsoft's best practices for maintaining optimal columnstore compression and query performance in large-scale data warehousing scenarios.
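The arithmetic behind the answer can be checked with a short Python sketch. It is illustrative only: the function names are hypothetical, and it assumes rows are spread evenly across distributions and partitions, which real tables with skewed data will not match exactly.

```python
# Fixed number of distributions in a dedicated SQL pool (from the explanation).
DISTRIBUTIONS = 60
# Recommended minimum rows per distribution and partition for columnstore.
MIN_ROWS_PER_UNIT = 1_000_000


def min_rows_for_partitions(partitions: int = 1) -> int:
    """Smallest table size that keeps every distribution/partition
    combination at or above the threshold (assumes an even spread)."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return DISTRIBUTIONS * partitions * MIN_ROWS_PER_UNIT


def rows_per_distribution(total_rows: int, partitions: int = 1) -> float:
    """Average rows in each distribution/partition combination
    (assumes an even spread)."""
    return total_rows / (DISTRIBUTIONS * partitions)


if __name__ == "__main__":
    # The four answer options from the question.
    options = {"A": 100_000, "B": 600_000, "C": 1_000_000, "D": 60_000_000}
    for label, rows in options.items():
        per_unit = rows_per_distribution(rows)
        verdict = "meets" if per_unit >= MIN_ROWS_PER_UNIT else "below"
        print(f"{label}: {rows:,} rows -> {per_unit:,.0f} per distribution ({verdict} threshold)")
```

Only option D reaches 1 million rows per distribution. Under the same even-spread assumption, `min_rows_for_partitions(2)` gives 120 million, which shows how the requirement grows with each added partition.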
Author: LeetQuiz Editorial Team
You have an Azure Synapse Analytics dedicated SQL pool with a fact table named Table1 that will use a clustered columnstore index. To optimize data compression and query performance, what is the minimum number of rows Table1 should have before creating partitions?
- **A**: 100,000
- **B**: 600,000
- **C**: 1 million
- **D**: 60 million