
Answer-first summary for fast verification
Answer: clustered columnstore
## Analysis of the Scenario Given a large fact table in Azure Synapse Analytics dedicated SQL pool with: - 5 billion rows - 50 columns - Currently stored as a heap - Queries aggregating values from approximately 100 million rows - Queries returning only two columns - Poor query performance ## Recommended Solution: Clustered Columnstore Index **Why Clustered Columnstore Index (CCI) is the optimal choice:** ### 1. **Columnar Storage for Analytical Workloads** - Clustered Columnstore Index stores data in columnar format, which is ideal for analytical queries that typically scan large volumes of data but only access a subset of columns - Since queries return only two columns out of 50, columnar storage provides significant performance benefits by reading only the required columns ### 2. **Excellent Compression** - CCI provides superior data compression (typically 10x compression ratio) - Reduced I/O operations when scanning 100 million rows - Better memory utilization during query execution ### 3. **Batch Mode Processing** - Columnstore indexes enable batch mode processing, which processes multiple rows simultaneously - This is particularly effective for aggregation operations on large datasets ### 4. **Large Table Optimization** - Microsoft recommends clustered columnstore indexes for tables with more than 60-100 million rows - With 5 billion rows, this table clearly falls into the category where CCI provides maximum benefit ### 5. **Heap Conversion** - Since the table is currently a heap, converting to clustered columnstore index will organize the data optimally for analytical queries - The conversion process will restructure the entire table into columnar format ## Why Other Options Are Less Suitable ### **Nonclustered Columnstore Index (Option A)** - Not supported in Azure Synapse Analytics dedicated SQL pool - Only available in traditional SQL Server environments - Would not be a valid option for this Azure service ### **Nonclustered Index (Option C)** - Row-based storage doesn't provide the same compression benefits - Less efficient for scanning large volumes of data (100 million rows) - Would require creating multiple indexes for different query patterns - Not optimized for aggregation operations on large datasets ### **Clustered Index (Option D)** - Row-based storage limits compression benefits - Less efficient for analytical queries that access only a few columns - Better suited for OLTP workloads with point lookups rather than large-scale aggregations - Microsoft recommends clustered indexes for tables up to 100 million rows, not 5 billion ## Performance Impact Implementing a clustered columnstore index on this table would: - Reduce I/O by reading only the two required columns - Leverage high compression to minimize data movement - Enable batch mode processing for faster aggregations - Provide the optimal storage format for the described analytical workload pattern This solution aligns with Microsoft's best practices for large fact tables in data warehousing scenarios using Azure Synapse Analytics.
Ultimate access to all questions.
Author: LeetQuiz Editorial Team
You have an Azure Synapse Analytics dedicated SQL pool containing a large heap fact table with 50 columns and 5 billion rows. Most queries aggregate values from around 100 million rows and return only two columns. These queries are performing very slowly.
What type of index should you add to achieve the fastest query performance?
A
nonclustered columnstore
B
clustered columnstore
C
nonclustered
D
clustered
No comments yet.