Microsoft Azure Data Engineer Associate - DP-203

Explanation:

Analysis of the Scenario

The scenario describes a large fact table in an Azure Synapse Analytics dedicated SQL pool with the following characteristics:

5 billion rows
50 columns
Currently stored as a heap
Queries aggregating values from approximately 100 million rows
Queries returning only two columns
Poor query performance

Recommended Solution: Clustered Columnstore Index (Option B)

1. Columnar Storage for Analytical Workloads

Data is stored by column, so a query reads only the columns it references (here, two of 50)

2. Excellent Compression

Values within a column tend to be similar, so column segments compress well and less data is read from storage

3. Batch Mode Processing

Aggregations operate on batches of rows rather than one row at a time

4. Large Table Optimization

Microsoft recommends clustered columnstore indexes for large tables (more than 100 million rows)

5. Heap Conversion

Creating the clustered columnstore index converts the existing heap to columnstore format

Why Other Options Are Less Suitable

Nonclustered Columnstore Index (Option A)

Not supported in Azure Synapse Analytics dedicated SQL pool, which supports only the clustered form of columnstore index
Available in SQL Server and Azure SQL Database, but not in this service
Therefore not a valid option here

Nonclustered Index (Option C)

Row-based storage does not compress as well as columnstore
Less efficient for scanning large volumes of data (100 million rows)
Would require creating multiple indexes for different query patterns
Not optimized for aggregation operations on large datasets

Clustered Index (Option D)

Row-based storage limits compression benefits
Less efficient for analytical queries that access only a few columns
Better suited for OLTP workloads with point lookups rather than large-scale aggregations
Microsoft recommends clustered indexes for tables up to 100 million rows, not 5 billion
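
The difference between row-based and column-based storage for a two-column aggregation can be sketched with a toy model. This is only an illustration in plain Python, not how the dedicated SQL pool stores data internally: the table size, the column positions in `NEEDED`, and the helper names are all hypothetical, and "values touched" is a rough stand-in for I/O.

```python
# Toy model: count how many stored values a scan must touch to aggregate
# two columns under a row-oriented layout versus a column-oriented layout.
# Sizes are illustrative and far smaller than the table in the scenario.

NUM_ROWS = 1_000
NUM_COLUMNS = 50
NEEDED = (3, 7)  # hypothetical positions of the two columns the query uses

# Row layout: each row keeps all 50 values together.
row_store = [
    [r * NUM_COLUMNS + c for c in range(NUM_COLUMNS)] for r in range(NUM_ROWS)
]

# Column layout: each column's values are kept together.
column_store = [[row[c] for row in row_store] for c in range(NUM_COLUMNS)]


def scan_row_store(rows, needed):
    """Sum the needed columns; the whole row is counted as read each time."""
    touched = 0
    totals = {c: 0 for c in needed}
    for row in rows:
        touched += len(row)
        for c in needed:
            totals[c] += row[c]
    return totals, touched


def scan_column_store(columns, needed):
    """Sum the needed columns; only those columns are counted as read."""
    touched = 0
    totals = {}
    for c in needed:
        touched += len(columns[c])
        totals[c] = sum(columns[c])
    return totals, touched


row_totals, row_touched = scan_row_store(row_store, NEEDED)
col_totals, col_touched = scan_column_store(column_store, NEEDED)

print("same result:", row_totals == col_totals)
print("values touched (row layout):   ", row_touched)
print("values touched (column layout):", col_touched)
```

Both scans produce the same sums, but the column layout touches only the two columns it needs. The model ignores compression and batch mode, which would widen the gap further in practice.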

Performance Impact

Implementing a clustered columnstore index on this table would:

Reduce I/O by reading only the two required columns
Use high compression to reduce the volume of data read from storage and held in memory
Enable batch mode processing for faster aggregations
Provide the optimal storage format for the described analytical workload pattern

This solution aligns with Microsoft's best practices for large fact tables in data warehousing scenarios using Azure Synapse Analytics.
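
As a rough sanity check on scale, the arithmetic below estimates how the table would spread across the pool. It is a back-of-the-envelope sketch under assumptions that are not part of the question: that the table is spread evenly over the 60 distributions a dedicated SQL pool uses, and that a columnstore rowgroup holds at most 1,048,576 rows, as described in Microsoft's documentation.

```python
TOTAL_ROWS = 5_000_000_000      # from the scenario
DISTRIBUTIONS = 60              # assumption: dedicated SQL pool distribution count
MAX_ROWGROUP_ROWS = 1_048_576   # assumption: documented maximum rows per rowgroup


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


# Assumes an even spread; a skewed hash distribution would differ.
rows_per_distribution = ceil_div(TOTAL_ROWS, DISTRIBUTIONS)

# Number of full-size rowgroups needed to hold one distribution's rows.
rowgroups_per_distribution = ceil_div(rows_per_distribution, MAX_ROWGROUP_ROWS)

# Table size at which every distribution could fill one full rowgroup.
rows_to_fill_one_rowgroup_each = DISTRIBUTIONS * MAX_ROWGROUP_ROWS

print("rows per distribution:", rows_per_distribution)
print("rowgroups per distribution:", rowgroups_per_distribution)
print("rows needed to fill one rowgroup per distribution:",
      rows_to_fill_one_rowgroup_each)
```

Under these assumptions each distribution holds far more rows than a single rowgroup, so the table is well past the size at which columnstore segments fill up and compress effectively.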

You have an Azure Synapse Analytics dedicated SQL pool containing a large heap fact table with 50 columns and 5 billion rows. Most queries aggregate values from around 100 million rows and return only two columns. These queries are performing very slowly.

What type of index should you add to achieve the fastest query performance?

Last updated: July 15, 2026 at 14:06

A. nonclustered columnstore

B. clustered columnstore

C. nonclustered

D. clustered
