Microsoft Azure Data Engineer Associate - DP-203

Explanation:

Analysis of Data Skew Identification in Azure Synapse Analytics

To identify data skew in a distributed table in an Azure Synapse Analytics dedicated SQL pool, connect to the dedicated pool itself and query a system view that reports row counts and page usage for each distribution.

Why Option D is Correct

Of the four options, only D pairs the correct connection (Pool1) with a view that exposes per-distribution statistics: sys.dm_pdw_nodes_db_partition_stats
This view returns page and row-count information for each partition on every compute node and distribution, so you can compare how evenly the data is spread
By comparing row counts and page counts across distributions, you can quantify the extent of data skew
The query must run while connected to the dedicated SQL pool (Pool1), because that is where Table1 resides
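
The comparison described above can be sketched in Python. This is a minimal illustration, not an official formula: the function name `skew_summary`, the metric it computes, and the sample row counts are all assumptions made for the example. The query text in `SKEW_QUERY` is likewise only a sketch of one way to aggregate `row_count` per `distribution_id`; it is not executed here, and the join path should be checked against your own pool before use.

```python
from statistics import mean

# Sketch of a per-distribution row count query (assumed join path; the code
# below never runs it, so verify it against your pool before relying on it).
SKEW_QUERY = """
SELECT nps.distribution_id, SUM(nps.row_count) AS row_count
FROM sys.tables t
JOIN sys.pdw_table_mappings tm ON t.object_id = tm.object_id
JOIN sys.pdw_nodes_tables nt ON tm.physical_name = nt.name
JOIN sys.dm_pdw_nodes_db_partition_stats nps
  ON nt.object_id = nps.object_id
 AND nt.pdw_node_id = nps.pdw_node_id
 AND nt.distribution_id = nps.distribution_id
WHERE t.name = 'Table1' AND nps.index_id <= 1
GROUP BY nps.distribution_id;
"""


def skew_summary(rows_per_distribution):
    """Summarise how unevenly rows are spread across distributions."""
    counts = list(rows_per_distribution.values())
    avg = mean(counts)
    largest = max(counts)
    return {
        "max_rows": largest,
        "min_rows": min(counts),
        "avg_rows": avg,
        # How far the fullest distribution sits above the average, in percent.
        "max_over_avg_pct": (largest - avg) / avg * 100 if avg else 0.0,
    }


# Made-up sample values: four distributions for brevity (a real pool has 60).
sample = {1: 1000, 2: 1000, 3: 1000, 4: 5000}
summary = skew_summary(sample)
print(summary)
```

In this made-up sample, one distribution holds far more rows than the others, so the percentage above the average is large; an evenly distributed table would produce a value close to zero.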

Why Other Options Are Incorrect

Option A (Connect to built-in pool and run DBCC PDW_SHOWSPACEUSED):

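DBCC PDW_SHOWSPACEUSED does report rows and space used per distribution, but only when run against a dedicated SQL pool; it is not supported in the serverless (built-in) pool
Table1 resides in Pool1, so a connection to the built-in pool could not reach it in any case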

Option B (Connect to built-in pool and run DBCC CHECKALLOC):

DBCC CHECKALLOC is not designed for identifying data skew in distributed tables
This command checks page allocation consistency, not distribution patterns
Again, connecting to the built-in pool prevents access to the dedicated pool's tables

Option C (Connect to Pool1 and query sys.dm_pdw_node_status):

sys.dm_pdw_node_status provides node health and status information, not data distribution statistics
This view shows node availability and operational status, not table-level data skew metrics

Best Practice Considerations

Always connect to the dedicated SQL pool when working with distributed tables
Use system views specifically designed for analyzing distribution patterns
Monitor data skew regularly: in a parallel query the most heavily loaded distribution determines when the query finishes, so skew can significantly degrade performance
Consider redistributing tables with significant skew using appropriate distribution keys to optimize performance
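
To illustrate why the choice of distribution key matters, the sketch below hashes two hypothetical candidate keys into 60 buckets, matching the 60 distributions of a dedicated SQL pool. Everything else is an assumption made for the example: `zlib.crc32` is only a deterministic stand-in for the pool's internal hash function, and the `region` and `order_id` columns are invented.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool has 60 distributions


def simulate_distribution(key_values):
    """Count rows per bucket when each key value is hashed into one of 60 buckets."""
    buckets = Counter()
    for value in key_values:
        # crc32 is a stand-in hash, not the function Synapse actually uses.
        buckets[zlib.crc32(str(value).encode()) % DISTRIBUTIONS] += 1
    return buckets


rows = 60_000
# Hypothetical candidate keys: a 3-value region code and a unique order id.
region_key = [("EU", "US", "APAC")[i % 3] for i in range(rows)]
order_key = list(range(rows))

for name, key in (("region", region_key), ("order_id", order_key)):
    buckets = simulate_distribution(key)
    print(name, "distributions used:", len(buckets),
          "largest bucket:", max(buckets.values()))
```

A key with only a few distinct values can occupy at most that many distributions, which concentrates the rows, whereas a high-cardinality key spreads them far more evenly.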

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named DB1 that contains a fact table named Table1. You need to determine the extent of data skew in Table1. What should you run in Synapse Studio?

Last updated: July 15, 2026 at 14:06

A. Connect to the built-in pool and run DBCC PDW_SHOWSPACEUSED.

B. Connect to the built-in pool and run DBCC CHECKALLOC.

C. Connect to Pool1 and query sys.dm_pdw_node_status.

D. Connect to Pool1 and query sys.dm_pdw_nodes_db_partition_stats.