Microsoft Azure Data Engineer Associate - DP-203

Explanation:

In Azure Synapse Analytics dedicated SQL pools, the REPLICATE distribution type is optimal for date dimension tables that are used by all fact tables to minimize data movement during queries. Here's why:

REPLICATE Distribution: This creates a full copy of the table on each compute node. Since date dimension tables are typically small (containing dates, holidays, fiscal periods, etc.), the storage overhead is minimal. When joined with fact tables distributed using HASH or ROUND_ROBIN, the replicated dimension is locally available on every node, eliminating the need for data movement (shuffling) during joins. This significantly improves query performance.
Why Not HASH: HASH distribution assigns each row to one of the pool's 60 distributions by hashing a distribution column. A join avoids data movement only when both tables are hash-distributed on the join column, so a date dimension hashed on DateKey would be co-located only with fact tables that are also distributed on DateKey. Fact tables usually use other distribution keys (e.g., ProductKey or CustomerKey), so joins to those tables would still move data. This defeats the goal of minimizing movement.
Why Not ROUND_ROBIN: ROUND_ROBIN assigns rows evenly across the distributions without regard to their values. Because matching rows are not co-located, joins on a round-robin table generally require data movement, resulting in poor performance for dimension-table joins.
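
The comparison above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how the Synapse engine is implemented: the keys, row counts, and the modulo "hash" are hypothetical, the fact table is assumed to be hash-distributed on ProductKey, and a replicated table is treated as visible from every distribution. The model counts fact rows whose matching date row is not available locally.

```python
import random

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads table rows over 60 distributions


def place_dimension(date_keys, strategy):
    """Return {date_key: set of distributions that can read that row locally}."""
    placement = {}
    for position, key in enumerate(date_keys):
        if strategy == "REPLICATE":
            # Simplification: the full copy is treated as local to every distribution.
            placement[key] = set(range(NUM_DISTRIBUTIONS))
        elif strategy == "HASH":
            # Stand-in for the real hash function: modulo on the (hypothetical) DateKey.
            placement[key] = {key % NUM_DISTRIBUTIONS}
        elif strategy == "ROUND_ROBIN":
            # Rows are dealt out in turn, independent of their values.
            placement[key] = {position % NUM_DISTRIBUTIONS}
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return placement


def count_non_local_joins(fact_rows, placement):
    """Count fact rows whose matching date row is not on the same distribution."""
    non_local = 0
    for product_key, date_key in fact_rows:
        # Assumption: the fact table is hash-distributed on ProductKey.
        fact_distribution = product_key % NUM_DISTRIBUTIONS
        if fact_distribution not in placement[date_key]:
            non_local += 1
    return non_local


rng = random.Random(42)  # fixed seed so the toy data is reproducible
date_keys = list(range(1, 366))  # hypothetical surrogate keys, one per day
fact_rows = [(rng.randint(1, 500), rng.choice(date_keys)) for _ in range(10_000)]

results = {
    strategy: count_non_local_joins(fact_rows, place_dimension(date_keys, strategy))
    for strategy in ("REPLICATE", "HASH", "ROUND_ROBIN")
}

for strategy, non_local in results.items():
    print(f"{strategy:12s} fact rows needing a non-local date row: {non_local}")
```

In this model only the replicated dimension yields zero non-local rows. With HASH or ROUND_ROBIN, nearly every fact row finds its date row on a different distribution, which is the situation that forces data movement in a real join.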

Best practices for Azure Synapse Analytics recommend using REPLICATE for small dimension tables (typically under 2 GB on disk) so that joins run locally and avoid shuffling. Since date dimensions are compact and joined by every fact table, replication ensures efficient query execution across all fact tables.

You are designing a date dimension table in an Azure Synapse Analytics dedicated SQL pool that will be used by all fact tables. Which distribution type should you use to minimize data movement during queries?

Last updated: July 5, 2026 at 14:03

HASH

REPLICATE

ROUND_ROBIN