
Explanation:
The query filters on DateKey (a 31-day range) and groups by SupplierKey and StockItemKey. The table will grow to approximately 1.1 billion rows (1 million rows per day × 365 days × 3 years ≈ 1.095 billion), making the distribution strategy critical for performance.
A: Replicated
B: Hash-distributed on PurchaseKey
PurchaseKey is likely a unique identifier (surrogate key) with high cardinality, ensuring even data distribution across all 60 distributions. This prevents data skew and allows parallel processing. While PurchaseKey isn't directly used in the query, hash distribution on a high-cardinality column ensures balanced workload across all compute nodes.
C: Round-robin
D: Hash-distributed on DateKey
Date columns make poor distribution keys. DateKey has low cardinality (only ~1,095 distinct values over 3 years), and all rows for a given date land in the same distribution. A query that filters on a 31-day range can therefore touch at most 31 of the 60 distributions, and typically fewer, since several dates can hash to the same distribution. This creates "hot spots" where a subset of distributions handles the entire workload while the rest sit idle, defeating the purpose of the MPP architecture.
PurchaseKey ensures an even data spread across all distributions.
Hash distribution on PurchaseKey ensures that when the query scans the 31-day date range, the work is spread evenly across all 60 distributions. Each distribution processes a proportional share of the data, maximizing parallel processing efficiency and minimizing query execution time.
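As a rough illustration of option B, the table could be declared along the following lines. This is only a sketch: the column data types, the dbo schema, and the clustered columnstore index are assumptions, since the question gives only the column names. The final statement reports rows and space per distribution, which is one way to confirm after loading that the data is not skewed.

```sql
-- Sketch only: data types, the dbo schema, and the columnstore index
-- are assumptions; the question specifies only the column names.
CREATE TABLE dbo.FactPurchase
(
    PurchaseKey      bigint         NOT NULL,  -- assumed unique surrogate key
    SupplierKey      int            NOT NULL,
    StockItemKey     int            NOT NULL,
    DateKey          int            NOT NULL,  -- yyyymmdd, matching the query's filter
    PurchaseQuantity int            NOT NULL,
    PurchaseAmount   decimal(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(PurchaseKey),  -- option B: hash on the high-cardinality key
    CLUSTERED COLUMNSTORE INDEX
);

-- After loading data, list rows and space used per distribution to check for skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactPurchase');
```

If the row counts reported per distribution are roughly equal, each distribution scans a similar share of the 31-day range.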
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool for a retail store. The table will contain purchase data from suppliers with the following columns:
PurchaseKey
SupplierKey
StockItemKey
DateKey
PurchaseQuantity
PurchaseAmount
The table will have 1 million rows of data added daily and will contain three years of data. Daily Transact-SQL queries similar to the following will be executed:
SELECT
    SupplierKey,
    StockItemKey,
    COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
    AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey
Which table distribution type will minimize query times?

A: replicated
B: hash-distributed on PurchaseKey
C: round-robin
D: hash-distributed on DateKey