
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool for a retail store. The table will contain purchase data from suppliers with the following columns:

PurchaseKey
SupplierKey
StockItemKey
DateKey
IsOrderFinalized

The table will have 1 million rows added daily and will store three years of data. Daily queries will be executed that are similar to the following:

SELECT
    SupplierKey,
    StockItemKey,
    IsOrderFinalized,
    COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
    AND DateKey <= 20210131
GROUP BY
    SupplierKey,
    StockItemKey,
    IsOrderFinalized

Which table distribution type will minimize query times?


replicated

hash-distributed on PurchaseKey

round-robin

hash-distributed on IsOrderFinalized
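
One way to compare the options is to estimate how large the table gets and how its rows would spread across the 60 distributions of a dedicated SQL pool. The sketch below is only back-of-the-envelope arithmetic. It assumes 365-day years, that PurchaseKey is unique per row, and that IsOrderFinalized is a two-valued flag; none of these are stated in the question. The helper name `busiest_distribution_rows` is made up for illustration.

```python
# Rough sizing sketch for FactPurchase. Values marked "assumption" are not
# given in the question and only serve to make the arithmetic concrete.
ROWS_PER_DAY = 1_000_000   # from the question
YEARS_RETAINED = 3         # from the question
DAYS_PER_YEAR = 365        # assumption: leap days ignored
DISTRIBUTIONS = 60         # a dedicated SQL pool spreads each table over 60 distributions

total_rows = ROWS_PER_DAY * DAYS_PER_YEAR * YEARS_RETAINED


def busiest_distribution_rows(rows, distinct_values, distributions=DISTRIBUTIONS):
    """Best-case row count on the fullest distribution when hashing on a
    column with `distinct_values` equally frequent values."""
    # A hash column cannot occupy more distributions than it has distinct values.
    used = min(distinct_values, distributions)
    return -(-rows // used)  # ceiling division


# Assumption: PurchaseKey is unique per row, so all 60 distributions are used.
rows_on_purchase_key = busiest_distribution_rows(total_rows, total_rows)

# Assumption: IsOrderFinalized has two values, so at most 2 distributions hold data.
rows_on_is_finalized = busiest_distribution_rows(total_rows, 2)

print(f"total rows after three years: {total_rows:,}")
print(f"hash on PurchaseKey, rows per distribution: {rows_on_purchase_key:,}")
print(f"hash on IsOrderFinalized, rows on busiest distribution: {rows_on_is_finalized:,}")
```

Under these assumptions the table reaches roughly 1.1 billion rows, which is far beyond the small-table range that replicated tables are intended for. Hashing on a two-valued column would leave most distributions empty, while a high-cardinality key spreads the scan evenly. Round-robin also spreads rows evenly, so this arithmetic alone does not separate it from hashing on PurchaseKey.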

Microsoft Azure Data Engineer Associate - DP-203

