
Answer-first summary for fast verification
Answer: Partition the table by transaction date and cluster by product ID.
Partitioning the fact table by transaction date stores each date's rows in its own set of physical files. When a query filters on transaction date, the engine can apply partition pruning: it skips every partition that falls outside the requested dates, which reduces the amount of data scanned. Clustering by product ID then organizes the rows within each partition so that records for the same product are stored together, so a query for a specific product needs fewer reads to retrieve its data. Together, the two techniques shorten query execution time on a large fact table of sales transactions, provided queries actually filter on transaction date or product ID.
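The explanation above can be illustrated with a small in-memory model. This is only a sketch in plain Python, not how any particular lakehouse engine is implemented: the names `build_table`, `query`, and `FILE_SIZE` are hypothetical, and the per-file minimum and maximum product IDs are an assumed stand-in for the file statistics an engine might use to skip data.

```python
from collections import defaultdict

FILE_SIZE = 2  # hypothetical rows per file, kept tiny so the effect is visible


def build_table(rows, file_size=FILE_SIZE):
    """Partition rows by transaction date, then cluster each partition by product ID."""
    by_date = defaultdict(list)
    for date, product_id, amount in rows:
        by_date[date].append((product_id, amount))

    table = {}
    for date, part in by_date.items():
        part.sort(key=lambda r: r[0])  # clustering: same product IDs end up adjacent
        files = [part[i:i + file_size] for i in range(0, len(part), file_size)]
        # Keep (min product ID, max product ID, rows) per file (assumed statistics).
        table[date] = [(f[0][0], f[-1][0], f) for f in files]
    return table


def query(table, date, product_id):
    """Return (total amount, rows scanned) for one date and one product."""
    total, rows_scanned = 0, 0
    # Partition pruning: only the requested date's files are considered.
    for lo, hi, rows in table.get(date, []):
        if not lo <= product_id <= hi:
            continue  # file skipped: its product ID range cannot contain a match
        rows_scanned += len(rows)
        total += sum(amount for pid, amount in rows if pid == product_id)
    return total, rows_scanned


rows = [
    ("2024-01-01", 3, 10), ("2024-01-01", 1, 20),
    ("2024-01-01", 2, 30), ("2024-01-01", 1, 40),
    ("2024-01-02", 1, 50), ("2024-01-02", 3, 60),
]
table = build_table(rows)
total, scanned = query(table, "2024-01-01", 1)
print(total, scanned, len(rows))  # prints: 60 2 6
```

In this toy data set the query reads 2 of the 6 rows: the other date's partition is never opened, and the second file in the matching partition is skipped because its product ID range does not include the requested product.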
Author: LeetQuiz Editorial Team
For a large fact table containing sales transactions in a lakehouse, which design consideration most effectively improves query performance?
A. Normalize sales data into multiple tables to reduce the size of the fact table.
B. Store the fact table in a columnar format without partitioning or clustering.
C. Partition the table by transaction date and cluster by product ID.
D. Denormalize related dimensions into the fact table to avoid join operations.