
Answer-first summary for fast verification
Answer: Partition by transaction time; cluster by state first, then city, then store ID.
For the best query performance on this workload, partition the table by transaction time and cluster it by the location columns in hierarchical order. Partitioning by transaction time lets BigQuery prune every partition outside the 30-day window, so queries scan less data, run faster, and cost less. Clustering order also matters: BigQuery sorts data by the clustering columns in the order they are listed, and block pruning works best when a query filters on the first column or on a leading prefix of the columns. Clustering by state, then city, then store ID follows the geographic hierarchy, so analyses by state, by state and city, or down to an individual store all benefit. Reversing the order (option B) mainly helps queries that filter on store ID, and the cluster-only options (C and D) give up partition pruning on the date range.
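As a rough illustration of the recommended layout, the sketch below builds the BigQuery DDL and a matching 30-day query as plain strings. The table name `sales.transactions`, the column names, and the column types are assumptions made for the example; the question does not specify them.

```python
# Minimal sketch: generate BigQuery SQL for the partition + cluster layout.
# Table name, column names, and types are hypothetical, not from the question.

def build_ddl(table: str = "sales.transactions") -> str:
    """Return a CREATE TABLE statement partitioned by day and clustered by location."""
    return (
        f"CREATE TABLE `{table}` (\n"
        "  transaction_time TIMESTAMP,\n"
        "  item STRING,\n"
        "  store_id STRING,\n"
        "  city STRING,\n"
        "  state STRING\n"
        ")\n"
        # Daily partitions allow pruning to the last 30 days.
        "PARTITION BY DATE(transaction_time)\n"
        # Broadest location column first, matching the query hierarchy.
        "CLUSTER BY state, city, store_id"
    )


def build_30_day_query(table: str = "sales.transactions") -> str:
    """Return a query counting items sold per state over the past 30 days."""
    return (
        "SELECT state, item, COUNT(*) AS items_sold\n"
        f"FROM `{table}`\n"
        # Filtering on the partitioning column enables partition pruning.
        "WHERE transaction_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)\n"
        # @state is a named query parameter supplied at run time.
        "  AND state = @state\n"
        "GROUP BY state, item"
    )


if __name__ == "__main__":
    print(build_ddl())
    print()
    print(build_30_day_query())
```

The query filters on the partitioning column and on `state`, the first clustering column, which is the access pattern this layout is meant to serve.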
Author: LeetQuiz Editorial Team
As part of your migration to BigQuery, you are tasked with deciding on the optimal data model for a table that stores transactional purchase information across multiple store locations. This table includes columns for the time of the transaction, items purchased, store ID, and the city and state in which the store is located. You often query this table to analyze the number of each item sold over the past 30 days and to examine purchasing trends broken down by state, city, and individual store. What data modeling approach would you take to ensure the best query performance for these types of analyses?
A
Partition by transaction time; cluster by state first, then city, then store ID.
B
Partition by transaction time; cluster by store ID first, then city, then state.
C
Top-level cluster by state first, then city, then store ID.
D
Top-level cluster by store ID first, then city, then state.