
Answer-first summary for fast verification
Answer: Implement clustering in BigQuery on the package-tracking ID column.
The correct answer is **B. Implement clustering in BigQuery on the package-tracking ID column.**

Clustering sorts a table's storage blocks by the values of one or more columns, so rows with the same or nearby values are stored together. Queries that filter, join, or aggregate on the clustered columns can then skip blocks that cannot contain matching rows. In this scenario, lifecycle analysis follows each package across its tracking events, so clustering on the package-tracking ID column keeps a package's rows co-located and reduces the amount of data each query scans.

- **Option A** tiers older data onto Cloud Storage, which may help manage storage costs but does not speed up queries; external tables are typically slower to query than native BigQuery storage.
- **Option C** re-partitions on the package delivery date. This changes which date drives partition pruning, but it does not group a package's events together, and it does not produce the clustered table the scenario asks for.
- **Option D** clusters on the ingest date column. The table is already partitioned by ingest date, so this mainly helps queries that filter on ingest date and does little for package lifecycle analysis.
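As a rough illustration of option B, the sketch below builds the BigQuery DDL statement that copies an existing table into a new one that is partitioned and clustered. It only produces the SQL string and does not connect to BigQuery. The table names (`shipping.package_events`, `shipping.package_events_clustered`) and column names (`ingest_date`, `tracking_id`) are hypothetical, and the sketch assumes `ingest_date` is a `DATE` column.

```python
def build_clustered_copy_ddl(source_table, target_table, partition_col, cluster_cols):
    """Return a CREATE TABLE ... AS SELECT statement for a clustered copy.

    All table and column names passed in are placeholders chosen for
    illustration; they do not come from the original scenario.
    """
    if not cluster_cols:
        raise ValueError("at least one clustering column is required")
    if len(cluster_cols) > 4:
        # BigQuery accepts at most four clustering columns per table.
        raise ValueError("BigQuery allows at most four clustering columns")
    return (
        f"CREATE TABLE `{target_table}`\n"
        f"PARTITION BY {partition_col}\n"
        f"CLUSTER BY {', '.join(cluster_cols)}\n"
        f"AS SELECT * FROM `{source_table}`"
    )


# Hypothetical names: keep the ingest-date partitioning, add clustering
# on the package-tracking ID.
ddl = build_clustered_copy_ddl(
    source_table="shipping.package_events",
    target_table="shipping.package_events_clustered",
    partition_col="ingest_date",
    cluster_cols=["tracking_id"],
)
print(ddl)
```

The generated statement could be run in the BigQuery console or through a client library. Keeping the existing partitioning while adding `CLUSTER BY` on the tracking ID is one reasonable choice here, not the only valid layout.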
Author: LeetQuiz Editorial Team
A shipping company streams live package-tracking data to Apache Kafka, and the data is then loaded into BigQuery. Analysts are seeing slow queries when they analyze geospatial trends in the package lifecycle, even though the table is partitioned by ingest date. The data is to be copied into a new clustered table in BigQuery. Which approach is the most effective way to improve query performance?
A
Tier older data onto Cloud Storage files and create a BigQuery table using Cloud Storage as an external data source.
B
Implement clustering in BigQuery on the package-tracking ID column.
C
Re-create the table using data partitioning on the package delivery date.
D
Implement clustering in BigQuery on the ingest date column.