
Answer-first summary for fast verification
Answer: Implement clustering in BigQuery on the package-tracking ID column.
The correct answer is B. Implement clustering in BigQuery on the package-tracking ID column. Clustering sorts the table's storage blocks by the values in the clustering columns, so a query that filters or aggregates on those columns can skip blocks that cannot contain matching rows. Analysts here follow each package through its lifecycle, which means their queries are keyed on the package-tracking ID. A single package's events typically arrive over several days, so ingest-date partitioning alone does not narrow the scan: every partition may hold rows for the package, and the amount of data scanned grows with the table. Clustering by package-tracking ID stores a package's rows together within each partition, which reduces the data scanned and improves query performance. Clustering on the ingest date (A) adds little because the table is already partitioned on that date.
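The block-skipping effect described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not BigQuery's actual storage format: the block size, the number of packages, and the helper names `build_blocks` and `blocks_scanned` are all hypothetical. It splits rows into fixed-size blocks, records the minimum and maximum tracking ID per block, and counts how many blocks a lookup for one tracking ID would have to read.

```python
import random

def build_blocks(rows, block_size):
    """Split rows into fixed-size blocks and record min/max tracking ID per block."""
    blocks = []
    for i in range(0, len(rows), block_size):
        chunk = rows[i:i + block_size]
        ids = [tracking_id for tracking_id, _ in chunk]
        blocks.append((min(ids), max(ids), chunk))
    return blocks

def blocks_scanned(blocks, tracking_id):
    """Count blocks whose [min, max] ID range could contain the tracking ID."""
    return sum(1 for lo, hi, _ in blocks if lo <= tracking_id <= hi)

# Hypothetical sizes chosen only for illustration.
NUM_PACKAGES = 1000
EVENTS_PER_PACKAGE = 10
BLOCK_SIZE = 100

random.seed(0)
rows = [(pkg, event) for pkg in range(NUM_PACKAGES)
        for event in range(EVENTS_PER_PACKAGE)]

# Unclustered: rows sit in arrival order, modeled here as a random shuffle.
arrival_order = rows[:]
random.shuffle(arrival_order)
unclustered = build_blocks(arrival_order, BLOCK_SIZE)

# Clustered: rows are sorted by tracking ID before being split into blocks.
clustered = build_blocks(sorted(rows), BLOCK_SIZE)

target = 417
unclustered_scanned = blocks_scanned(unclustered, target)
clustered_scanned = blocks_scanned(clustered, target)
print(f"blocks scanned without clustering: {unclustered_scanned} of {len(unclustered)}")
print(f"blocks scanned with clustering:    {clustered_scanned} of {len(clustered)}")
```

In this model the sorted layout needs only the one block that holds the target package, while the shuffled layout has to read most blocks because nearly every block's ID range covers the target.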
Author: LeetQuiz Editorial Team
A shipping company streams live package-tracking data to an Apache Kafka stream in real time, which is subsequently loaded into BigQuery. The analysts in your company aim to query this tracking data in BigQuery to analyze geospatial trends throughout the lifecycle of a package. The table holding this data was initially created with ingest-date partitioning. However, the query processing time has gradually increased. To enhance the performance of queries in BigQuery, what should you do?
A. Implement clustering in BigQuery on the ingest date column.
B. Implement clustering in BigQuery on the package-tracking ID column.
C. Tier older data onto Cloud Storage files and create a BigQuery table using Cloud Storage as an external data source.
D. Re-create the table using data partitioning on the package delivery date.