
Explanation:
The correct answer is A. Re-creating the tables using DDL and partitioning the tables by a column containing a TIMESTAMP or DATE type can significantly improve query performance and reduce costs. Partitioning allows queries to scan only a subset of the data rather than the entire table, which is more efficient and cost-effective. This approach maintains the ability to conduct SQL queries without the high costs associated with full table scans.
Ultimate access to all questions.
You possess historical data spanning the last three years stored in BigQuery, alongside a data pipeline that updates this BigQuery dataset with new daily data. Observations have shown that when the Data Science team executes a query filtered by a date column and constrained to a 30-90 day range, the entire table is unintentionally scanned. This extensive table scanning has led to an unexpected surge in costs. To address this, your goal is to find a cost-effective solution that still allows for efficient SQL querying. What steps should you take?
A
Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE Type.
B
Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
C
Modify your pipeline to maintain the last 30-90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
D
Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.
No comments yet.