
You have three years of historical data stored in BigQuery, and a data pipeline that appends new data to the dataset daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to a 30-90 day range, the query scans the entire table. This full table scanning has caused an unexpected increase in costs. You need a cost-effective solution that still allows for efficient SQL querying. What should you do?
A
Re-create the tables using DDL. Partition the tables on a column containing a TIMESTAMP or DATE type (a DDL sketch of this approach follows the options).
B
Recommend that the Data Science team export the table to CSV files on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
C
Modify your pipeline to maintain the last 30-90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
D
Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.
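For context, here is a minimal sketch of the date-partitioned DDL that option A describes, written in BigQuery Standard SQL. The dataset, table, and column names (mydataset, events, event_date) are hypothetical placeholders, not names from the question:

-- Create a date-partitioned copy of the table so that queries filtered
-- on event_date scan only the matching partitions, not the whole table.
CREATE TABLE mydataset.events_partitioned
PARTITION BY event_date  -- event_date is a DATE column; for a TIMESTAMP column, use PARTITION BY DATE(ts_col)
AS
SELECT * FROM mydataset.events;

-- A query like this then scans roughly 60 days of partitions
-- instead of the full three-year table:
SELECT COUNT(*)
FROM mydataset.events_partitioned
WHERE event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY) AND CURRENT_DATE();

With daily partitions, BigQuery prunes partitions that fall outside the date filter, so bytes billed scale with the number of days queried rather than with the size of the entire table.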