Ultimate access to all questions.
You possess historical data spanning the last three years stored in BigQuery, alongside a data pipeline that updates this BigQuery dataset with new daily data. Observations have shown that when the Data Science team executes a query filtered by a date column and constrained to a 30-90 day range, the entire table is unintentionally scanned. This extensive table scanning has led to an unexpected surge in costs. To address this, your goal is to find a cost-effective solution that still allows for efficient SQL querying. What steps should you take?
Explanation:
The correct answer is A. Re-creating the tables using DDL and partitioning the tables by a column containing a TIMESTAMP or DATE type can significantly improve query performance and reduce costs. Partitioning allows queries to scan only a subset of the data rather than the entire table, which is more efficient and cost-effective. This approach maintains the ability to conduct SQL queries without the high costs associated with full table scans.