
Answer-first summary for fast verification
Answer: Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every three hours.
Option C is correct. It follows Google's best practices for extract, transform, load (ETL) pipelines: data is continuously streamed into a staging table while reports are served from a separate production table, so ingestion never disrupts users of the data. Every three hours, the staged data is moved into the production table and the staging table is then emptied, leaving the production table as the single master dataset. The three-hour interval also balances performance against data freshness: streamed rows can sit in BigQuery's streaming buffer for 30-60 minutes, during which they cannot be deleted, so a much shorter cycle risks attempting to clear rows that are still in the buffer.
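A minimal sketch of the three-hourly move, run for example as a BigQuery scheduled query. The table names `project.dataset.staging` and `project.dataset.production` are placeholders, not names from the question:

```sql
-- Assumed placeholder table names; schedule this script to run
-- every three hours (e.g. as a BigQuery scheduled query).

-- Move staged rows into the production (master) table.
INSERT INTO `project.dataset.production`
SELECT * FROM `project.dataset.staging`;

-- Clear the staging table. This statement fails if any rows are
-- still in the streaming buffer, which is why a three-hour cycle
-- is safer than a thirty-minute one.
TRUNCATE TABLE `project.dataset.staging`;
```

By the time the job runs, any data streamed more than an hour or so earlier has left the streaming buffer, so the truncate succeeds and the staging table starts empty for the next window.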
Author: LeetQuiz Editorial Team
You are tasked with constructing a report-only data warehouse that utilizes BigQuery, where data is continuously streamed using the streaming API. Following Google's best practices for handling such data, you have established both staging and production tables. In keeping with these best practices, what approach would you take to design the data loading process to ensure a single master dataset exists while maintaining optimal performance for both data ingestion and report generation?
A
Have a staging table that is an append-only model, and then update the production table every three hours with the changes written to staging.
B
Have a staging table that is an append-only model, and then update the production table every ninety minutes with the changes written to staging.
C
Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every three hours.
D
Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every thirty minutes.