
Answer-first summary for fast verification
Answer: Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
The correct answer is D. BigQuery streaming inserts first land in a streaming buffer, and rows become visible to queries a short time after the insert call returns rather than instantly. Aggregations run immediately after an insert can therefore miss rows that are still in flight. Measuring the average delay between an insert and the row becoming available, then waiting twice that long before querying, makes it very likely (though not strictly guaranteed) that those rows are included, while keeping the pipeline near real-time.
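The waiting rule in option D can be sketched in a few lines of Python. This is a minimal illustration, not BigQuery client code: `safe_query_delay`, `run_after_delay`, the `factor` parameter, and the sample latencies are all hypothetical names and values, and `run_query` stands in for whatever function actually submits the aggregation query.

```python
import statistics
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def safe_query_delay(latency_samples_s: Sequence[float], factor: float = 2.0) -> float:
    """Return the wait time in seconds: factor times the mean observed latency.

    latency_samples_s holds measured delays (assumed to be collected by the
    application) between a streaming insert and the row showing up in a query.
    """
    if not latency_samples_s:
        raise ValueError("need at least one latency sample")
    return factor * statistics.mean(latency_samples_s)


def run_after_delay(
    run_query: Callable[[], T],
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Wait delay_s seconds, then run the query callable and return its result.

    The sleep function is injectable so the wait can be faked in tests.
    """
    sleep(delay_s)
    return run_query()


if __name__ == "__main__":
    # Hypothetical measurements, in seconds.
    samples = [1.0, 2.0, 3.0]
    delay = safe_query_delay(samples)
    print(f"waiting {delay:.1f}s before querying")
```

Because the delay is based on an average, an unusually slow insert can still be missed; re-measuring the latency periodically keeps the estimate current.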
Author: LeetQuiz Editorial Team
You are tasked with designing an application to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute, aiming for near real-time processing. Initially, you plan to use streaming inserts for each individual posting. Following the streaming inserts, your application performs data aggregations immediately. However, you encounter an issue where the queries executed immediately after the streaming inserts lack strong consistency, leading to reports that might miss in-flight data. How can you modify your application design to address this issue?
A. Re-write the application to load accumulated data every 2 minutes.
B. Convert the streaming insert code to batch load for individual messages.
C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
D. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.