
Answer-first summary for fast verification
Answer:
1. Check for duplicate rows in the BigQuery tables whose daily partition data size has doubled.
2. Check the BigQuery Audit logs to find the job IDs that wrote to those tables.
3. Use Cloud Monitoring to determine when the identified Dataflow jobs started and which pipeline code version each one is running.
4. Where more than one pipeline version ingests data into the same table, stop all versions except the latest one.
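The duplicate-row check in step 1 boils down to grouping on a logical key and flagging groups with more than one row. A minimal Python sketch of that logic follows; the commented SQL and the table/column names (`project.dataset.events`, `event_id`, `partition_date`) are hypothetical placeholders, not part of the original question:

```python
from collections import Counter

# Hypothetical BigQuery query for step 1 (names are placeholders):
#
#   SELECT event_id, partition_date, COUNT(*) AS n
#   FROM `project.dataset.events`
#   WHERE partition_date = CURRENT_DATE()
#   GROUP BY event_id, partition_date
#   HAVING n > 1;

def find_duplicates(rows, key_fields):
    """Return keys that occur more than once, mirroring the
    GROUP BY ... HAVING COUNT(*) > 1 pattern on a local sample."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

sample = [
    {"event_id": "a1", "partition_date": "2024-05-01"},
    {"event_id": "a1", "partition_date": "2024-05-01"},  # duplicate
    {"event_id": "b2", "partition_date": "2024-05-01"},
]
print(find_duplicates(sample, ["event_id", "partition_date"]))
# → {('a1', '2024-05-01'): 2}
```

If the affected partitions show duplicate keys while the Pub/Sub volume is flat, the extra storage is coming from the ingestion side rather than from the source.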
Option C is correct because it outlines a complete approach to both investigating and fixing the issue. It checks for duplicate rows, then uses the BigQuery Audit logs and Cloud Monitoring to tie those duplicates to specific job starts and pipeline code versions, which pinpoints whether a particular version or job is responsible. Finally, stopping all but the latest version of each pipeline ensures that only the intended version is active, correcting any deployment error that left multiple versions running. Unlike option B, which only treats the symptom without preventing a recurrence, option C systematically identifies and fixes the root cause.
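The root-cause claim above is worth making concrete: if both the old and the new pipeline version stay running after a deployment, each active job writes every Pub/Sub message into the table, multiplying stored data by the number of concurrent versions. A minimal simulation of that effect (the function and names are illustrative, not real GCP APIs):

```python
def simulate_ingestion(messages, active_pipelines):
    """Each active pipeline version writes every incoming message
    to the table, so N concurrent versions multiply daily storage
    by N even though the Pub/Sub volume is unchanged."""
    table = []
    for _ in range(active_pipelines):
        table.extend(messages)
    return table

msgs = ["m1", "m2", "m3"]

# Old and new versions both running: partition size doubles.
assert len(simulate_ingestion(msgs, active_pipelines=2)) == 2 * len(msgs)

# Step 4 of the answer — keep only the latest version — restores parity.
assert len(simulate_ingestion(msgs, active_pipelines=1)) == len(msgs)
```

This also explains the observed pattern in the question: only the tables served by a doubly-deployed pipeline see their daily partitions double, while tables with a single active pipeline are unaffected.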
Author: LeetQuiz Editorial Team
You are responsible for monitoring your organization's data lake hosted on BigQuery. The data ingestion pipelines read data from Pub/Sub and write it into tables on BigQuery. After deploying a new version of these ingestion pipelines, you notice that the volume of data stored daily in BigQuery has increased by 50%. The volume of data in Pub/Sub has remained constant, and only certain tables have seen their daily partition data double in size. You are tasked with investigating and resolving the cause of this unexpected increase in data storage. What steps should you take to address this issue?
A
B
C
D