
Google Professional Data Engineer
You are responsible for monitoring your organization’s data lake hosted on BigQuery. The data ingestion pipelines read data from Pub/Sub and then write this data into tables on BigQuery. After deploying a new version of these ingestion pipelines, you notice that the daily stored data in BigQuery has increased by 50%. It is important to note that the volume of data in Pub/Sub has remained constant, and only certain tables have experienced a doubling in the size of their daily partition data. You are tasked with investigating and resolving the cause of this unexpected increase in data storage. What steps should you take to address this issue?
Explanation:
Option C is the correct answer because it both investigates and fixes the root cause. It starts by checking the affected tables for duplicate rows, then uses BigQuery Audit Logs and Cloud Monitoring to correlate job start times with pipeline versions, which shows whether a specific version or job is responsible for the extra data. Because the Pub/Sub volume is unchanged while some daily partitions have doubled in size, one plausible cause is that more than one pipeline version is running after the deployment and each writes the same messages. Managing the deployed versions so that only the intended one is active corrects that kind of deployment error. Option B, by contrast, only addresses the symptom and does not necessarily stop the problem from recurring.
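The duplicate-row check mentioned above can be sketched as a small Python helper that builds the BigQuery SQL for one daily partition. This is only an illustration under stated assumptions: the table name `my_project.lake.events` and the key column `message_id` are hypothetical, and the `_PARTITIONDATE` filter assumes the table uses ingestion-time partitioning (a table partitioned on its own date column would filter on that column instead).

```python
from datetime import date


def build_duplicate_check_query(table: str, key_columns: list[str], partition_date: date) -> str:
    """Return SQL listing key values that occur more than once in one daily partition."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Backtick-quote each column so reserved words do not break the query.
    keys = ", ".join(f"`{c}`" for c in key_columns)
    return (
        f"SELECT {keys}, COUNT(*) AS copies\n"
        f"FROM `{table}`\n"
        # Assumption: ingestion-time partitioned table, so _PARTITIONDATE exists.
        f"WHERE _PARTITIONDATE = DATE '{partition_date.isoformat()}'\n"
        f"GROUP BY {keys}\n"
        f"HAVING COUNT(*) > 1\n"
        f"ORDER BY copies DESC"
    )


if __name__ == "__main__":
    # Hypothetical table and key column, for illustration only.
    print(build_duplicate_check_query("my_project.lake.events", ["message_id"], date(2024, 1, 15)))
```

Running the generated query against a partition that doubled in size, and against one that did not, would indicate whether the extra bytes come from repeated rows. If most keys show two copies, that is consistent with two writers processing the same messages.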