Google Professional Data Engineer


Explanation:

The correct answer is Option C. It investigates the increase systematically before changing anything.

1. Confirm whether duplicate rows explain the larger partitions.
2. Use the BigQuery audit logs to find the job IDs writing to those tables.
3. Use Cloud Monitoring to see when the corresponding Dataflow jobs started and which pipeline code version each one runs.

If more than one pipeline version is ingesting into the same table (for example, the previous version kept running after the new one was deployed), stopping every version except the latest removes the source of the duplicates.

The other options fall short:
- Option D deduplicates after the fact and never addresses the root cause.
- Option B rolls back and replays without any investigation.
- Option A relies on error logs and a time-travel restore instead of auditing and monitoring the jobs that write to the tables, so it may never identify where the extra data comes from.
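The root-cause part of Option C (finding tables fed by more than one pipeline and deciding which jobs to stop) can be sketched in plain Python. This is a minimal sketch under stated assumptions:
- The audit log entries and Cloud Monitoring data have already been collected and reduced to simple records.
- Each record carries a BigQuery job ID, the destination table, the Dataflow job that issued the write, the pipeline version, and the job start time.
- The `WriteRecord` type, its field names, and the `jobs_to_stop` helper are hypothetical. They are not the audit log schema or any Google API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WriteRecord:
    """Hypothetical, simplified view of one write found in the audit logs."""
    bq_job_id: str         # BigQuery job ID from the audit log entry
    table: str             # destination table, e.g. "my_dataset.events"
    dataflow_job_id: str   # Dataflow job that issued the write
    pipeline_version: str  # pipeline code version the job runs
    job_start: datetime    # Dataflow job start time (from Cloud Monitoring)


def jobs_to_stop(records: list[WriteRecord]) -> dict[str, list[tuple[str, str]]]:
    """For each table fed by more than one Dataflow job, return the
    (dataflow_job_id, pipeline_version) pairs of every job except the
    most recently started one."""
    # table -> {dataflow_job_id: (job_start, pipeline_version)}
    writers: dict[str, dict[str, tuple[datetime, str]]] = defaultdict(dict)
    for r in records:
        writers[r.table][r.dataflow_job_id] = (r.job_start, r.pipeline_version)

    to_stop: dict[str, list[tuple[str, str]]] = {}
    for table, jobs in writers.items():
        if len(jobs) < 2:
            continue  # a single writer does not explain duplicated rows
        # Keep the job that started last; every other writer is a candidate to stop.
        latest = max(jobs, key=lambda job_id: jobs[job_id][0])
        to_stop[table] = sorted(
            (job_id, jobs[job_id][1]) for job_id in jobs if job_id != latest
        )
    return to_stop


if __name__ == "__main__":
    records = [
        WriteRecord("bq_1", "my_dataset.events", "df-old", "v1", datetime(2024, 6, 1, 8, 0)),
        WriteRecord("bq_2", "my_dataset.events", "df-new", "v2", datetime(2024, 6, 3, 9, 0)),
        WriteRecord("bq_3", "my_dataset.users", "df-new", "v2", datetime(2024, 6, 3, 9, 0)),
    ]
    print(jobs_to_stop(records))  # {'my_dataset.events': [('df-old', 'v1')]}
```

Treating the most recently started job as "the latest version" is an assumption. If version labels are reliable, comparing `pipeline_version` directly would be the stricter check.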


You are overseeing your company's data lake on BigQuery, with data ingestion pipelines pulling data from Pub/Sub into BigQuery tables. After a new pipeline version was deployed, there was a 50% increase in daily stored data, with some tables' daily partition sizes doubling, despite no change in Pub/Sub data volumes. What steps should you take to investigate and resolve this sudden data increase?

Real Exam

Last updated: July 5, 2026 at 14:03

Option A:
1. Check for code errors in the deployed pipelines.
2. Check whether more than one pipeline writes to the BigQuery sink.
3. Check Cloud Logging for errors on the day the new pipelines were released.
4. If there are no errors, restore the BigQuery tables to their content before the last release by using time travel.

16.7%

Option B:
1. Roll back the last deployment.
2. Restore the BigQuery tables to their content before the last release by using time travel.
3. Restart the Dataflow jobs and replay the messages by seeking the subscription to the timestamp of the release.

9.5%

Option C:
1. Check for duplicate rows in the BigQuery tables whose daily partition size doubled.
2. Check the BigQuery audit logs to find the job IDs.
3. Use Cloud Monitoring to determine when the identified Dataflow jobs started and which pipeline code version they run.
4. When more than one pipeline ingests data into a table, stop all versions except the latest one.

Option D:
1. Check for duplicate rows in the BigQuery tables whose daily partition size doubled.
2. Schedule daily SQL jobs to deduplicate the affected tables.
3. Share the deduplication script with the other operational teams so they can reuse it if this happens to other tables.

4.8%
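The first step of Options C and D is checking a doubled partition for duplicate rows. That check can be written as a single BigQuery query, and the Python helper below only builds the SQL string. It is a sketch under these assumptions:
- The table is ingestion-time partitioned by day, so the `_PARTITIONDATE` pseudo-column exists.
- Two rows count as duplicates when every column matches. The query detects this by comparing `COUNT(*)` with `COUNT(DISTINCT TO_JSON_STRING(t))`.
- The helper name, the table-name pattern, and the `partition_date` parameter name are illustrative.

If the pipeline stamps each row with a per-write value such as an insert timestamp, duplicates will not look identical. In that case, compare on a message ID column instead.

```python
import re

# Illustrative pattern for a fully qualified name such as "my-project.my_dataset.events".
_TABLE_RE = re.compile(r"[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+")


def duplicate_check_sql(table: str) -> str:
    """Build a BigQuery GoogleSQL query comparing total and distinct row
    counts in one daily partition.

    Assumes an ingestion-time partitioned table (so _PARTITIONDATE exists).
    The partition is supplied as the DATE query parameter @partition_date
    instead of being interpolated into the SQL text."""
    if not _TABLE_RE.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT\n"
        "  COUNT(*) AS total_rows,\n"
        "  COUNT(DISTINCT TO_JSON_STRING(t)) AS distinct_rows\n"
        f"FROM `{table}` AS t\n"
        "WHERE _PARTITIONDATE = @partition_date"
    )


if __name__ == "__main__":
    print(duplicate_check_sql("my-project.my_dataset.events"))
```

Suppose `total_rows` is close to twice `distinct_rows` for a doubled partition. Then each message was written twice, which points to a second writer rather than more input. With the `google-cloud-bigquery` client, the string can be run by passing `bigquery.ScalarQueryParameter("partition_date", "DATE", some_date)` inside `bigquery.QueryJobConfig(query_parameters=[...])`.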