
Google Professional Data Engineer
You are responsible for designing an Apache Beam processing pipeline that reads data from a Google Cloud Pub/Sub topic. The Pub/Sub topic has a message retention duration of one day. The pipeline writes the processed data to a Cloud Storage bucket. Your objective is to choose a bucket location and a processing strategy that can ensure data is not lost in the event of a regional outage, meeting a Recovery Point Objective (RPO) of 15 minutes. What actions should you take?
Explanation:
Option D is correct. A dual-region Cloud Storage bucket with turbo replication enabled replicates data to the paired region within 15 minutes, which satisfies the 15-minute RPO in the event of a regional outage. Monitoring Dataflow job metrics with Cloud Monitoring allows you to detect the outage, and seeking the subscription back in time by 60 minutes replays messages acknowledged during that window; the topic's one-day message retention keeps those messages available for redelivery. Restarting the Dataflow job in a secondary region then resumes processing without data loss.
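As an illustration, the following Python sketch shows the two key pieces of this strategy: provisioning the dual-region bucket with turbo replication, and seeking the subscription back after an outage. It assumes the google-cloud-storage and google-cloud-pubsub client libraries; the project, bucket, and subscription names and the NAM4 dual-region are hypothetical placeholders, not details from the question.

```python
# A minimal sketch, assuming the google-cloud-storage and google-cloud-pubsub
# client libraries. Project, bucket, and subscription names and the NAM4
# dual-region are hypothetical placeholders.
import datetime

from google.cloud import pubsub_v1, storage
from google.cloud.storage.constants import RPO_ASYNC_TURBO
from google.protobuf import timestamp_pb2

PROJECT_ID = "example-project"        # hypothetical
BUCKET_NAME = "example-pipeline-out"  # hypothetical
SUBSCRIPTION_ID = "example-sub"       # hypothetical

# 1. Create a dual-region bucket and enable turbo replication, which
#    targets a 15-minute replication RPO between the two regions.
storage_client = storage.Client(project=PROJECT_ID)
bucket = storage_client.create_bucket(BUCKET_NAME, location="NAM4")
bucket.rpo = RPO_ASYNC_TURBO
bucket.patch()

# 2. On detecting a regional outage (e.g. via Cloud Monitoring alerts on
#    Dataflow metrics), seek the subscription back 60 minutes so messages
#    acknowledged in that window are redelivered to the replacement
#    Dataflow job in the secondary region. The topic's one-day message
#    retention keeps those messages available for replay.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
seek_time = timestamp_pb2.Timestamp()
seek_time.FromDatetime(
    datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(minutes=60)
)
subscriber.seek(request={"subscription": subscription_path, "time": seek_time})
```

In practice the seek would be issued by your failover automation before the replacement Dataflow job starts in the secondary region, so the job picks up the replayed messages from the beginning of the recovery window.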