
Answer-first summary for fast verification
Answer (Option D):
1. Create a dual-region Cloud Storage bucket spanning the us-central1 and us-south1 regions.
2. Enable turbo replication.
3. Run the Dataproc cluster in a zone in the us-central1 region, reading from the bucket in the same region.
4. In case of a regional failure, redeploy the Dataproc clusters to the us-south1 region and read from the same bucket.
Option D is the correct answer because it addresses all the requirements of the question. A dual-region Cloud Storage bucket with turbo replication replicates data quickly between us-central1 and us-south1, meeting the RPO of 15 minutes or less. Running the Dataproc cluster in the same region as the data keeps read latency low. In case of a regional failure, the Dataproc cluster is redeployed to the alternate region (us-south1) and continues reading from the same bucket, ensuring minimal disruption and uninterrupted data accessibility.
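The setup described above can be sketched with gcloud commands. This is a minimal illustration, not a production script: the bucket name, project ID, and cluster names are placeholders, and the failover step assumes you recreate the cluster rather than restore it from a snapshot.

```shell
# Create a configurable dual-region bucket pairing us-central1 and us-south1,
# with turbo replication (RPO of 15 minutes) enabled via --rpo=ASYNC_TURBO.
gcloud storage buckets create gs://example-spark-data \
  --project=example-project \
  --location=US \
  --placement=us-central1,us-south1 \
  --rpo=ASYNC_TURBO

# Run the Dataproc cluster in us-central1, co-located with one of the
# bucket's regions, so Spark reads stay low-latency.
gcloud dataproc clusters create spark-primary \
  --project=example-project \
  --region=us-central1

# On a regional failure, redeploy the cluster in us-south1; it reads from
# the same bucket, whose data is already replicated there.
gcloud dataproc clusters create spark-failover \
  --project=example-project \
  --region=us-south1
```

Because the bucket is a single logical resource spanning both regions, the Spark job's input path (`gs://example-spark-data/...`) does not change after failover; only the cluster's region does.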
Author: LeetQuiz Editorial Team
Your current setup involves an upstream process that writes data to Google Cloud Storage. This data is subsequently processed by an Apache Spark job running on Google Cloud Dataproc. The Spark jobs execute in the us-central1 region, although the data might reside in any region across the United States. To prepare for a potential catastrophic failure in a single region, you need an effective recovery process ensuring a Recovery Point Objective (RPO) of no more than 15 minutes, implying you can afford a data loss of up to 15 minutes. Furthermore, you aim to maintain minimal latency when accessing the data. What strategy should you employ?