
Answer-first summary for fast verification
Answer: Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster to finish the job
The most cost-effective solution minimizes cross-region data transfer while still processing the full dataset. Option D launches a Cloud Dataproc cluster in each region to preprocess and compress the raw telemetry where it already lives, so no raw data ever crosses a region boundary. This preprocessing step shrinks the data volume, making the subsequent transfer to a central location faster and cheaper. The compressed output is then consolidated into a single regional bucket, and a final Cloud Dataproc cluster in that same region finishes the analysis without incurring further egress charges. Options A and B move all the raw, uncompressed data first, paying transfer costs on the full volume; Option C is close, but a multi-region bucket costs more to store data in than a regional bucket and offers no benefit here, since the final job runs in one region anyway.
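The workflow above can be sketched with the `gcloud` and `gsutil` CLIs. This is an illustrative outline only: the bucket names (`telemetry-us`, `telemetry-central`), cluster names, job JAR path, and worker counts are all hypothetical assumptions, and the same pattern would be repeated for the Europe and Asia buckets.

```shell
# 1. In each source region, create a short-lived Dataproc cluster to
#    preprocess and compress the raw telemetry (shown here for the US).
gcloud dataproc clusters create preprocess-us \
    --region=us-central1 --num-workers=4

# 2. Run the compression job against the local bucket, so no raw data
#    crosses a region boundary (the job JAR and paths are placeholders).
gcloud dataproc jobs submit spark --cluster=preprocess-us \
    --region=us-central1 \
    --jar=gs://telemetry-us/jobs/compress.jar -- \
    gs://telemetry-us/raw/ gs://telemetry-us/compressed/

# 3. Copy only the (much smaller) compressed output into one regional
#    bucket; -m parallelizes the copy.
gsutil -m cp -r gs://telemetry-us/compressed/ gs://telemetry-central/

# 4. Run the final analysis on a Dataproc cluster in the same region as
#    the consolidated bucket, avoiding further egress charges.
gcloud dataproc clusters create analyze-all \
    --region=us-central1 --num-workers=16
```

Colocating the final cluster with the regional bucket (step 4) is what makes the regional bucket in Option D preferable to the multi-region bucket in Option C.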
Author: LeetQuiz Editorial Team
TerramEarth's 20 million vehicles, distributed globally, collect telemetry data which is stored in regional Google Cloud Storage (GCS) buckets based on their location (US, Europe, or Asia). The company currently stores and processes data to provide insights, but the CTO now requires an analysis to determine why vehicles are breaking down after 100K miles. Given that the data is already divided into regional buckets, what is the most cost-effective way to run this analysis job on all the raw telemetry data?
A
Move all the data into 1 zone, then launch a Cloud Dataproc cluster to run the job
B
Move all the data into 1 region, then launch a Google Cloud Dataproc cluster to run the job
C
Launch a cluster in each region to preprocess and compress the raw data, then move the data into a multi-region bucket and use a Dataproc cluster to finish the job
D
Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster to finish the job