
Answer-first summary for fast verification
Answer: Store the common data encoded as Avro in Google Cloud Storage.
The correct answer is C. Dataproc provides connectors for Cloud Storage, Bigtable, and BigQuery, but the BigQuery connector requires more setup effort than the Cloud Storage connector. When migrating Apache Hadoop and Spark jobs to Dataproc, the recommended approach is to use Cloud Storage in place of HDFS: data stored in formats such as ORC, Parquet, or Avro persists after a cluster is deleted and can be read by different clusters or jobs, which is exactly what shared data needs. In most jobs, the only change required is replacing the hdfs:// prefix with gs://. BigQuery can also load and query Avro data directly, so both the Hadoop/Spark workloads and the BigQuery analysis can work from the same files. Cloud Storage is therefore the best place to store data common to both workloads.
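The hdfs:// to gs:// path change mentioned above can be sketched with a small helper. This is an illustrative sketch only; the bucket name, namenode host, and file path are hypothetical, and a real Spark job would simply pass the gs:// URI to its existing read call.

```python
def hdfs_to_gcs(uri: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI to the equivalent gs:// URI.

    Drops the HDFS namenode authority (host:port) and prefixes the
    remaining path with the given Cloud Storage bucket.
    """
    prefix = "hdfs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an HDFS URI: {uri}")
    # Split off the namenode authority; keep only the file path.
    path = uri[len(prefix):].split("/", 1)[1]
    return f"gs://{bucket}/{path}"

# Hypothetical example: the same Avro files, addressed via Cloud Storage.
print(hdfs_to_gcs("hdfs://namenode:8020/data/orders/part-0000.avro",
                  "flowlogistic-data"))
# gs://flowlogistic-data/data/orders/part-0000.avro
```

With paths rewritten this way, existing Spark reads (for example, a DataFrame load of the Avro files) keep working unchanged, because Dataproc's Cloud Storage connector handles the gs:// scheme transparently.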
Author: LeetQuiz Editorial Team
Flowlogistic, a leading logistics and supply chain provider, is facing challenges because its current infrastructure cannot support its proprietary real-time inventory-tracking system or the analysis of its order and shipment logs. The company wants to use Google BigQuery for analysis but still has Apache Hadoop and Spark workloads that cannot be moved to BigQuery. How should Flowlogistic store the data common to both workloads?
A
Store the common data in BigQuery as partitioned tables.
B
Store the common data in BigQuery and expose authorized views.
C
Store the common data encoded as Avro in Google Cloud Storage.
D
Store the common data in the HDFS storage for a Google Cloud Dataproc cluster.