
Answer-first summary for fast verification
Answer: Develop an application that sends events to Cloud Pub/Sub, and implement a Cloud Dataflow pipeline to transform JSON event payloads into Avro, saving the data to Cloud Storage and BigQuery.
**Correct Answer: D**

This option is correct because:

1. **Decoupling producer from consumer**: Cloud Pub/Sub separates event producers (your application) from consumers (the Dataflow pipeline), keeping the system scalable and flexible.
2. **Efficient storage**: Avro files on Cloud Storage are a compact, cost-effective way to retain the raw data indefinitely, meeting the space and cost requirement.
3. **Near real-time SQL queries**: Transforming the JSON payloads with Cloud Dataflow and streaming the results into BigQuery enables near real-time SQL queries.
4. **Historical data retention**: BigQuery makes it straightforward to keep and query at least 2 years of historical data.

**Why the other options are incorrect**:

- **A**: Polling an API and saving gzipped JSON files to Cloud Storage provides neither real-time processing nor near real-time SQL querying, and it does not address 2 years of queryable history.
- **B**: Cloud SQL with periodic exports scales poorly toward 150 GB of new data per day and cannot deliver near real-time processing.
- **C**: Spark on Cloud Dataproc writing Avro to HDFS on Persistent Disk is feasible but adds operational complexity, and Persistent Disk is a costlier long-term store than Cloud Storage; Cloud Dataflow with native BigQuery integration is the simpler fit.
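As a sketch of the transform step in option D: the Dataflow pipeline parses each JSON event payload from Pub/Sub and emits a record conforming to an Avro schema. The schema and field names below are illustrative assumptions (not given in the question), and the Beam and Avro libraries are stubbed out with plain Python so only the core mapping logic is shown.

```python
import json

# Illustrative Avro schema for an event record; field names are assumptions.
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "timestamp", "type": "string"},
        {"name": "payload", "type": "string"},
    ],
}

def json_to_avro_record(message: bytes) -> dict:
    """Parse a Pub/Sub message body (JSON) into a dict matching EVENT_SCHEMA.

    In a real Cloud Dataflow pipeline this logic would run inside a Beam DoFn;
    the resulting records would then be written as Avro files to Cloud Storage
    and streamed into BigQuery via the pipeline's BigQuery sink.
    """
    event = json.loads(message.decode("utf-8"))
    field_names = [f["name"] for f in EVENT_SCHEMA["fields"]]
    # Keep only schema fields; coerce values to the declared string type.
    return {name: str(event.get(name, "")) for name in field_names}

# Example: one event as it might arrive from Pub/Sub.
msg = b'{"event_id": "e-1", "timestamp": "2024-01-01T00:00:00Z", "payload": "click"}'
record = json_to_avro_record(msg)
```

Keeping the transform a pure function of one message makes it easy to unit-test outside the pipeline, which is a common Beam practice.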
Author: LeetQuiz Editorial Team
You are developing a new application that generates approximately 150 GB of JSON data daily by year's end. Your goals include decoupling producers from consumers, storing raw data efficiently in terms of space and cost, enabling near real-time SQL queries, and maintaining at least 2 years of historical data for SQL querying. Which pipeline best meets these requirements?
**A.** Develop an application with an API. Create a tool to poll the API and save data to Cloud Storage as gzipped JSON files.

**B.** Develop an application that writes data to a Cloud SQL database. Set up periodic exports from the database to Cloud Storage and then load into BigQuery.

**C.** Develop an application that sends events to Cloud Pub/Sub, and use Spark jobs on Cloud Dataproc to convert JSON data to Avro format, storing it on HDFS on Persistent Disk.

**D.** Develop an application that sends events to Cloud Pub/Sub, and implement a Cloud Dataflow pipeline to transform JSON event payloads into Avro, saving the data to Cloud Storage and BigQuery.
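A quick back-of-the-envelope check on the retention requirement helps explain why storage cost matters here (150 GB/day from the question; the 365-day year is an assumption):

```python
# Raw JSON volume over the required 2-year retention window.
daily_gb = 150
days = 2 * 365
total_gb = daily_gb * days          # 109,500 GB of raw JSON
total_tb = total_gb / 1024          # just under 107 TB
```

At roughly 107 TB of raw JSON over two years, a columnar binary format like Avro plus Cloud Storage pricing makes a meaningful cost difference versus Cloud SQL or Persistent Disk.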