
Answer-first summary for fast verification
Answer: Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.
Option D is the correct answer because it meets all four stated requirements:

1. Decoupling the producer from the consumer: Cloud Pub/Sub provides a decoupled messaging layer in which the producer publishes events and consumers (such as Dataflow) subscribe to them, so each side can scale and fail independently.
2. Space- and cost-efficient storage: Avro is a compact binary format that is more space-efficient than raw JSON, and Cloud Storage is a cost-effective, scalable home for keeping the raw ingested data indefinitely.
3. Near real-time SQL queries: Dataflow streams the data into BigQuery, where it can be queried with SQL within moments of arrival.
4. At least 2 years of historical data: BigQuery is well suited to storing and querying large volumes of historical data, making it a natural choice for retaining and analyzing 2 years of history.
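For illustration, here is a minimal sketch of what Option D could look like in code, assuming a Python producer and an Apache Beam (Python SDK) streaming pipeline of the kind Dataflow runs. All resource names (project, topic, subscription, bucket, table) and the event schema are hypothetical placeholders, not details from the question.

The producer side is decoupled from any consumer: it simply publishes JSON events to a Pub/Sub topic.

```python
import json

from google.cloud import pubsub_v1

# Hypothetical project and topic names -- replace with your own.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "events")


def publish_event(event: dict) -> None:
    # Pub/Sub message bodies are bytes, so serialize the JSON payload.
    data = json.dumps(event).encode("utf-8")
    # publish() is asynchronous; result() blocks until the broker acks.
    publisher.publish(topic_path, data).result()


publish_event({"event_id": "evt-001", "payload": "example"})
```

Downstream, a Beam pipeline subscribes to the topic, parses the JSON, and fans out to two sinks: windowed Avro files in Cloud Storage for the indefinite raw archive, and streaming inserts into BigQuery for near real-time SQL.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# Hypothetical resource names -- replace with your own.
SUBSCRIPTION = "projects/my-project/subscriptions/events-sub"
AVRO_PREFIX = "gs://my-bucket/raw/events"
BQ_TABLE = "my-project:analytics.events"

# Assumed Avro schema for the raw archive; the real one follows your events.
AVRO_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "payload", "type": "string"},
    ],
}


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        events = (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        )

        # Branch 1: archive the raw events as compact Avro files in Cloud Storage.
        # Streaming file writes need a window so Beam can close each file.
        (
            events
            | "FiveMinuteWindows" >> beam.WindowInto(FixedWindows(300))
            | "WriteAvro" >> beam.io.WriteToAvro(
                AVRO_PREFIX, schema=AVRO_SCHEMA, file_name_suffix=".avro")
        )

        # Branch 2: stream the same events into BigQuery for near real-time SQL.
        (
            events
            | "WriteBigQuery" >> beam.io.WriteToBigQuery(
                BQ_TABLE,
                schema="event_id:STRING,payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```

Branching one parsed collection into two sinks keeps the Cloud Storage archive and the BigQuery copy fed from a single ingestion path, which is the design Option D describes.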
Author: LeetQuiz Editorial Team
You are developing a new application that requires a scalable data collection mechanism. The application generates data continuously throughout the day, and projections indicate that by the end of the year it will produce roughly 150 GB of JSON data per day. The following requirements must be met:
✑ Decouple the data producer from the data consumer to ensure independent scaling and fault tolerance.
✑ Implement a storage system for the raw ingested data that is both space- and cost-efficient, with the capability to store this data indefinitely.
✑ Enable near real-time querying using SQL to analyze the incoming data quickly.
✑ Retain at least 2 years of historical data, and allow the stored data to be queried with SQL for insights spanning this period.
Which pipeline should you use to meet these requirements?
A
Create an application that provides an API. Write a tool to poll the API and write data to Cloud Storage as gzipped JSON files.
B
Create an application that writes to a Cloud SQL database to store the data. Set up periodic exports of the database to write to Cloud Storage and load into BigQuery.
C
Create an application that publishes events to Cloud Pub/Sub, and create Spark jobs on Cloud Dataproc to convert the JSON data to Avro format, stored on HDFS on Persistent Disk.
D
Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.