
You are developing a new application that is expected to generate approximately 150 GB of JSON data per day by the end of the year. Your goals are to decouple producers from consumers, store the raw data in a space- and cost-efficient way, enable near real-time SQL queries, and keep at least 2 years of historical data available for SQL querying. Which pipeline best meets these requirements?
A
Develop an application with an API. Create a tool to poll the API and save data to Cloud Storage as gzipped JSON files.
B
Develop an application that writes data to a Cloud SQL database. Set up periodic exports from the database to Cloud Storage and then load into BigQuery.
C
Develop an application that sends events to Cloud Pub/Sub, and use Spark jobs on Cloud Dataproc to convert JSON data to Avro format, storing it on HDFS on Persistent Disk.
D
Develop an application that sends events to Cloud Pub/Sub, and implement a Cloud Dataflow pipeline to transform JSON event payloads into Avro, saving the data to Cloud Storage and BigQuery.
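To get a sense of scale for the 2-year retention requirement, here is a rough back-of-the-envelope estimate. It assumes the 150 GB/day rate holds for the whole window (the question says that rate is reached only "by year's end", so this is an upper bound), counts raw, uncompressed JSON, and uses decimal units (1 TB = 1000 GB). The helper name `retained_volume_gb` is hypothetical.

```python
# Assumptions: constant 150 GB/day, raw (uncompressed) JSON, 365-day years,
# decimal units (1 TB = 1000 GB). Real volume grows toward this rate over the year.
DAILY_GB = 150
RETENTION_YEARS = 2
DAYS_PER_YEAR = 365


def retained_volume_gb(daily_gb: float, years: int, days_per_year: int = DAYS_PER_YEAR) -> float:
    """Raw data volume that must remain queryable over the retention window."""
    return daily_gb * days_per_year * years


total_gb = retained_volume_gb(DAILY_GB, RETENTION_YEARS)
print(f"{total_gb:,.0f} GB ~ {total_gb / 1000:.1f} TB")  # 109,500 GB ~ 109.5 TB
```

At roughly 100 TB of raw data, the requirements for compact storage (for example, a compressed binary format such as Avro) and a store that supports SQL over years of history weigh heavily when comparing the options.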