You are developing a new application that requires a scalable data collection mechanism. The application generates data continuously throughout the day, and projections indicate that by the end of the year, the system will produce roughly 150 GB of JSON data daily. The following requirements must be met:
✑ Decouple the data producer from the data consumer to ensure independent scaling and fault tolerance.
✑ Implement a storage system for the raw ingested data that is both space- and cost-efficient, with the capability to store this data indefinitely.
✑ Enable near real-time querying using SQL to analyze the incoming data quickly.
✑ Retain at least 2 years of historical data, and allow the stored data to be queried with SQL for insights spanning this period.
Which pipeline should you utilize to fulfill these criteria?
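For a sense of scale, the figures in the question can be turned into rough sizing numbers. The sketch below is only a back-of-the-envelope estimate: it assumes the projected 150 GB/day rate holds constant for the whole retention window, uses decimal units (1 TB = 1000 GB), and counts 365 days per year. The function names are hypothetical and chosen for illustration.

```python
# Rough sizing from the figures stated in the question.
# Assumptions (not from the source): constant 150 GB/day, decimal units, 365-day years.

DAILY_GB = 150            # projected daily JSON volume from the question
RETENTION_YEARS = 2       # minimum queryable history from the question
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60


def retained_volume_tb(daily_gb: float, years: float) -> float:
    """Uncompressed volume held for the retention window, in TB."""
    return daily_gb * DAYS_PER_YEAR * years / 1000


def average_ingest_mb_per_s(daily_gb: float) -> float:
    """Average ingest rate in MB/s if data arrived evenly over the day."""
    return daily_gb * 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"Retained volume: {retained_volume_tb(DAILY_GB, RETENTION_YEARS):.1f} TB")
    print(f"Average ingest:  {average_ingest_mb_per_s(DAILY_GB):.2f} MB/s")
```

Under these assumptions the queryable history is on the order of 100 TB of raw JSON, while the average ingest rate is under 2 MB/s. The raw store, which must be kept indefinitely, grows beyond that figure, which is why the space- and cost-efficiency requirement matters.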