
Answer-first summary for fast verification
Answer: Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
## Explanation

**Correct Answer: A**

**Why Option A is correct:**

1. **Kinesis Data Streams** provides real-time data streaming with durability. Records are replicated across three Availability Zones and retained for 24 hours by default (extendable up to 365 days), so data that has already been published to the stream survives an EC2 instance reboot.
2. **Kinesis Data Analytics** allows SQL-based querying of streaming data in near-real time, which meets the querying requirement.
3. **Scalability**: Kinesis Data Streams scales by adding shards, or automatically in on-demand capacity mode. Each shard accepts up to 1 MB/s of writes, so the stated peak of 1 MB/s is comfortably within range.
4. **Minimal data loss**: Once the stream acknowledges a record, the record is persisted independently of the producer. Only data that has not yet been published is exposed to a reboot.

**Why other options are incorrect:**

**Option B (Kinesis Data Firehose with Redshift):**

- Kinesis Data Firehose buffers incoming records by size or time interval and delivers them in batches, so data reaches the destination with a delay.
- Redshift is a data warehouse optimized for analytical queries over loaded data. Queries only see records after each batch load completes.
- This solution adds latency at both the delivery and the load step, so it is a weaker fit for near-real-time querying than option A.

**Option C (EC2 instance store + Firehose + S3 + Athena):**

- EC2 instance store is ephemeral storage. Its data survives a reboot but is lost when the instance is stopped or terminated, or when the underlying hardware fails, so it is not a durable staging area.
- Firehose to S3 adds latency due to buffering.
- Athena queries objects in S3, so results lag behind ingestion by at least the Firehose buffering interval.
- Staging data on the instance does nothing for data that is in flight when the instance reboots.

**Option D (EBS + ElastiCache for Redis):**

- While EBS persists data, this solution adds complexity and doesn't provide a scalable, managed streaming solution.
- ElastiCache for Redis is an in-memory cache, not designed for durable data streaming or analytical queries. Redis pub/sub channels deliver messages only to subscribers connected at that moment, so anything published while a subscriber is down is lost.
- This approach doesn't scale well and requires manual management.
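To put the 1 MB/s figure from the discussion of option A in context, the arithmetic below is a rough sizing sketch, not an official calculator. It assumes provisioned capacity mode and the per-shard write limits of 1 MB/s and 1,000 records/s. The function names and the record rates in the usage lines are hypothetical, since the question only states a byte rate.

```python
import math

# Per-shard write limits for Kinesis Data Streams (provisioned mode).
SHARD_WRITE_MB_PER_S = 1.0
SHARD_WRITE_RECORDS_PER_S = 1000


def shards_needed(peak_mb_per_s: float, peak_records_per_s: int) -> int:
    """Smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(peak_mb_per_s / SHARD_WRITE_MB_PER_S)
    by_records = math.ceil(peak_records_per_s / SHARD_WRITE_RECORDS_PER_S)
    return max(1, by_bytes, by_records)


def retained_gb(avg_mb_per_s: float, retention_hours: int = 24) -> float:
    """Approximate data held in the stream (decimal GB) at a steady rate."""
    return avg_mb_per_s * 3600 * retention_hours / 1000


# Usage (record rates are assumed for illustration only):
shards_needed(1.0, 500)    # -> 1: the byte rate is the binding limit
shards_needed(1.0, 2500)   # -> 3: many small JSON records hit the record limit first
retained_gb(1.0)           # -> 86.4 GB held at the default 24-hour retention
```

The second usage line shows why the record count matters as much as the byte rate: small JSON documents can exhaust the records-per-second limit long before 1 MB/s is reached.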
**Key AWS Services Understanding:**

- **Kinesis Data Streams**: Real-time data streaming service with durability and scalability
- **Kinesis Data Analytics**: Real-time analytics on streaming data using SQL
- **Kinesis Data Firehose**: Buffered delivery service for loading streaming data into destinations such as S3 and Redshift
- **Near-real-time vs batch**: The requirement specifically asks for near-real-time querying, which Kinesis Data Analytics provides directly on the stream, while Firehose delivers data in buffered batches to a store that must then be queried separately.

Option A addresses all requirements: scalability, minimal data loss, and near-real-time querying capabilities.
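As a rough illustration of the ingestion half of option A, the sketch below batches JSON events for the Kinesis `PutRecords` API and resends only the entries the service rejected. It makes several assumptions: `client` is expected to be a `boto3.client("kinesis")` object (or anything exposing the same `put_records` method), the `source_id` partition-key field and the function names are hypothetical, and backoff between attempts is left out for brevity.

```python
import json
from typing import Any, Dict, Iterable, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_entries(events: Iterable[Dict[str, Any]],
               key_field: str = "source_id") -> List[Dict[str, Any]]:
    """Encode events as PutRecords entries; key_field is a hypothetical name."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event.get(key_field, "unknown")),
        }
        for event in events
    ]


def publish(client: Any, stream_name: str,
            events: Iterable[Dict[str, Any]],
            max_attempts: int = 3) -> List[Dict[str, Any]]:
    """Send events in batches; return entries still failing after retries."""
    pending = to_entries(events)
    failed: List[Dict[str, Any]] = []
    for start in range(0, len(pending), MAX_RECORDS_PER_CALL):
        batch = pending[start:start + MAX_RECORDS_PER_CALL]
        for _ in range(max_attempts):
            response = client.put_records(StreamName=stream_name, Records=batch)
            if response["FailedRecordCount"] == 0:
                batch = []
                break
            # Results are positional: keep only entries whose result has an ErrorCode.
            batch = [
                entry
                for entry, result in zip(batch, response["Records"])
                if "ErrorCode" in result
            ]
            # A real producer would back off here before the next attempt.
        failed.extend(batch)
    return failed
```

Checking `FailedRecordCount` matters because `PutRecords` can succeed as a request while rejecting individual records. A producer that ignores the per-record results would silently drop data, which is the failure mode the question is trying to avoid.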
Author: LeetQuiz Editorial Team
A company is using a fleet of Amazon EC2 instances to ingest data from on-premises data sources. The data is in JSON format and ingestion rates can be as high as 1 MB/s. When an EC2 instance is rebooted, the data in-flight is lost. The company's data science team wants to query ingested data in near-real time.
Which solution provides near-real-time data querying that is scalable with minimal data loss?
A. Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
B. Publish data to Amazon Kinesis Data Firehose with Amazon Redshift as the destination. Use Amazon Redshift to query the data.
C. Store ingested data in an EC2 instance store. Publish data to Amazon Kinesis Data Firehose with Amazon S3 as the destination. Use Amazon Athena to query the data.
D. Store ingested data in an Amazon Elastic Block Store (Amazon EBS) volume. Publish data to Amazon ElastiCache for Redis. Subscribe to the Redis channel to query the data.