
## Answer-first summary for fast verification
Answer: Collect the data from Amazon Kinesis Data Streams. Use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake. Load the data in Amazon Redshift for analysis.
## Explanation

**Correct Answer: D**

**Why Option D is correct:**

1. **Amazon Kinesis Data Streams** is designed for real-time streaming data ingestion at scale, making it ideal for handling 30 TB of clickstream data daily from 300+ global websites.
2. **Amazon Kinesis Data Firehose** is purpose-built for loading streaming data into AWS data stores such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. It scales automatically to handle the data volume and provides near-real-time delivery.
3. **Amazon Redshift** is a fully managed, petabyte-scale data warehouse service optimized for analytics workloads, making it well suited for analyzing clickstream data.
4. Together, these services form a **real-time, scalable, and fully managed** architecture that handles the high volume of clickstream data efficiently.

**Why the other options are incorrect:**

- **Option A:** AWS Data Pipeline is suited to batch-oriented workflows and periodic data processing, not real-time clickstream ingestion. Amazon EMR is powerful for big data processing but adds unnecessary complexity for this use case.
- **Option B:** An Auto Scaling group of EC2 instances requires managing infrastructure, which is less efficient than serverless options, and it does not handle real-time data streaming effectively.
- **Option C:** Amazon CloudFront is a content delivery network (CDN) for caching static content, not a mechanism for collecting clickstream data. Processing 30 TB per day with Lambda would be inefficient and expensive compared to purpose-built streaming services.

**Key AWS services used:**

- **Amazon Kinesis Data Streams**: real-time data streaming
- **Amazon Kinesis Data Firehose**: automatic loading of streaming data into data stores
- **Amazon S3**: data lake storage
- **Amazon Redshift**: analytics and data warehousing
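To make the ingestion step concrete, here is a minimal producer-side sketch in Python. It batches clickstream events for a Kinesis Data Streams `PutRecords` call, respecting the documented Kinesis quotas (500 records and 5 MB per request, 1 MB per record). The event fields (`session_id`, `page`) are hypothetical; the actual API call via `boto3` is noted in a comment rather than executed.

```python
import json

# Documented Kinesis Data Streams PutRecords quotas.
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024
MAX_BYTES_PER_RECORD = 1024 * 1024


def to_record(event: dict) -> dict:
    """Serialize one clickstream event into a PutRecords entry.

    The session ID serves as the partition key so events from the same
    session land on the same shard, preserving per-session ordering.
    """
    data = json.dumps(event).encode("utf-8")
    if len(data) > MAX_BYTES_PER_RECORD:
        raise ValueError("event exceeds the 1 MB Kinesis record limit")
    return {"Data": data, "PartitionKey": event["session_id"]}


def batch_records(events):
    """Group events into batches that each fit one PutRecords request."""
    batch, batch_bytes = [], 0
    for event in events:
        record = to_record(event)
        size = len(record["Data"]) + len(record["PartitionKey"])
        if batch and (len(batch) == MAX_RECORDS_PER_CALL
                      or batch_bytes + size > MAX_BYTES_PER_CALL):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch

# In a real producer, each batch would be sent with
#   boto3.client("kinesis").put_records(StreamName=..., Records=batch)
# retrying any entries the response reports with an ErrorCode.
```

Firehose then delivers the stream to S3 with no batching code on our side; its buffering (by size or interval) is configured on the delivery stream itself.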
Author: LeetQuiz Editorial Team
## Question

A company hosts more than 300 global websites and applications. The company requires a platform to analyze more than 30 TB of clickstream data each day. What should a solutions architect do to transmit and process the clickstream data?

**A.** Design an AWS Data Pipeline to archive the data to an Amazon S3 bucket and run an Amazon EMR cluster with the data to generate analytics.

**B.** Create an Auto Scaling group of Amazon EC2 instances to process the data and send it to an Amazon S3 data lake for Amazon Redshift to use for analysis.

**C.** Cache the data to Amazon CloudFront. Store the data in an Amazon S3 bucket. When an object is added to the S3 bucket, run an AWS Lambda function to process the data for analysis.

**D.** Collect the data from Amazon Kinesis Data Streams. Use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake. Load the data in Amazon Redshift for analysis.