
A company hosts more than 300 global websites and applications. The company requires a platform to analyze more than 30 TB of clickstream data each day. What should a solutions architect do to transmit and process the clickstream data?
A
Design an AWS Data Pipeline to archive the data to an Amazon S3 bucket and run an Amazon EMR cluster with the data to generate analytics.
B
Create an Auto Scaling group of Amazon EC2 instances to process the data and send it to an Amazon S3 data lake for Amazon Redshift to use for analysis.
C
Cache the data to Amazon CloudFront. Store the data in an Amazon S3 bucket. When an object is added to the S3 bucket, run an AWS Lambda function to process the data for analysis.
D
Collect the data from Amazon Kinesis Data Streams. Use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake. Load the data in Amazon Redshift for analysis.
Explanation:
Correct Answer: D
Why Option D is correct:
Amazon Kinesis Data Streams is designed for real-time streaming data ingestion at scale, making it ideal for handling 30 TB of clickstream data daily from 300+ global websites.
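To get a feel for the scale involved, the required shard count can be estimated from published Kinesis limits: each shard accepts up to 1 MB/s (or 1,000 records/s) of writes. A minimal back-of-the-envelope sketch, assuming an evenly distributed, sustained load (real clickstream traffic is bursty, so you would provision headroom or use on-demand capacity mode):

```python
import math

def required_shards(tb_per_day: float, mb_per_shard_per_sec: float = 1.0) -> int:
    """Estimate the minimum Kinesis Data Streams shard count for a
    sustained write rate, given the 1 MB/s per-shard write limit."""
    mb_per_day = tb_per_day * 1024 * 1024   # TB -> MB (binary units)
    mb_per_sec = mb_per_day / 86_400        # seconds in a day
    return math.ceil(mb_per_sec / mb_per_shard_per_sec)

print(required_shards(30))  # -> 365 shards for a perfectly even 30 TB/day load
```

The point of the arithmetic is that 30 TB/day averages out to roughly 364 MB/s, a rate Kinesis handles by scaling shards horizontally rather than by scaling a single server vertically.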
Amazon Kinesis Data Firehose is specifically built for loading streaming data into AWS data stores such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service (now Amazon OpenSearch Service). It scales automatically to handle the data volume and provides near-real-time data delivery.
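A producer feeding Firehose in batches must respect the documented PutRecordBatch limits of at most 500 records and 4 MiB per call. A minimal sketch of that batching logic (the sample event payload is illustrative):

```python
def chunk_records(records, max_records=500, max_bytes=4 * 1024 * 1024):
    """Split encoded records into batches that respect the Firehose
    PutRecordBatch limits: <= 500 records and <= 4 MiB per call."""
    batch, size = [], 0
    for rec in records:
        # Start a new batch if adding this record would exceed either limit.
        if batch and (len(batch) >= max_records or size + len(rec) > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(rec)
        size += len(rec)
    if batch:
        yield batch

events = [b'{"page": "/home"}\n'] * 1200   # hypothetical clickstream events
print([len(b) for b in chunk_records(events)])  # -> [500, 500, 200]
```

Firehose then buffers these deliveries and writes consolidated objects to S3, so downstream consumers see a manageable number of files rather than millions of tiny ones.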
Amazon Redshift is a fully managed, petabyte-scale data warehouse service that's optimized for analytics workloads, making it perfect for analyzing clickstream data.
This architecture provides a real-time, scalable, and fully managed solution that can handle the high volume of clickstream data efficiently.
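The final step of the pipeline, loading the S3 data lake into Redshift, is typically done with a bulk COPY statement (Firehose can also issue this COPY on your behalf when configured with a Redshift destination). A minimal sketch that builds such a statement; the table name, S3 prefix, IAM role ARN, and GZIP option are all illustrative assumptions:

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement to bulk-load JSON clickstream
    files delivered to S3. All names passed in here are hypothetical."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS JSON 'auto' GZIP;"
    )

sql = build_copy_statement(
    "clickstream_events",                     # hypothetical target table
    "s3://example-clickstream-lake/events/",  # hypothetical S3 prefix
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",  # hypothetical role
)
print(sql)
```

COPY parallelizes the load across Redshift compute nodes, which is why staging in S3 first scales far better than row-by-row inserts for this data volume.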
Why other options are incorrect:
Option A: AWS Data Pipeline is more suitable for batch-oriented workflows and periodic data processing, not real-time clickstream data. Amazon EMR is good for big data processing but adds unnecessary complexity for this use case.
Option B: Using EC2 instances with Auto Scaling requires managing infrastructure, which is less efficient than serverless options. This approach also doesn't handle real-time data streaming effectively.
Option C: Amazon CloudFront is a content delivery network (CDN) for caching static content, not for collecting clickstream data. Using Lambda for processing 30 TB daily would be inefficient and expensive compared to purpose-built streaming services.
Key AWS Services Used: