
A solutions architect manages an analytics application. The application stores large amounts of semistructured data in an Amazon S3 bucket. The solutions architect wants to use parallel data processing to process the data more quickly. The solutions architect also wants to use information that is stored in an Amazon Redshift database to enrich the data.
Which solution will meet these requirements?
A. Use Amazon Athena to process the S3 data. Use AWS Glue with the Amazon Redshift data to enrich the S3 data.
B. Use Amazon EMR to process the S3 data. Use Amazon EMR with the Amazon Redshift data to enrich the S3 data.
C. Use Amazon EMR to process the S3 data. Use Amazon Kinesis Data Streams to move the S3 data into Amazon Redshift so that the data can be enriched.
D. Use AWS Glue to process the S3 data. Use AWS Lake Formation with the Amazon Redshift data to enrich the S3 data.
Explanation:
Correct Answer: B
Amazon EMR (Elastic MapReduce) is the most appropriate solution for this scenario because:
Parallel Data Processing: Amazon EMR runs distributed frameworks such as Apache Spark, Apache Hadoop, and Apache Hive across a cluster of nodes, so it can process large amounts of semistructured data stored in S3 in parallel.
Redshift Integration: Spark on Amazon EMR can read from Amazon Redshift through the Redshift connector for Spark, so Redshift tables can be joined with the S3 data for enrichment.
Single Tool for Both Tasks: Option B uses Amazon EMR both to process the S3 data and to enrich it with Redshift data, so no second service is needed.
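As a rough illustration of option B, the sketch below shows how a PySpark job on an EMR cluster might read the S3 data and join it with a Redshift table. It assumes an EMR release that bundles the Redshift connector for Spark. The S3 path, JDBC URL, table name, IAM role ARN, and join key are hypothetical placeholders, and the two helper functions are illustrative names, not part of any AWS API.

```python
# Sketch only: every bucket, URL, ARN, table, and column name here is hypothetical.

# Data source name of the Redshift connector for Spark shipped with recent EMR releases.
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_read_options(jdbc_url, table, temp_dir, iam_role_arn):
    """Build the reader options the Redshift connector expects."""
    return {
        "url": jdbc_url,               # JDBC URL of the Redshift cluster
        "dbtable": table,              # Redshift table holding the enrichment data
        "tempdir": temp_dir,           # S3 staging location used by the connector
        "aws_iam_role": iam_role_arn,  # role Redshift assumes to write to tempdir
    }


def enrich_s3_with_redshift(spark, s3_path, options, join_key):
    """Read semistructured JSON from S3 and left-join it with a Redshift table."""
    # Spark splits the S3 input across executors, which is the parallel step.
    events = spark.read.json(s3_path)
    # The connector loads the Redshift table as a second DataFrame.
    reference = spark.read.format(REDSHIFT_FORMAT).options(**options).load()
    # A left join keeps every S3 record and adds the matching Redshift columns.
    return events.join(reference, on=join_key, how="left")


if __name__ == "__main__":
    opts = redshift_read_options(
        "jdbc:redshift://example-cluster:5439/dev",       # hypothetical endpoint
        "public.customers",                               # hypothetical table
        "s3://example-bucket/redshift-temp/",             # hypothetical staging path
        "arn:aws:iam::123456789012:role/example-role",    # hypothetical role
    )
    print(sorted(opts))
```

On a cluster, `enrich_s3_with_redshift` would be called with the job's `SparkSession`, for example `enrich_s3_with_redshift(spark, "s3://example-bucket/events/", opts, "customer_id")`, and the result written back to S3 or another target.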
Why other options are incorrect:
A: Amazon Athena is an interactive SQL query service for data in S3; it is not designed for building complex data processing pipelines. AWS Glue is an ETL service, but this option splits the work across two services and gives less control over the processing frameworks than running them on EMR.
C: Amazon Kinesis Data Streams is for real-time streaming data ingestion, not for moving batch data from S3 into Redshift. Copying the S3 data into Redshift just to enrich it adds an unnecessary data movement step.
D: AWS Lake Formation is for data lake governance and security, such as managing access permissions, not for data processing or enrichment. AWS Glue can process the S3 data, but Lake Formation does not perform the enrichment step, so that requirement is left unmet.
Summary:
Option B enables efficient parallel processing of the S3 data while using the Redshift data for enrichment, all in a single, scalable environment.