AWS Certified Solutions Architect - Associate

Explanation:

Correct Answer: B

Amazon EMR (Elastic MapReduce) is the most appropriate solution for this scenario because:

Parallel Data Processing: Amazon EMR is designed for parallel processing of large datasets with frameworks such as Apache Spark, Hadoop, and Hive. It can efficiently process large volumes of semi-structured data stored in S3.
Redshift Integration: Amazon EMR can read from and write to Amazon Redshift through the Amazon Redshift integration for Apache Spark, so a Spark job can join the S3 data with Redshift tables to enrich it.
Single Tool for Both Tasks: Option B uses Amazon EMR for both processing the S3 data and enriching it with Redshift data, providing a unified solution.
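The enrichment pattern described above (split the S3 input across workers, give every worker a copy of the Redshift lookup data, and join locally) can be sketched in plain Python. This is a local illustration only, not EMR or Spark code. It assumes the Redshift table is small enough to copy to every worker, which is one common approach (a broadcast join), not something the question states. The records, the customer table, and all function names are hypothetical, and threads stand in for Spark executors.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical semi-structured records, as they might appear in JSON Lines files in S3.
RAW_S3_LINES = [
    '{"order_id": 1, "customer_id": "c1", "amount": 20.0}',
    '{"order_id": 2, "customer_id": "c2", "amount": 35.5, "coupon": "SPRING"}',
    '{"order_id": 3, "customer_id": "c1", "amount": 12.25}',
    '{"order_id": 4, "customer_id": "c9", "amount": 8.0}',
]

# Hypothetical customer dimension, standing in for a table read from Amazon Redshift.
REDSHIFT_CUSTOMERS = {
    "c1": {"segment": "retail", "region": "us-east-1"},
    "c2": {"segment": "enterprise", "region": "eu-west-1"},
}


def split_into_partitions(lines, num_partitions):
    """Round-robin the input lines into partitions, similar to input splits on a cluster."""
    partitions = [[] for _ in range(num_partitions)]
    for i, line in enumerate(lines):
        partitions[i % num_partitions].append(line)
    return partitions


def enrich_partition(lines, customers):
    """Parse one partition and left-join each record against the shared lookup table."""
    enriched = []
    for line in lines:
        record = json.loads(line)
        # Unmatched customer IDs get None values, mirroring a left outer join.
        extra = customers.get(record.get("customer_id"), {})
        record["segment"] = extra.get("segment")
        record["region"] = extra.get("region")
        enriched.append(record)
    return enriched


def process_and_enrich(lines, customers, num_workers=2):
    """Enrich all partitions concurrently, then flatten into one ordered result."""
    partitions = split_into_partitions(lines, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(enrich_partition, partitions, [customers] * len(partitions))
        rows = [row for part in results for row in part]
    # Sort by order_id so the output order does not depend on partitioning.
    return sorted(rows, key=lambda r: r["order_id"])


if __name__ == "__main__":
    for row in process_and_enrich(RAW_S3_LINES, REDSHIFT_CUSTOMERS):
        print(row)
```

Note that extra fields such as `coupon` pass through untouched, which is the kind of schema variation semi-structured S3 data tends to have. On EMR, the same idea would be expressed as a Spark job that reads the S3 data and the Redshift table as DataFrames and joins them, with Spark handling partitioning and distribution.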

Why other options are incorrect:

A: Amazon Athena is a serverless service for querying S3 data with SQL, but it is not designed for complex, multi-step data processing pipelines. AWS Glue is a managed ETL service, but splitting processing and enrichment across Athena and Glue does not provide the same single-engine parallel processing that EMR offers.
C: Amazon Kinesis Data Streams is for real-time streaming data ingestion, not for batch processing of S3 data. Moving data into Redshift just to enrich it is inefficient and doesn't leverage parallel processing capabilities.
D: AWS Lake Formation is for data governance and security, not for data processing or enrichment. AWS Glue can process data but doesn't offer the same parallel processing capabilities as EMR for large-scale analytics.

Key AWS Services: Amazon EMR, Amazon S3, Amazon Redshift.

This solution enables efficient parallel processing of S3 data while leveraging Redshift data for enrichment in a single, scalable environment.

Last updated: February 23, 2026 at 11:39

A: Use Amazon Athena to process the S3 data. Use AWS Glue with the Amazon Redshift data to enrich the S3 data.

B: Use Amazon EMR to process the S3 data. Use Amazon EMR with the Amazon Redshift data to enrich the S3 data.