
## Answer-first summary for fast verification
Answer: Direct the requests from the API into an Amazon Simple Queue Service (Amazon SQS) queue. Deploy the models as Amazon Elastic Container Service (Amazon ECS) services that read from the queue. Enable AWS Auto Scaling on Amazon ECS for both the cluster and copies of the service based on the queue size.
## Detailed Explanation

**Option D is the correct answer** because it best addresses all the requirements.

### Key Requirements Analysis

1. **1 GB of model data loaded at startup** - Lambda container images can reach 10 GB, but loading 1 GB of model data on every cold start would be slow and costly. ECS containers load the model once and keep it in memory across many requests.
2. **Asynchronous API** - SQS is a natural fit for asynchronous request handling: users submit a request or a batch of requests and specify where the results should be sent.
3. **Irregular usage patterns** - Some models sit unused for days or weeks; others receive batches of thousands of requests at a time.
4. **Hundreds of users** - The solution must scale.

### Why Option D Is Best

- **ECS with SQS**: ECS containers load the 1 GB model once and keep it in memory, serving many requests without reloading.
- **Auto scaling based on queue size**: Handles the irregular usage patterns - scaling out when thousands of requests arrive and scaling in when idle.
- **Cost-effective**: The ECS service can scale in to zero tasks when a model is unused, avoiding cost during idle periods.
- **Reliable**: SQS buffers requests, so none are lost during scaling events.
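To make the queue-driven scaling concrete, here is a minimal sketch of the "backlog per task" calculation that a target-tracking policy effectively performs. The function name, parameters, and bounds are illustrative assumptions, not an AWS API:

```python
import math

def desired_task_count(queue_depth: int, target_backlog_per_task: int,
                       min_tasks: int = 0, max_tasks: int = 50) -> int:
    """Compute an ECS service's desired task count from the SQS backlog.

    queue_depth: the queue's ApproximateNumberOfMessages metric.
    target_backlog_per_task: how many queued requests one task should own.
    min_tasks/max_tasks: hypothetical scaling bounds for the service.
    """
    if queue_depth == 0:
        return min_tasks  # scale in to the floor (here, zero) when idle
    desired = math.ceil(queue_depth / target_backlog_per_task)
    return max(min_tasks, min(desired, max_tasks))
```

With a target of 100 messages per task, an idle queue keeps zero tasks running, a batch of 250 requests spins up 3 tasks, and a burst of thousands is capped at the configured maximum.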
### Why the Other Options Are Incorrect

**Option A (Lambda + NLB)**

- An NLB cannot invoke Lambda functions; only an ALB supports Lambda target groups.
- Lambda cold starts that load 1 GB of model data would be slow and expensive.
- The Lambda execution limit of 15 minutes could be a problem for ML inference.
- An NLB handles TCP/UDP traffic and provides no asynchronous request handling.

**Option B (ECS + ALB + App Mesh)**

- An ALB serves synchronous requests, which does not match the asynchronous pattern.
- AWS App Mesh is a service mesh for service-to-service communication; it does not scale services based on queue size.
- More complex than the requirements demand.

**Option C (Lambda + SQS)**

- Same Lambda issue as Option A: cold starts that reload 1 GB of model data.
- Lambda has no "vCPU scaling" based on queue size; it scales by concurrency, and per-invocation vCPU is tied to the configured memory size.

### Additional Considerations

- ECS on AWS Fargate avoids managing EC2 instances.
- Model artifacts can live in S3 with lifecycle policies.
- Amazon SageMaker offers more complete ML lifecycle management.
- Results can be delivered via SNS, S3, or callback URLs, as specified by users.
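The central advantage of Option D - paying the 1 GB load once at task startup and amortizing it over every message the task processes - can be sketched as a worker loop. This is an illustrative sketch, not AWS code: the queue object stands in for a boto3 SQS client (injected here so the loop is self-contained), and `load_model`, `infer`, and `send_result` are hypothetical hooks:

```python
import json

class ModelWorker:
    """Long-running ECS task: load the ~1 GB model once, then drain the queue.

    `queue` is anything with receive(max_n) -> list[str] and delete(msg);
    in production it would wrap an SQS client. The other three callables
    are placeholders for model loading, inference, and result delivery.
    """

    def __init__(self, queue, load_model, infer, send_result):
        self.queue = queue
        self.model = load_model()   # one-time startup cost, amortized over all requests
        self.infer = infer
        self.send_result = send_result

    def drain_once(self) -> int:
        """Process one batch of messages; return how many were handled."""
        handled = 0
        for msg in self.queue.receive(max_n=10):
            request = json.loads(msg)
            result = self.infer(self.model, request["input"])
            # Deliver to the user-specified destination (e.g. an S3 URI or callback URL).
            self.send_result(request["reply_to"], result)
            self.queue.delete(msg)  # delete only after successful delivery
            handled += 1
        return handled
```

Contrast this with Option C: a Lambda function has no long-lived process to hold `self.model`, so every cold start repeats the expensive `load_model()` step.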
Author: LeetQuiz Editorial Team
A company is developing a new machine learning (ML) model solution on AWS. The models are developed as independent microservices that fetch approximately 1 GB of model data from Amazon S3 at startup and load the data into memory. Users access the models through an asynchronous API. Users can send a request or a batch of requests and specify where the results should be sent.
The company provides models to hundreds of users. The usage patterns for the models are irregular. Some models could be unused for days or weeks. Other models could receive batches of thousands of requests at a time.
Which design should a solutions architect recommend to meet these requirements?
A
Direct the requests from the API to a Network Load Balancer (NLB). Deploy the models as AWS Lambda functions that are invoked by the NLB.
B
Direct the requests from the API to an Application Load Balancer (ALB). Deploy the models as Amazon Elastic Container Service (Amazon ECS) services that read from an Amazon Simple Queue Service (Amazon SQS) queue. Use AWS App Mesh to scale the instances of the ECS cluster based on the SQS queue size.
C
Direct the requests from the API into an Amazon Simple Queue Service (Amazon SQS) queue. Deploy the models as AWS Lambda functions that are invoked by SQS events. Use AWS Auto Scaling to increase the number of vCPUs for the Lambda functions based on the SQS queue size.
D
Direct the requests from the API into an Amazon Simple Queue Service (Amazon SQS) queue. Deploy the models as Amazon Elastic Container Service (Amazon ECS) services that read from the queue. Enable AWS Auto Scaling on Amazon ECS for both the cluster and copies of the service based on the queue size.