
A company is developing ML models as microservices on AWS, with each model loading 1 GB of data from S3 into memory at startup. The system handles irregular user request patterns, ranging from inactivity to thousands of concurrent requests. Which architecture is recommended for this scalable, asynchronous API-based ML model solution?
A. Use an NLB to direct API requests to AWS Lambda functions.
B. Use an ALB with Amazon ECS services reading from an SQS queue, scaled using AWS App Mesh.
C. Queue requests in SQS, invoke Lambda functions to process them, and scale with Auto Scaling based on queue size.
D. Direct API requests into an SQS queue, have ECS services process the queue, and enable ECS Auto Scaling based on queue length.
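To make the queue-length-based scaling in option D concrete, here is a minimal sketch of the scaling math such a policy performs: compute the ECS service's desired task count from the SQS backlog (the `ApproximateNumberOfMessagesVisible` metric), clamped between a minimum and maximum fleet size. The function name and all parameter values (`target_backlog_per_task`, `min_tasks`, `max_tasks`) are illustrative assumptions, not AWS defaults; a real deployment would wire this up through Application Auto Scaling and CloudWatch rather than hand-rolled code.

```python
import math

def desired_task_count(queue_depth: int,
                       target_backlog_per_task: int = 100,
                       min_tasks: int = 0,
                       max_tasks: int = 50) -> int:
    """Compute an ECS service's desired task count from SQS queue depth.

    Mirrors the idea of a target-tracking policy on "backlog per task":
    scale out when the visible-message backlog exceeds what the current
    fleet can absorb, and scale in to zero when the queue is empty.
    All defaults here are illustrative, not AWS-recommended values.
    """
    if queue_depth <= 0:
        return min_tasks  # idle periods: scale the service all the way in
    needed = math.ceil(queue_depth / target_backlog_per_task)
    return max(min_tasks, min(max_tasks, needed))
```

For example, `desired_task_count(0)` returns 0 during inactivity, `desired_task_count(2500)` returns 25, and a sudden spike of a million queued requests is capped at the 50-task ceiling, which is why buffering requests in SQS handles the irregular traffic described in the question.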