A company is deploying ML models as microservices on AWS. Each model loads 1 GB of data from Amazon S3 into memory at startup. User request patterns are irregular, ranging from long periods of inactivity to bursts of thousands of concurrent requests. Which architecture is recommended for a scalable, asynchronous, API-based solution that serves these models?
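The two constraints in the question (an expensive load at startup and asynchronous, bursty requests) can be illustrated with a small local simulation. This is only a sketch under stated assumptions, not the answer to the question and not an AWS implementation: an in-process `queue.Queue` stands in for whatever request buffer the architecture uses, and `ModelWorker`, `fake_load_model`, and `serve` are hypothetical names introduced here. The point it shows is that each worker pays the load cost once and then drains many queued requests.

```python
import queue
import threading


class ModelWorker:
    """Hypothetical worker: loads the model once, then serves many requests."""

    def __init__(self, load_model):
        # Stand-in for reading ~1 GB from S3 into memory at startup.
        self.model = load_model()

    def run(self, requests, results):
        while True:
            item = requests.get()
            if item is None:  # sentinel value: no more work for this worker
                break
            request_id, payload = item
            results[request_id] = self.model(payload)


def fake_load_model():
    # Assumption: a trivial doubling function stands in for the real model.
    return lambda x: x * 2


def serve(payloads, num_workers=2):
    """Queue every request, let the workers drain the queue, return the results."""
    requests = queue.Queue()
    results = {}
    workers = [ModelWorker(fake_load_model) for _ in range(num_workers)]
    threads = [
        threading.Thread(target=w.run, args=(requests, results)) for w in workers
    ]
    for t in threads:
        t.start()
    for request_id, payload in enumerate(payloads):
        requests.put((request_id, payload))
    for _ in threads:
        requests.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Calling `serve(range(1000), num_workers=4)` queues a burst of a thousand requests while loading the model only four times, once per worker. In a real deployment the number of workers would be scaled with the queue depth, down to zero during inactivity.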