
Explanation:
Option A is correct because it provides the LEAST operational overhead while meeting all requirements:
Streaming responses: Lambda response streaming allows responses from Amazon Bedrock to be streamed in real time, reducing perceived latency by sending chunks of data to the client as they become available instead of buffering the full completion.
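A minimal sketch of the streaming pattern in pure Python. The `fake_model_stream` stub is a hypothetical stand-in for the Bedrock streaming invocation; the point is that each chunk is forwarded to the caller as soon as it arrives rather than after the whole response is assembled.

```python
def fake_model_stream(prompt):
    # Hypothetical stub standing in for the model's streaming API;
    # yields output chunks as the model produces them.
    for word in prompt.split():
        yield word + " "

def stream_response(prompt):
    # Forward each chunk immediately instead of buffering the
    # full completion -- this is what reduces time-to-first-byte.
    for chunk in fake_model_stream(prompt):
        yield chunk

chunks = list(stream_response("hello streaming world"))
```

In the real architecture, the chunks would come from the Bedrock streaming API and be written to the Lambda response stream as they arrive.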
Token limits enforcement: Token limits can be enforced within the Lambda function by monitoring the token count in the request/response and truncating or rejecting requests that exceed limits.
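A sketch of in-function token enforcement. The whitespace-based `estimate_tokens` and the `MAX_TOKENS` value are assumptions for illustration; a production Lambda function would count tokens with the model's own tokenizer.

```python
MAX_TOKENS = 100  # hypothetical limit for this sketch

def estimate_tokens(text):
    # Crude whitespace estimate; a real system would use the
    # foundation model's tokenizer for an exact count.
    return len(text.split())

def enforce_token_limit(prompt, max_tokens=MAX_TOKENS):
    # Reject requests whose prompt exceeds the configured limit
    # before the model is ever invoked, protecting compute usage.
    if estimate_tokens(prompt) > max_tokens:
        raise ValueError("prompt exceeds token limit")
    return prompt
```

The same check can be applied to the accumulated response stream, truncating output once the limit is reached.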
Retry logic: The Lambda function can implement retry logic (for example, with exponential backoff) for Bedrock API calls that time out or return partial responses, while Lambda and API Gateway timeout configurations bound how long any single request can run. API Gateway can also surface errors so that clients retry safely.
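A sketch of the retry pattern the Lambda function could apply around the model call. The helper name and parameters are illustrative, not part of any AWS SDK; it retries on timeouts with exponential backoff plus jitter.

```python
import random
import time

def with_retries(call, attempts=3, base_delay=0.1):
    # Invoke `call`, retrying on TimeoutError with exponential
    # backoff and jitter; re-raise after the final attempt.
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt)
                       + random.uniform(0, base_delay))
```

In the handler, the Bedrock invocation would be wrapped as `with_retries(lambda: invoke_model(payload))`, keeping the total retry budget inside the Lambda and API Gateway timeouts.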
Least operational overhead: This solution uses only managed services (API Gateway, Lambda, Bedrock) and requires no infrastructure management. The other options carry higher overhead: option B does not provide true streaming and pushes token enforcement to the frontend, option C requires running and operating a containerized inference server on Amazon ECS, and option D uses a REST API, which adds more configuration and cost than an HTTP API.
Key AWS Services: Amazon API Gateway (HTTP API), AWS Lambda (response streaming), Amazon Bedrock (managed foundation model hosting).
This architecture provides a fully serverless solution with minimal operational management while meeting all technical requirements.
A company is designing an API for a generative AI (GenAI) application that uses a foundation model (FM) that is hosted on a managed model service. The API must stream responses to reduce latency, enforce token limits to manage compute resource usage, and implement retry logic to handle model timeouts and partial responses.
Which solution will meet these requirements with the LEAST operational overhead?
A
Integrate an Amazon API Gateway HTTP API with an AWS Lambda function to invoke Amazon Bedrock. Use Lambda response streaming to stream responses. Enforce token limits within the Lambda function. Implement retry logic for model timeouts by using Lambda and API Gateway timeout configurations.
B
Connect an Amazon API Gateway HTTP API directly to Amazon Bedrock. Simulate streaming by using client-side polling. Enforce token limits on the frontend. Configure retry behavior by using API Gateway integration settings.
C
Connect an Amazon API Gateway WebSocket API to an Amazon ECS service that hosts a containerized inference server. Stream responses by using the WebSocket protocol. Enforce token limits within Amazon ECS. Handle model timeouts by using ECS task lifecycle hooks and restart policies.
D
Integrate an Amazon API Gateway REST API with an AWS Lambda function that invokes Amazon Bedrock. Use Lambda response streaming to stream responses. Enforce token limits within the Lambda function. Implement retry logic by using Lambda and API Gateway timeout configurations.