AWS Certified Generative AI Developer - Professional

Get started today

Ultimate access to all questions.

Explanation:

Explanation

Option A is correct because it provides the LEAST operational overhead while meeting all requirements:

Streaming responses: Lambda response streaming allows for real-time streaming of responses from Amazon Bedrock, reducing latency by sending chunks of data as they become available.
Token limits enforcement: Token limits can be enforced within the Lambda function by monitoring the token count in the request/response and truncating or rejecting requests that exceed limits.
Retry logic: Lambda and API Gateway timeout configurations can handle model timeouts and partial responses. Lambda can implement retry logic for Bedrock API calls, and API Gateway can handle client-side retries.
Least operational overhead: This solution leverages managed services (API Gateway, Lambda, Bedrock) without requiring infrastructure management. The other options have higher operational overhead:
- Option B: Client-side polling for streaming is inefficient and increases client complexity. Frontend token enforcement is less secure and reliable.
- Option C: ECS container management has significant operational overhead for scaling, monitoring, and maintenance.
- Option D: REST API with Lambda is similar to A, but HTTP API (in option A) is generally more cost-effective and has better performance for streaming scenarios.

Key AWS Services:

Amazon API Gateway HTTP API: Lightweight, cost-effective API gateway
AWS Lambda: Serverless compute with response streaming capability
Amazon Bedrock: Managed foundation model service

This architecture provides a fully serverless solution with minimal operational management while meeting all technical requirements.

Explanation:

Explanation

Option A is correct because it provides the LEAST operational overhead while meeting all requirements:

Streaming responses: Lambda response streaming allows for real-time streaming of responses from Amazon Bedrock, reducing latency by sending chunks of data as they become available.
Token limits enforcement: Token limits can be enforced within the Lambda function by monitoring the token count in the request/response and truncating or rejecting requests that exceed limits.
Retry logic: Lambda and API Gateway timeout configurations can handle model timeouts and partial responses. Lambda can implement retry logic for Bedrock API calls, and API Gateway can handle client-side retries.
Least operational overhead: This solution leverages managed services (API Gateway, Lambda, Bedrock) without requiring infrastructure management. The other options have higher operational overhead:
- Option B: Client-side polling for streaming is inefficient and increases client complexity. Frontend token enforcement is less secure and reliable.
- Option C: ECS container management has significant operational overhead for scaling, monitoring, and maintenance.
- Option D: REST API with Lambda is similar to A, but HTTP API (in option A) is generally more cost-effective and has better performance for streaming scenarios.

Key AWS Services:

Amazon API Gateway HTTP API: Lightweight, cost-effective API gateway
AWS Lambda: Serverless compute with response streaming capability
Amazon Bedrock: Managed foundation model service

This architecture provides a fully serverless solution with minimal operational management while meeting all technical requirements.

Comments (0)

No comments yet.

A company is designing an API for a generative AI (GenAI) application that uses a foundation model (FM) that is hosted on a managed model service. The API must stream responses to reduce latency, enforce token limits to manage compute resource usage, and implement retry logic to handle model timeouts and partial responses.

Which solution will meet these requirements with the LEAST operational overhead?

Real Exam

Community

DDucse

Last updated: March 23, 2026 at 10:35

Integrate an Amazon API Gateway HTTP API with an AWS Lambda function to invoke Amazon Bedrock. Use Lambda response streaming to stream responses. Enforce token limits within the Lambda function. Implement retry logic for model timeouts by using Lambda and API Gateway timeout configurations.

66.7%