
A company is developing a generative AI (GenAI) application that analyzes customer service calls in real time and generates suggested responses for human customer service agents. The application must process 500,000 concurrent calls during peak hours with less than 200 ms end-to-end latency for each suggestion. The company uses existing architecture to transcribe customer call audio streams. The application must not exceed a predefined monthly compute budget and must maintain auto scaling capabilities.

Which solution will meet these requirements?

A. Deploy a large, complex reasoning model on Amazon Bedrock. Purchase provisioned throughput and optimize for batch processing.
B. Deploy a low-latency, real-time optimized model on Amazon Bedrock. Purchase provisioned throughput and set up automatic scaling policies.
C. Deploy a large language model (LLM) on an Amazon SageMaker real-time endpoint that uses dedicated GPU instances.
D. Deploy a mid-sized language model on an Amazon SageMaker serverless endpoint that is optimized for batch processing.

Explanation:
Option B is correct because it addresses all the key requirements:
- Latency: a low-latency, real-time optimized model on Amazon Bedrock is designed for the sub-200 ms inference each suggestion requires.
- Scale: automatic scaling policies let capacity track the 500,000-call concurrency peak.
- Cost: provisioned throughput buys a predictable amount of capacity, which keeps spend inside a fixed monthly compute budget.
- Operations: Bedrock is fully managed, so no model-serving infrastructure has to be built or maintained. A minimal sketch of this pattern follows.
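For illustration, here is a minimal Python (boto3) sketch of the Option B pattern, assuming the usual two-step flow: purchase Bedrock provisioned throughput for a real-time model, then invoke it per transcribed call turn. The model ID, model-unit count, names, and prompt are placeholder assumptions, and scaling the purchased model units is managed separately and not shown.

```python
import boto3

# Control-plane client: purchase dedicated capacity for a low-latency model.
bedrock = boto3.client("bedrock")

response = bedrock.create_provisioned_model_throughput(
    provisionedModelName="call-suggestions-throughput",  # placeholder name
    modelId="amazon.titan-text-lite-v1",  # assumption: any low-latency Bedrock model
    modelUnits=2,  # placeholder; size to expected peak load
)
provisioned_model_arn = response["provisionedModelArn"]

# Data-plane client: generate a suggested reply for one transcribed call turn.
runtime = boto3.client("bedrock-runtime")

result = runtime.converse(
    modelId=provisioned_model_arn,  # invoke through the provisioned capacity
    messages=[{
        "role": "user",
        "content": [{"text": "Customer: My order arrived damaged. Suggest an agent reply."}],
    }],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},  # short outputs keep latency low
)
print(result["output"]["message"]["content"][0]["text"])
```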
Why other options are incorrect:
Option A: Optimizes for batch processing rather than real-time inference, and a large, complex reasoning model is unlikely to meet the sub-200 ms latency requirement.
Option C: SageMaker real-time endpoints on dedicated GPU instances can deliver low latency, but they require far more infrastructure management than Amazon Bedrock's fully managed service, and the option says nothing about provisioned throughput, cost controls, or automatic scaling policies (the sketch after this list illustrates the setup involved).
Option D: Serverless endpoints are optimized for intermittent traffic and are unlikely to sustain consistent low latency across 500,000 concurrent calls, and optimizing for batch processing directly contradicts the real-time requirement.
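To make the "more infrastructure management" point about Option C concrete, here is a hedged boto3 sketch of what a self-managed SageMaker deployment entails: you register the model, define and pay for GPU capacity, create the endpoint, and wire up scaling yourself through Application Auto Scaling. All resource names, the container image URI, model artifact location, instance type, and scaling targets below are placeholder assumptions.

```python
import boto3

sm = boto3.client("sagemaker")

# 1. Register the model: you supply the serving container and weights yourself.
sm.create_model(
    ModelName="llm-suggestions",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/llm-serving:latest",  # placeholder
        "ModelDataUrl": "s3://example-bucket/model.tar.gz",  # placeholder
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
)

# 2. Choose and provision dedicated GPU capacity explicitly.
sm.create_endpoint_config(
    EndpointConfigName="llm-suggestions-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "llm-suggestions",
        "InstanceType": "ml.g5.2xlarge",  # placeholder GPU instance type
        "InitialInstanceCount": 2,
    }],
)

# 3. Create the real-time endpoint.
sm.create_endpoint(
    EndpointName="llm-suggestions",
    EndpointConfigName="llm-suggestions-config",
)

# 4. Scaling is a separate service you configure and tune yourself.
aas = boto3.client("application-autoscaling")
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/llm-suggestions/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=50,
)
aas.put_scaling_policy(
    PolicyName="llm-suggestions-scaling",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/llm-suggestions/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # invocations per instance; needs load testing to tune
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```

Each of these resources then has to be monitored, patched, and capacity-planned, which is exactly the operational overhead a fully managed service removes.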
Key considerations for real-time GenAI applications:
- Choose a model sized and optimized for real-time inference rather than maximum reasoning depth.
- Reserve capacity (for example, Bedrock provisioned throughput) so latency stays predictable under peak load and spend stays within budget.
- Pair reserved capacity with automatic scaling so the application can absorb traffic peaks such as 500,000 concurrent calls.
- Prefer managed services when the team should focus on the application rather than on serving infrastructure.