
Explanation:
Option C is the correct solution because it resolves throttling while preserving performance and avoiding fixed costs during low-traffic periods. Amazon Bedrock supports on-demand inference with usage-based pricing, making it well suited for applications with time-zone-dependent traffic spikes.
Throttling during peak hours typically occurs when inference requests exceed available regional capacity. Cross-Region inference allows Amazon Bedrock to automatically distribute requests across multiple AWS Regions, reducing contention and preventing throttling without requiring reserved or provisioned capacity. This approach ensures continuous operation while maintaining low latency for users in different geographic locations.
Invocation logging and native metrics such as InvocationThrottles, InputTokenCount, and OutputTokenCount provide visibility into usage patterns and capacity constraints. Monitoring these metrics enables teams to validate that traffic distribution is working as intended and that performance remains consistent during peak periods.
Option A introduces fixed hourly costs by relying on provisioned throughput, which directly violates the requirement to avoid unnecessary spend during low-traffic periods. Option B introduces regional failover complexity and reactive behavior instead of proactive load distribution. Option D does not address the root cause of throttling, as distributing traffic across model versions within the same Region does not increase available capacity.
Therefore, Option C best aligns with AWS Generative AI best practices for scalable, cost-efficient, global serverless applications.
Ultimate access to all questions.
No comments yet.
A company is building a global generative AI application using Amazon Bedrock. The application experiences high traffic during specific hours in different time zones, leading to throttling during peak usage. The company wants to ensure continuous operation without throttling while avoiding unnecessary spend during low-traffic periods. Which solution should the company implement?
A
Use provisioned throughput for the Amazon Bedrock model. Monitor the ProvisionedThroughputUtilization metric and adjust capacity based on usage patterns.
B
Implement a regional failover mechanism. Route traffic to a secondary AWS Region when throttling occurs in the primary Region.
C
Configure cross-Region inference in Amazon Bedrock. Monitor InvocationThrottles, InputTokenCount, and OutputTokenCount metrics.
D
Enable invocation logging in Amazon Bedrock. Monitor InvocationLatency, InvocationClientErrors, and InvocationServerErrors metrics. Distribute traffic across multiple versions of the same model.