
Explanation:
Option B is correct because the application is invoking the base foundation model identifier, which routes traffic to the on-demand capacity pool rather than the company's purchased provisioned throughput. In Amazon Bedrock, provisioned throughput is attached to a dedicated provisioned model resource created through the CreateProvisionedModelThroughput API. To consume that reserved capacity, inference requests must target the ARN of the provisioned resource, not the generic model identifier used for on-demand inference.
The code snippet uses modelId="anthropic.claude-v2". This value selects the on-demand endpoint for that model. As a result, requests are subject to on-demand quotas and throttling behavior, while the provisioned throughput remains idle. This directly explains the CloudWatch observation: provisioned capacity metrics show unused capacity because no traffic is being directed to the provisioned resource, and the on-demand path is throttling because it is exceeding the applicable on-demand limits during peak volume.
Replacing the modelId value with the provisioned throughput ARN returned by the CreateProvisionedModelThroughput workflow ensures the runtime invocation is routed to the reserved capacity. Once traffic is directed correctly, the purchased model units provide the consistent throughput required for predictable performance during business hours, which is exactly why provisioned throughput is used.
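As a minimal sketch of the corrected call, the request would pass the provisioned model ARN as modelId instead of the base model identifier. The ARN below is purely illustrative (the real value comes from the CreateProvisionedModelThroughput response), and the helper function is hypothetical, introduced only to show the argument shape:

```python
import json

# Hypothetical provisioned throughput ARN; the real value is returned
# by the CreateProvisionedModelThroughput workflow.
provisioned_model_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123example"
)

def build_invoke_kwargs(prompt):
    """Build InvokeModel arguments that target the provisioned resource."""
    return {
        # ARN of the reserved capacity, not "anthropic.claude-v2"
        "modelId": provisioned_model_arn,
        "body": json.dumps({
            "prompt": prompt,
            "max_tokens_to_sample": 300,
        }),
    }

# The actual call would then be:
# response = bedrock_client.invoke_model(**build_invoke_kwargs(prompt))
```

Because only the modelId value changes, no other application logic needs to be modified to start consuming the purchased capacity.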
Option A could increase capacity, but it does not fix the core issue that the application is not using the provisioned resource at all. Option C can reduce the impact of throttling temporarily, but it adds latency and does not guarantee consistent throughput; it also still wastes the provisioned capacity. Option D changes the response delivery mechanism, but throttling is a capacity routing and quota issue, not a streaming API issue.
A company has purchased provisioned throughput for the Anthropic Claude v2 model in Amazon Bedrock to ensure consistent performance during business hours. The application uses the InvokeModel API with the following code snippet:
response = bedrock_client.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": 300
    })
)
During peak hours, the application experiences throttling. CloudWatch metrics show that the provisioned throughput capacity is not being utilized. Which change should be made to ensure the application uses the purchased provisioned throughput?
A
Increase the provisioned throughput capacity to match peak demand.
B
Replace the modelId value with the provisioned throughput ARN.
C
Implement exponential backoff and retry logic in the application.
D
Modify the application to use the InvokeModelWithResponseStream API instead of the InvokeModel API.