
## Answer (for fast verification)

**B: Decrease the number of tokens in the prompt.**
## Analysis of the Question

The question describes a company using few-shot prompting with a base model on Amazon Bedrock. Key details:

- **Few-shot prompting**: Each prompt includes 10 examples to guide the model's responses.
- **Current usage**: The model is invoked once per day and is performing well.
- **Goal**: Reduce monthly cost while maintaining the model's performance.

## Amazon Bedrock Pricing Model

On-demand inference on Amazon Bedrock is charged based on **tokens processed**, which includes:

- **Input tokens**: Tokens from the prompt, including the examples used in few-shot prompting.
- **Output tokens**: Tokens generated by the model in its response.

Cost = (Input tokens × input price per token) + (Output tokens × output price per token)

Note that input and output tokens are typically priced at different rates, so the prompt (input) side of the bill can be reduced independently of the response.

## Evaluation of Options

**A: Customize the model by using fine-tuning.**

- **Not optimal for this scenario**: Fine-tuning involves training the model on a dataset to adapt it to specific tasks, which incurs additional costs for training and for hosting a custom model. While fine-tuning might reduce prompt length in the long run, it introduces upfront cost and complexity that do not align with the immediate goal of reducing monthly spend for a model that is already performing well with a single daily invocation.

**B: Decrease the number of tokens in the prompt.**

- **Optimal solution**: Since Bedrock pricing is token-based, reducing the number of tokens in the prompt directly decreases input token costs. In few-shot prompting, this can be achieved by:
  - Reducing the number of examples from 10 to a smaller number while maintaining performance.
  - Making the examples more concise without losing effectiveness.
- This approach requires no changes to the model or infrastructure, no long-term commitments, and immediately reduces cost while preserving performance with minimal adjustment.

**C: Increase the number of tokens in the prompt.**

- **Counterproductive**: Adding more tokens would increase the input token count and therefore the cost, the opposite of what the company wants.
**D: Use Provisioned Throughput.**

- **Less suitable for this use case**: Provisioned Throughput is designed for predictable, high-volume workloads, where committing to a throughput level can save money compared to on-demand pricing. With only one invocation per day, usage is extremely low and sporadic. Provisioned Throughput typically requires a longer-term commitment and is cost-effective only when utilization is high and steady. For such minimal usage, it would almost certainly cost more than on-demand pricing, and it does not address the core cost driver: token count.

## Conclusion

**Option B is the most effective solution** because it directly targets the primary cost component in Amazon Bedrock: token processing. By optimizing the prompt to use fewer tokens (e.g., reducing the number of examples or making them more concise), the company can lower costs immediately without compromising performance, changing infrastructure, or taking on long-term commitments. This aligns with AWS best practices for cost optimization in generative AI applications, where prompt engineering is a key lever for managing expenses.
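The pricing logic above can be sketched numerically. A minimal sketch follows; the per-token prices, token counts per example, and output length below are illustrative assumptions, not actual Amazon Bedrock rates:

```python
# Illustrative sketch: how prompt (input) token count drives on-demand cost.
# All prices and token counts are hypothetical placeholders.

PRICE_PER_1K_INPUT = 0.003   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1,000 output tokens (assumed)

def monthly_cost(input_tokens, output_tokens, invocations_per_month=30):
    """On-demand cost: per-invocation token charges times invocation count."""
    per_call = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return per_call * invocations_per_month

# 10 examples at ~200 tokens each, plus ~300 tokens of instructions.
cost_10_shot = monthly_cost(input_tokens=10 * 200 + 300, output_tokens=150)
# Trimmed to 3 concise examples: fewer input tokens, same output length.
cost_3_shot = monthly_cost(input_tokens=3 * 200 + 300, output_tokens=150)

print(f"10-shot: ${cost_10_shot:.4f}/month, 3-shot: ${cost_3_shot:.4f}/month")
```

Under these assumed numbers, trimming the prompt cuts the monthly bill roughly in half while leaving the model, its hosting, and the output side of the bill untouched, which is the essence of why option B wins.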
Author: LeetQuiz Editorial Team
A company is using few-shot prompting with a base model hosted on Amazon Bedrock, including 10 examples in each prompt. The model is invoked once per day and performs well. The company now aims to reduce the monthly cost. Which solution meets these requirements?
- **A.** Customize the model by using fine-tuning.
- **B.** Decrease the number of tokens in the prompt.
- **C.** Increase the number of tokens in the prompt.
- **D.** Use Provisioned Throughput.