
Question: 30
A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not high enough to justify creating their own provisioned throughput endpoint. They want to choose the most cost-effective strategy for their application.
What strategy should the Generative AI Engineer use?
Explanation:
When the volume of requests is not high enough to justify a dedicated provisioned throughput endpoint, the most cost-effective strategy is to switch to the pay-per-token Foundation Model APIs, which serve requests from shared, Databricks-managed endpoints.
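As a minimal sketch of what the switch looks like in practice, a pay-per-token endpoint is queried the same way as a provisioned one through the OpenAI-compatible client; only the endpoint name changes. The workspace host and endpoint name below are illustrative placeholders, not values from the question.

```python
import os
from openai import OpenAI

# Point the OpenAI-compatible client at the Databricks serving endpoints.
# The workspace host below is a placeholder; use your own workspace URL.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

# Query a shared pay-per-token endpoint by name (illustrative endpoint name).
response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize provisioned throughput vs pay-per-token."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```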
Cost Efficiency for Low Volume: Provisioned throughput endpoints are designed for high-volume, predictable workloads where you pay for reserved capacity regardless of usage. For low-volume applications, this means paying for capacity that sits idle (a back-of-envelope comparison follows this list).
Pay-Per-Token Model: This approach charges only for actual usage (tokens processed), making it ideal for applications with variable or low request volumes.
No Minimum Commitments: Unlike provisioned throughput, which requires capacity reservations, pay-per-token has no minimum commitment, so you only pay for what you use.
Scalability: Pay-per-token automatically scales with your usage patterns without requiring manual capacity adjustments.
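To make the trade-off concrete, here is the back-of-envelope comparison referenced above. All rates are hypothetical placeholders chosen for illustration, not actual Databricks pricing; substitute the current published rates before drawing conclusions.

```python
# Back-of-envelope cost comparison. Both rates are HYPOTHETICAL
# placeholders, not actual Databricks prices.
HYPOTHETICAL_PAY_PER_TOKEN_RATE = 0.50  # $ per 1M tokens (assumed)
HYPOTHETICAL_PROVISIONED_RATE = 10.00   # $ per hour of reserved capacity (assumed)

monthly_tokens = 20_000_000  # example low-volume workload
hours_per_month = 730        # provisioned capacity is billed whether used or not

pay_per_token_cost = monthly_tokens / 1_000_000 * HYPOTHETICAL_PAY_PER_TOKEN_RATE
provisioned_cost = hours_per_month * HYPOTHETICAL_PROVISIONED_RATE

print(f"pay-per-token: ${pay_per_token_cost:,.2f}/month")  # $10.00
print(f"provisioned:   ${provisioned_cost:,.2f}/month")    # $7,300.00
```

Under these assumed rates, the low-volume workload costs orders of magnitude less on pay-per-token, because provisioned capacity is billed for every hour regardless of traffic.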
Conclusion: For low-volume LLM applications, pay-per-token throughput provides the optimal balance of cost-effectiveness, flexibility, and scalability.