
Answer-first summary for fast verification
Answer: B — Deploy the model using pay-per-token throughput as it comes with cost guarantees
The question tests cost-effectiveness for low request volumes with Databricks Foundation Model APIs. Option B (pay-per-token) is optimal because it bills only for tokens actually processed, with no fixed charge for a dedicated endpoint, which is exactly what a low-volume application needs. The community discussion shows 100% consensus for B, with the top comment noting that pay-per-token maximizes cost-effectiveness in low-volume situations. Option A (External Models) adds integration complexity and does not by itself reduce serving cost. Option C (switching to a smaller model) addresses hardware constraints, not the pricing model; a provisioned endpoint would still incur its fixed cost. Option D (manual throttling) is operationally brittle and does nothing to change how the endpoint is billed.
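The intuition behind B can be made concrete with a back-of-the-envelope comparison: a provisioned throughput endpoint bills a fixed amount per hour whether or not it serves traffic, while pay-per-token bills only for tokens processed. The sketch below uses entirely hypothetical placeholder prices (not real Databricks rates) to show the break-even logic:

```python
# Hypothetical cost comparison: provisioned throughput vs. pay-per-token.
# All prices are illustrative placeholders, NOT actual Databricks rates.

PROVISIONED_COST_PER_HOUR = 10.0   # assumed fixed hourly endpoint cost
PAY_PER_TOKEN_COST_PER_1M = 2.0    # assumed price per 1M tokens processed
HOURS_PER_MONTH = 730

def monthly_cost_provisioned() -> float:
    # A provisioned endpoint is billed for every hour it exists,
    # regardless of how many requests it actually serves.
    return PROVISIONED_COST_PER_HOUR * HOURS_PER_MONTH

def monthly_cost_pay_per_token(tokens_per_month: int) -> float:
    # Pay-per-token is billed only for tokens actually processed.
    return PAY_PER_TOKEN_COST_PER_1M * tokens_per_month / 1_000_000

low_volume = 50_000_000  # e.g. 50M tokens/month of light traffic

print(monthly_cost_provisioned())              # 7300.0
print(monthly_cost_pay_per_token(low_volume))  # 100.0
```

Under these placeholder numbers, pay-per-token is cheaper until monthly volume grows far beyond the low-traffic scenario in the question; the real break-even point depends on the actual published rates for the model in use.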
Author: LeetQuiz Editorial Team
A Generative AI Engineer built an LLM application using the provisioned throughput Foundation Model API. The application is ready for deployment, but the request volume is too low to justify a dedicated provisioned throughput endpoint. Which strategy should they use to ensure the best cost-effectiveness?
A. Switch to using External Models instead
B. Deploy the model using pay-per-token throughput as it comes with cost guarantees
C. Change to a model with fewer parameters in order to reduce hardware constraint issues
D. Throttle the incoming batch of requests manually to avoid rate limiting issues