
A Generative AI Engineer built an LLM application using the provisioned throughput Foundation Model API. The application is ready for deployment, but the request volume is too low to justify a dedicated provisioned throughput endpoint. Which strategy should they use to ensure the best cost-effectiveness?
A. Switch to using External Models instead
B. Deploy the model using pay-per-token throughput as it comes with cost guarantees
C. Change to a model with fewer parameters in order to reduce hardware constraint issues
D. Throttle the incoming batch of requests manually to avoid rate limiting issues
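For low request volumes, pay-per-token Foundation Model API endpoints avoid paying for idle provisioned capacity, and switching usually only requires pointing the application at a different serving endpoint. Below is a minimal sketch of querying a Databricks pay-per-token endpoint via the MLflow Deployments client; the endpoint name and the response fields are assumptions for illustration, not the author's exact setup.

```python
# Sketch: query a pay-per-token Foundation Model API endpoint on Databricks.
# Assumes the mlflow-deployments client and a hypothetical pay-per-token
# chat endpoint name; the provisioned-throughput version of the app would
# differ mainly in which endpoint name is passed here.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",  # assumed endpoint name
    inputs={
        "messages": [
            {"role": "user", "content": "Summarize the benefits of pay-per-token pricing."}
        ],
        "max_tokens": 128,
    },
)

# Chat endpoints return an OpenAI-style payload; adjust parsing to the
# actual response schema of the endpoint you query.
print(response["choices"][0]["message"]["content"])
```

Because billing is per token processed rather than per provisioned GPU capacity, this pattern keeps costs proportional to the application's low request volume, which is why option B is the most cost-effective choice here.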