
A Generative AI Engineer prototyping an LLM system unintentionally executed thousands of inference queries against a Foundation Model endpoint over a weekend. What action should they take to prevent this from recurring?
A
Use prompt engineering to instruct the LLM endpoint to refuse further queries once too many have been made.
B
Require that all development code interfacing with a Foundation Model endpoint be reviewed by a Staff-level engineer before execution.
C
Build a pyfunc model which proxies to the Foundation Model endpoint and add throttling within the pyfunc model.
D
Configure rate limiting on the Foundation Model endpoints.
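Option C describes a throttling-proxy pattern. As a rough illustration of the kind of logic such a wrapper could embed, here is a minimal sliding-window throttle; the `call_endpoint` callable, class name, and limits are illustrative stand-ins, not a real Databricks or MLflow API:

```python
import time
from collections import deque

class ThrottledEndpointProxy:
    """Illustrative throttle: allow at most `max_calls` per `window_s` seconds.

    `call_endpoint` stands in for the actual Foundation Model invocation.
    """

    def __init__(self, call_endpoint, max_calls=10, window_s=60.0):
        self.call_endpoint = call_endpoint
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of recent calls

    def predict(self, prompt):
        now = time.monotonic()
        # Discard timestamps that have fallen outside the sliding window.
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            raise RuntimeError("Rate limit exceeded; try again later")
        self.calls.append(now)
        return self.call_endpoint(prompt)
```

Note that this client-side throttle only protects callers who route through the proxy; enforcing rate limits on the endpoint itself (option D) applies to every caller without requiring code changes.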