
Answer-first summary for fast verification
Answer: Configure rate limiting on the Foundation Model endpoints.
The question asks for the most effective way to prevent unintentional excessive queries against a Foundation Model endpoint. Option D (Configure rate limiting on the Foundation Model endpoints) is optimal because it enforces usage limits at the endpoint itself, preventing accidental overuse regardless of where the calls originate. It is a proactive, infrastructure-level control that aligns with best practices for managing API cost and capacity.

Option C (build a pyfunc model with throttling) adds unnecessary complexity and is less reliable than native endpoint controls, since callers could still hit the endpoint directly. Option A (prompt engineering) is ineffective: prompts shape model behavior, not request volume. Option B (code review) is reactive; it can catch issues before deployment but cannot stop a runaway process once code is executing.

The community discussion shows 67% support for D, with the highest-upvoted comment endorsing it, while C received mixed and less-supported responses.
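In Databricks, rate limits are configured on the serving endpoint itself (through the UI or the serving-endpoints API), so no client code is required. To see why this control works regardless of the caller, the sketch below is a minimal, illustrative token-bucket limiter of the kind such a limit enforces; the class name and parameters are our own, not a Databricks API.

```python
import time

class TokenBucket:
    """Illustrative token-bucket rate limiter: allows a burst of up to
    `capacity` calls, then refills `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity          # maximum burst size
        self.rate = rate                  # tokens refilled per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # over the limit: reject the call

# A runaway loop that would otherwise fire 1,000 queries:
limiter = TokenBucket(capacity=5, rate=1.0)   # 5-call burst, 1 call/sec
allowed = sum(limiter.allow() for _ in range(1000))
```

Because the limit sits in front of the endpoint, only the initial burst gets through no matter how many requests the prototype issues, which is exactly the failure mode in the question.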
Author: LeetQuiz Editorial Team
A Generative AI Engineer prototyping an LLM system unintentionally executed thousands of inference queries against a Foundation Model endpoint over a weekend. What action should they take to prevent this from recurring?
A
Use prompt engineering to instruct the LLM endpoints to refuse too many subsequent queries.
B
Require that all development code which interfaces with a Foundation Model endpoint must be reviewed by a Staff level engineer before execution.
C
Build a pyfunc model which proxies to the Foundation Model endpoint and add throttling within the pyfunc model.
D
Configure rate limiting on the Foundation Model endpoints.