
Answer-first summary for fast verification
Answer: B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100.
The question asks for a scikit-learn classification model that is available 24/7, handles millions of requests per second during peak hours (8 AM to 7 PM), and costs as little as possible to run. Option B is the best fit for two reasons. First, scikit-learn does not support GPU acceleration, so the GPU-backed options (C and D) add cost without improving prediction performance. Second, a max replica count of 100 lets Vertex AI scale out horizontally to absorb the peak request volume and scale back in during off-peak hours, because the max replica count is only an upper bound on autoscaling, not a fixed number of always-on replicas. Option A caps the endpoint at a single replica, which cannot serve millions of requests per second and risks degraded service. The community discussion (100% consensus for B, with upvoted comments) supports the same reasoning: GPUs are unnecessary for scikit-learn, and the ability to scale is what keeps performance cost-effective.
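To make the cost argument concrete, here is a minimal Python sketch of how an autoscaler that clamps demand between a minimum and a maximum replica count behaves over one day. Every figure in it (per-replica throughput, peak and off-peak load, the minimum of one replica) is a hypothetical assumption chosen for illustration, not a Vertex AI measurement, and the clamping rule is a simplification of real autoscaling.

```python
import math

# Hypothetical figures, chosen only to illustrate the reasoning.
PER_REPLICA_RPS = 20_000   # assumed throughput of one CPU replica
PEAK_RPS = 1_500_000       # assumed load between 8 AM and 7 PM
OFF_PEAK_RPS = 10_000      # assumed load for the rest of the day
MIN_REPLICAS = 1           # assumed floor so the endpoint stays up 24/7


def replicas_running(load_rps: int, max_replicas: int) -> int:
    """Replicas a simple autoscaler would run: demand clamped to [min, max]."""
    needed = math.ceil(load_rps / PER_REPLICA_RPS)
    return max(MIN_REPLICAS, min(needed, max_replicas))


def hourly_load(hour: int) -> int:
    """Peak load from 08:00 up to 19:00, off-peak load otherwise."""
    return PEAK_RPS if 8 <= hour < 19 else OFF_PEAK_RPS


def daily_summary(max_replicas: int) -> dict:
    """Replica-hours run per day and hours in which capacity falls short."""
    replica_hours = 0
    overloaded_hours = 0
    for hour in range(24):
        load = hourly_load(hour)
        running = replicas_running(load, max_replicas)
        replica_hours += running
        if running * PER_REPLICA_RPS < load:
            overloaded_hours += 1
    return {"replica_hours": replica_hours, "overloaded_hours": overloaded_hours}


if __name__ == "__main__":
    for cap in (1, 100):
        print(f"max replicas = {cap}: {daily_summary(cap)}")
```

Under these assumed figures, a cap of 1 leaves all 11 peak hours under-provisioned, while a cap of 100 runs 838 replica-hours per day rather than the 2,400 that 100 always-on replicas would consume.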
Author: LeetQuiz Editorial Team
You need to deploy a scikit-learn classification model to production. The model must serve requests 24/7, and you expect millions of requests per second to the production application between 8 AM and 7 PM. You need to minimize the cost of deployment. What should you do?
A. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 1.
B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100.
C. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 1.
D. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 100.
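For reference, the configuration in option B might be expressed with the Vertex AI Python SDK (`google-cloud-aiplatform`) roughly as follows. This is a sketch under stated assumptions: the machine type, the minimum replica count of 1, and the helper names `option_b_deploy_kwargs` and `deploy_option_b` are illustrative choices that do not come from the question, and the model is assumed to be already uploaded to the Vertex AI Model Registry.

```python
def option_b_deploy_kwargs() -> dict:
    """Keyword arguments for Model.deploy that mirror option B (CPU only, autoscaling)."""
    return {
        "machine_type": "n1-standard-4",  # assumed CPU machine type
        "min_replica_count": 1,           # assumed floor so the endpoint serves 24/7
        "max_replica_count": 100,         # the ceiling named in option B
        # accelerator_type and accelerator_count are left out: no GPU is attached.
    }


def deploy_option_b(project: str, location: str, model_name: str):
    """Deploy an already-uploaded model; needs the SDK installed and valid credentials."""
    # Imported lazily so this sketch can be loaded without the SDK present.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model(model_name=model_name)
    return model.deploy(**option_b_deploy_kwargs())
```

Options C and D would differ only by also passing `accelerator_type` and `accelerator_count`, which is the extra cost the explanation argues a scikit-learn model cannot benefit from.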