
Answer-first summary for fast verification
Answer: Configure an appropriate minReplicaCount value based on expected baseline traffic
The correct answer is B. Setting minReplicaCount to match the expected baseline traffic keeps enough replicas running to serve that load without waiting for a scale-up. Vertex AI's built-in autoscaling then provisions additional replicas when demand rises, which maintains performance and prevents high latency. The other options are weaker: A (two models on one endpoint) introduces redundant resources, C (a higher target utilization) risks under-provisioning during peak traffic, and D (a GPU machine type) incurs extra expense without directly addressing scaling needs.
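As a rough illustration of sizing minReplicaCount from baseline traffic, the sketch below converts an hourly request rate into a replica count. The per-replica throughput, the headroom fraction, and the helper names are assumptions made for this example and should come from your own load tests. The resulting keys mirror the `min_replica_count` and `max_replica_count` arguments of `Model.deploy` in the Vertex AI Python SDK, but no SDK call is made here.

```python
import math


def baseline_min_replicas(requests_per_hour: float,
                          per_replica_rps: float,
                          headroom: float = 0.25) -> int:
    """Estimate a minReplicaCount for a given baseline load.

    per_replica_rps and headroom are hypothetical inputs: measure the
    sustainable requests/second of one replica in a load test first.
    """
    if requests_per_hour < 0 or per_replica_rps <= 0 or headroom < 0:
        raise ValueError("inputs must be non-negative and per_replica_rps > 0")
    baseline_rps = requests_per_hour / 3600.0
    needed = baseline_rps * (1.0 + headroom) / per_replica_rps
    # This sketch assumes at least one replica is always kept running.
    return max(1, math.ceil(needed))


def replica_settings(min_replicas: int, burst_factor: int = 3) -> dict:
    """Build replica bounds; burst_factor is an illustrative assumption."""
    return {
        "min_replica_count": min_replicas,
        "max_replica_count": min_replicas * burst_factor,
    }


# Initial estimate vs. the doubled traffic observed in testing.
anticipated = baseline_min_replicas(72_000, per_replica_rps=8)
observed = baseline_min_replicas(144_000, per_replica_rps=8)
settings = replica_settings(observed)
print(anticipated, observed, settings)
```

With these assumed numbers, doubling the hourly traffic raises the baseline replica count, while the upper bound leaves room for autoscaling to absorb spikes above the baseline.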
Author: LeetQuiz Editorial Team
You have a machine learning model built with scikit-learn that has been deployed to a Vertex AI endpoint. During testing of this model with live production traffic, you notice that the number of requests per hour is double what you initially anticipated. To ensure that the endpoint scales efficiently and prevents users from experiencing high latency, what action should you take to handle potential future increases in demand?
A. Deploy two models to the same endpoint, and distribute requests among them evenly
B. Configure an appropriate minReplicaCount value based on expected baseline traffic
C. Set the target utilization percentage in the autoscalingMetricSpecs configuration to a higher value
D. Change the model’s machine type to one that utilizes GPUs