
Answer-first summary for fast verification
Answer: Increase the number of workers in your model server
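Low CPU utilization under high load, combined with dropped requests, points to a concurrency bottleneck in the model server itself rather than a shortage of compute. Vertex AI autoscaling adds replicas in response to resource utilization (CPU by default), so a server with too few workers never pushes CPU high enough to trigger a scale-out, and the deployment stays at one replica. Increasing the number of workers in your model server (Option B) lets each replica handle more concurrent requests and actually use the available CPU. That reduces dropped requests and allows utilization to rise to the point where autoscaling can engage.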
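As a rough illustration of what "increasing the number of workers" can look like, here is a minimal sketch of a Gunicorn configuration file. It assumes the custom model server is a Python web app run under Gunicorn, which the question does not state. The helper `suggested_workers` and the `WORKERS` environment variable are hypothetical names chosen for this example; `AIP_HTTP_PORT` is the port variable Vertex AI sets for custom containers.

```python
# gunicorn.conf.py: hypothetical config for a custom scikit-learn model server.
# Assumption: the server runs under Gunicorn; the question does not say which
# server software is actually in use.
import multiprocessing
import os


def suggested_workers(cpu_count: int) -> int:
    """Return Gunicorn's commonly cited starting point: (2 x cores) + 1."""
    return 2 * cpu_count + 1


# WORKERS is a hypothetical override variable, not a Vertex AI setting.
workers = int(
    os.environ.get("WORKERS", suggested_workers(multiprocessing.cpu_count()))
)

# Vertex AI passes the serving port to custom containers via AIP_HTTP_PORT.
bind = "0.0.0.0:" + os.environ.get("AIP_HTTP_PORT", "8080")
```

Each worker process typically loads its own copy of the model, so the worker count may need to be lowered if memory, rather than CPU, becomes the limit on the chosen machine type.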
Author: LeetQuiz Editorial Team
You have deployed a machine learning model built with scikit-learn to a Vertex AI endpoint. A custom model server is being used, and autoscaling has been enabled. Despite this, the model is unable to scale beyond a single replica, causing some requests to drop. Additionally, you observe that CPU utilization remains low even during periods of high load on the server. What action should you take to address this issue?
A. Attach a GPU to the prediction nodes
B. Increase the number of workers in your model server
C. Schedule scaling of the nodes to match expected demand
D. Increase the minReplicaCount in your DeployedModel configuration