You have deployed a machine learning model built with scikit-learn to a Vertex AI endpoint, using a custom model server with autoscaling enabled. Despite this, the deployment never scales beyond a single replica, and some requests are dropped. You also observe that CPU utilization remains low even when the server is under heavy load. What action should you take to address this issue?
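To see why this combination of symptoms matters, note that Vertex AI autoscaling for custom containers is driven by a utilization metric such as CPU: if the model server handles requests with a single worker that spends most of its time waiting (for example, on I/O), measured CPU stays low, the scaling trigger never fires, and the request backlog overflows. The following is a toy, stdlib-only simulation of that failure mode (it is illustrative, not Vertex AI code; all names and constants here are hypothetical):

```python
import queue
import threading
import time

# Toy illustration: a single-worker "model server" whose requests are
# I/O-bound. The worker mostly waits, so CPU utilization stays low even
# while the backlog grows -- mirroring why a CPU-based autoscaling
# trigger never adds replicas and excess requests are dropped.

REQUESTS = 20        # incoming prediction requests (arrive in a burst)
QUEUE_CAPACITY = 5   # backlog limit; requests beyond this are dropped
IO_WAIT = 0.01       # per-request wait time (seconds), consumes no CPU

backlog = queue.Queue(maxsize=QUEUE_CAPACITY)
handled = 0
dropped = 0

def worker():
    """Single serving worker: sleeps (I/O wait) instead of burning CPU."""
    global handled
    while True:
        item = backlog.get()
        if item is None:          # shutdown sentinel
            return
        time.sleep(IO_WAIT)       # waiting, not computing: CPU stays idle
        handled += 1

t = threading.Thread(target=worker)
t.start()

for i in range(REQUESTS):
    try:
        backlog.put_nowait(i)     # burst arrives faster than one worker drains
    except queue.Full:
        dropped += 1              # this is what "some requests drop" looks like

backlog.put(None)                 # blocks until the worker drains the queue
t.join()

print(f"handled={handled} dropped={dropped}")
```

Under these assumptions, a common remediation direction is to make the custom container actually exercise the scaling metric: run multiple worker processes or threads in the model server so concurrent requests raise CPU utilization, or configure the deployment's autoscaling metric target so that scaling triggers at the load the server really exhibits.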