You have built a custom ML model that requires significant memory for preprocessing before making predictions. The model has been deployed to a Vertex AI endpoint, and initial tests showed that it returned results within an acceptable time frame. However, once real user traffic was routed to the endpoint, you notice that it fails to autoscale adequately when handling multiple simultaneous requests. What should you do to ensure proper autoscaling behavior and maintain performance?
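For context on the scenario above: Vertex AI online prediction autoscales replicas based on metric targets such as CPU utilization, so a memory-bound workload may saturate a replica before the CPU target is ever reached. A minimal sketch of redeploying with an explicit autoscaling metric target and a larger machine type is shown below, assuming hypothetical endpoint, model, and project identifiers (`ENDPOINT_ID`, `MODEL_ID`, `my-project`):

```shell
# Sketch only: redeploy the model with a memory-appropriate machine type,
# a minimum replica count above 1, and a lower CPU-utilization target so
# scale-out triggers earlier under concurrent load.
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --project=my-project \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=my-model-deployment \
  --machine-type=n1-highmem-8 \
  --min-replica-count=2 \
  --max-replica-count=10 \
  --autoscaling-metric-specs=cpu-usage=60 \
  --traffic-split=0=100
```

Lowering the CPU-usage target (here 60%) makes the autoscaler add replicas sooner, which helps when per-request preprocessing consumes memory and latency degrades before CPU saturates; the exact machine type and thresholds would need to be tuned with a load test.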