You have deployed a custom model with several memory-intensive preprocessing steps to a Vertex AI endpoint. Initial validation showed acceptable latency, but under concurrent load the endpoint fails to autoscale properly. What steps should you take to resolve this?
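A common resolution for this class of problem is to profile the container's memory footprint, redeploy on a machine type with enough memory headroom per replica, and tune the autoscaling target so new replicas spin up before existing ones saturate. A minimal sketch using the `gcloud ai endpoints deploy-model` command is below; `ENDPOINT_ID`, `MODEL_ID`, the region, and the specific machine type and target values are placeholders to adapt, not a verified configuration.

```shell
# Redeploy the model on a higher-memory machine type and tune autoscaling.
# ENDPOINT_ID and MODEL_ID are hypothetical placeholders.
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=custom-model-highmem \
  --machine-type=n1-highmem-8 \
  --min-replica-count=2 \
  --max-replica-count=10 \
  --autoscaling-metric-specs=cpu-usage=60   # scale out at 60% CPU instead of the default 60s-lagging saturation point
```

Raising `--min-replica-count` above 1 also avoids cold-start stalls when the first burst of concurrent requests arrives, at the cost of keeping warm capacity provisioned.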