
You have deployed a custom model with multiple memory-intensive preprocessing steps to a Vertex AI endpoint. Initial validation showed acceptable latency, but the endpoint fails to autoscale properly under concurrent request load. What should you do to resolve this?
A. Use a machine type with more memory
B. Decrease the number of workers per machine
C. Increase the CPU utilization target in the autoscaling configuration
D. Decrease the CPU utilization target in the autoscaling configuration
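For context on what the autoscaling options refer to: Vertex AI online prediction scales replicas based on a target metric such as CPU utilization, which can be set at deploy time. A minimal sketch using `gcloud` is shown below; the endpoint and model IDs, region, and machine type are placeholder values, and the `autoscaling-metric-specs` target shown (a lowered CPU target of 50%) is illustrative, not a recommended setting.

```shell
# Sketch: deploy a model to a Vertex AI endpoint with an explicit
# CPU-utilization autoscaling target (placeholder IDs throughout).
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=my-deployment \
  --machine-type=n1-highmem-4 \
  --min-replica-count=1 \
  --max-replica-count=5 \
  --autoscaling-metric-specs=cpu-usage=50
```

A lower `cpu-usage` target makes autoscaling trigger earlier, which matters when memory-heavy preprocessing saturates a replica before its CPU utilization climbs; a higher-memory machine type addresses the per-replica memory pressure directly.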