
You have built a custom ML model that requires significant memory for preprocessing tasks before making predictions. This model has been deployed to a Vertex AI endpoint, and initial tests showed that it produces results within a reasonable time frame. However, upon routing real user traffic to the endpoint, you notice that it fails to autoscale adequately when handling multiple simultaneous requests. Given this situation, what should you do to ensure proper autoscaling behavior and maintain performance?
A. Use a machine type with more memory.
B. Decrease the number of workers per machine.
C. Increase the CPU utilization target in the autoscaling configurations.
D. Decrease the CPU utilization target in the autoscaling configurations.
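For context, the CPU utilization target referenced in options C and D is set at deployment time. Below is a minimal sketch using the Vertex AI Python SDK; the project, model resource name, machine type, and replica counts are placeholder assumptions, and the `autoscaling_target_cpu_utilization` value shown simply illustrates lowering the default target so scale-out triggers sooner.

```python
from google.cloud import aiplatform

# Placeholder project and region for illustration only.
aiplatform.init(project="my-project", location="us-central1")

# Placeholder model resource name.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

model.deploy(
    machine_type="n1-highmem-4",      # memory-heavy preprocessing
    min_replica_count=1,
    max_replica_count=10,
    # The default CPU target is 60%; a lower value makes the endpoint
    # add replicas earlier under concurrent request load.
    autoscaling_target_cpu_utilization=50,
)
```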