
Answer-first summary for fast verification
Answer: Decrease the CPU utilization target in the autoscaling configurations
The correct answer is D. Because the preprocessing is memory-intensive, a replica can become saturated by simultaneous requests while its CPU utilization is still below the autoscaling target, so the autoscaler sees no reason to add replicas. Decreasing the CPU utilization target in the autoscaling configuration makes the endpoint scale out at a lower CPU usage level, so new replicas are added earlier as load increases. This addresses the scaling behavior directly. Increasing the target (option C) would delay scaling even further, and options A and B change how much work a single replica can handle rather than when the autoscaler adds replicas.
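The effect of the target can be illustrated with a small sketch of a target-tracking rule of the form ceil(current replicas × current utilization / target utilization). The function name `recommended_replicas`, the replica limits, and the 50% utilization figure are hypothetical values chosen for illustration, not measurements from the scenario; 60% is used as the baseline target on the assumption that it is the endpoint's default.

```python
import math


def recommended_replicas(current_replicas, cpu_utilization, cpu_target,
                         min_replicas=1, max_replicas=10):
    """Replica count suggested by a simple target-tracking rule.

    cpu_utilization and cpu_target are percentages (0-100).
    This is an illustrative model, not the service's actual implementation.
    """
    desired = math.ceil(current_replicas * cpu_utilization / cpu_target)
    # Clamp the result to the configured replica range.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical situation: 2 replicas, memory-bound preprocessing keeps
# CPU at about 50% even though the replicas are already struggling.
at_60 = recommended_replicas(2, 50, 60)  # target above observed CPU
at_40 = recommended_replicas(2, 50, 40)  # lowered target

print("target 60%:", at_60, "replicas")
print("target 40%:", at_40, "replicas")
```

With the higher target the rule keeps the replica count unchanged, while the lowered target asks for an additional replica at the same observed CPU level. In the Vertex AI Python SDK (`google-cloud-aiplatform`), the target is set with the `autoscaling_target_cpu_utilization` argument of `Model.deploy`, alongside `min_replica_count` and `max_replica_count`.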
Author: LeetQuiz Editorial Team
You have built a custom ML model that requires significant memory for preprocessing tasks before making predictions. This model has been deployed to a Vertex AI endpoint, and initial tests showed that it produces results within a reasonable time frame. However, upon routing real user traffic to the endpoint, you notice that it fails to autoscale adequately when handling multiple simultaneous requests. Given this situation, what should you do to ensure proper autoscaling behavior and maintain performance?
A. Use a machine type with more memory
B. Decrease the number of workers per machine
C. Increase the CPU utilization target in the autoscaling configurations
D. Decrease the CPU utilization target in the autoscaling configurations