
You have a machine learning model built with scikit-learn that has been deployed to a Vertex AI endpoint. While testing this model with live production traffic, you notice that the number of requests per hour is double what you initially anticipated. To ensure that the endpoint scales efficiently and users do not experience high latency, what should you do to handle potential future increases in demand?
A. Deploy two models to the same endpoint, and distribute requests among them evenly
B. Configure an appropriate minReplicaCount value based on expected baseline traffic
C. Set the target utilization percentage in the autoscalingMetricSpecs configuration to a higher value
D. Change the model's machine type to one that utilizes GPUs
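For context, here is a minimal sketch of where the settings referenced in options B and C surface at deployment time, assuming the google-cloud-aiplatform Python SDK and hypothetical project, model, and endpoint identifiers. minReplicaCount maps to min_replica_count, and the autoscalingMetricSpecs CPU target maps to autoscaling_target_cpu_utilization:

```python
# Minimal sketch (hypothetical IDs) of deploying a model to a Vertex AI
# endpoint with explicit autoscaling settings via the Python SDK.
from google.cloud import aiplatform

# Hypothetical project and region.
aiplatform.init(project="my-project", location="us-central1")

# Hypothetical pre-registered model resource name.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

endpoint = aiplatform.Endpoint.create(display_name="sklearn-endpoint")

model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",
    # Option B: set a baseline replica count so bursts of traffic are not
    # served from a single replica while autoscaling catches up.
    min_replica_count=2,
    # Headroom for future growth in demand.
    max_replica_count=10,
    # Option C's knob: the CPU utilization target that triggers scale-out.
    # Note that a LOWER target makes autoscaling react earlier, which is why
    # raising it (as option C proposes) would not help with latency.
    autoscaling_target_cpu_utilization=60,
)
```

In practice this is why option B is the strongest choice here: a minReplicaCount sized to baseline traffic keeps warm capacity available, whereas raising the utilization target (option C) delays scale-out, and options A and D add capacity without making scaling automatic.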