Google Professional Machine Learning Engineer

You have a machine learning model built with scikit-learn that has been deployed to a Vertex AI endpoint. While testing the model with live production traffic, you notice that the number of requests per hour is double what you initially anticipated. To ensure that the endpoint scales efficiently and users do not experience high latency, what should you do to handle potential future increases in demand?

Exam-Like



Explanation:

The correct answer is B. Setting an appropriate minReplicaCount based on the expected baseline traffic keeps enough replicas running to serve that baseline efficiently, while Vertex AI's built-in autoscaling automatically provisions additional replicas as demand increases, maintaining performance and preventing high latency. Options A, C, and D are less optimal: they either introduce redundant resources, risk under-provisioning during peak traffic, or incur unnecessary cost without directly addressing the scaling need.
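For illustration, here is a minimal sketch of how such a deployment might look with the google-cloud-aiplatform Python SDK. The project, region, model resource ID, machine type, and replica counts are placeholder assumptions, not values taken from the question; size min_replica_count from your own observed baseline traffic.

```python
# Sketch: deploying a model to a Vertex AI endpoint with autoscaling bounds.
# All identifiers and numbers below are illustrative placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder project/region

# Reference an already-uploaded scikit-learn model (placeholder resource name).
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

endpoint = aiplatform.Endpoint.create(display_name="sklearn-endpoint")

# min_replica_count sets the always-on baseline capacity; max_replica_count
# caps how far Vertex AI's built-in autoscaling can scale out under load.
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",
    min_replica_count=2,    # sized for the observed baseline traffic
    max_replica_count=10,   # headroom for future spikes
    traffic_percentage=100,
)
```

With the baseline covered by min_replica_count, bursts above it are absorbed by autoscaling up to max_replica_count, so users are not left waiting for new replicas to spin up from zero.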
