You have a scikit-learn model deployed to a Vertex AI endpoint. While testing the model against live production traffic, you notice that the number of requests per hour is double what you initially anticipated. To ensure that the endpoint scales efficiently and users do not experience high latency, what should you do to handle potential future increases in demand?
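
The scenario points toward configuring autoscaling on the endpoint deployment. Below is a minimal sketch using the google-cloud-aiplatform Python SDK; the project ID, region, model resource name, machine type, and replica counts are illustrative assumptions, not values given in the question.

```python
# Hedged sketch: deploying an already-uploaded scikit-learn model with
# autoscaling so the endpoint can absorb traffic spikes. All identifiers
# below (project, region, model resource name) are hypothetical.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Assumes the model was previously uploaded to the Vertex AI Model Registry.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=2,   # keep warm replicas to avoid cold-start latency
    max_replica_count=10,  # headroom if request volume doubles again
    autoscaling_target_cpu_utilization=60,  # scale out before CPU saturates
)
```

Setting `max_replica_count` above `min_replica_count` is what enables autoscaling here: Vertex AI adds replicas as the CPU-utilization target is exceeded and removes them as traffic subsides, so users are shielded from latency spikes without paying for peak capacity around the clock.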