You are a Machine Learning Engineer at a tech company that has developed a model using Google's AI Platform. The model is now ready to be deployed to production to serve predictions. It currently handles a few thousand queries per second but suffers from significant latency. The infrastructure consists of a load balancer distributing incoming requests across multiple Kubeflow CPU-only pods on Google Kubernetes Engine (GKE). Your team has been tasked with improving serving latency without making any changes to the existing infrastructure. Considering the constraints of cost, compliance, and scalability, which of the following actions would BEST achieve this goal? (Choose one correct option)