You have developed a machine learning model using AI Platform and are preparing to deploy it to production. Your application serves a few thousand queries per second and is currently experiencing latency issues. Incoming requests pass through a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). The objective is to improve serving latency without changing the underlying infrastructure. Which action should you take?