
You have developed a machine learning model using AI Platform and are preparing to deploy it to production. Your application serves a few thousand queries per second and is currently experiencing latency issues. Incoming requests are handled by a load balancer that distributes them across multiple Kubeflow CPU-only pods managed by Google Kubernetes Engine (GKE). Your goal is to improve serving latency without changing the underlying infrastructure. What should you do?
A. Significantly increase the max_batch_size TensorFlow Serving parameter.
B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
D. Recompile TensorFlow Serving from source to enable CPU-specific optimizations, and instruct GKE to choose an appropriate baseline minimum CPU platform for the serving nodes.
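For context on the options: max_batch_size and max_enqueued_batches (options A and C) belong to TensorFlow Serving's batching configuration, which trades latency for throughput, so increasing them tends to raise per-request latency rather than reduce it. Option D is typically realized in two steps: rebuilding tensorflow_model_server from source with optimization flags (for example, `bazel build -c opt --copt=-march=native //tensorflow_serving/model_servers:tensorflow_model_server`), and pinning the GKE node pool to a baseline minimum CPU platform so the optimized instructions are guaranteed on every serving node. The sketch below illustrates the second step using the google-cloud-container client library; the project, zone, cluster, and pool names are hypothetical placeholders.

```python
# Minimal sketch: create a GKE node pool pinned to a baseline CPU platform,
# so a tensorflow_model_server binary built with -march=native can rely on
# those instructions (e.g. AVX extensions) on every node in the pool.
# Assumes the google-cloud-container library; all resource names are hypothetical.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

node_pool = container_v1.NodePool(
    name="tf-serving-pool",          # hypothetical node pool name
    initial_node_count=3,
    config=container_v1.NodeConfig(
        machine_type="n1-highcpu-16",
        # Nodes are scheduled only on this CPU platform or newer,
        # matching the instruction set the serving binary was compiled for.
        min_cpu_platform="Intel Skylake",
    ),
)

request = container_v1.CreateNodePoolRequest(
    parent="projects/my-project/locations/us-central1-a/clusters/my-cluster",
    node_pool=node_pool,
)
operation = client.create_node_pool(request=request)
print(f"Started node pool creation: {operation.name}")
```

The same pinning can be done from the command line with `gcloud container node-pools create --min-cpu-platform`; either way, the serving workload then needs to be scheduled onto that pool for the optimized binary to benefit.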