
You are a Machine Learning Engineer at a tech company that has developed a model using Google's AI Platform, and the model is now ready to be deployed to production to serve predictions. It currently handles a few thousand queries per second but is experiencing significant latency issues. The infrastructure consists of a load balancer distributing incoming requests across multiple Kubeflow CPU-only pods on Google Kubernetes Engine (GKE). Your team has been tasked with improving serving latency without changing the existing infrastructure. Considering the constraints of cost, compliance, and scalability, which of the following actions would be the BEST way to achieve this goal? (Choose one correct option)
A. Recompile TensorFlow Serving from source with CPU-specific optimizations enabled, and instruct GKE to schedule serving nodes on a suitable baseline minimum CPU platform.
B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C. Dramatically increase the max_batch_size parameter in TensorFlow Serving so that more requests are processed in a single batch.
D. Significantly raise the max_enqueued_batches parameter in TensorFlow Serving to allow more batches to wait in the queue for processing.
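For context on what the options refer to: option A combines an optimized TensorFlow Serving build with GKE's minimum CPU platform setting, while options C and D tune TensorFlow Serving's request batching. A hedged sketch is below; the cluster name, node pool name, platform string, and all parameter values are illustrative placeholders, not recommendations.

```shell
# Option A: pin new serving nodes to a modern baseline CPU platform so a
# TensorFlow Serving binary compiled with CPU-specific optimizations
# (e.g. AVX2) can actually use those instructions on every node.
gcloud container node-pools create serving-pool \
  --cluster my-cluster \
  --min-cpu-platform "Intel Skylake"

# Options C and D: batching is configured in a text protobuf file passed to
# the model server via --enable_batching --batching_parameters_file=batching.txt
cat > batching.txt <<'EOF'
max_batch_size { value: 128 }        # requests merged into one batch (option C)
max_enqueued_batches { value: 100 }  # batches allowed to queue (option D)
batch_timeout_micros { value: 1000 } # how long to wait to fill a batch
num_batch_threads { value: 8 }       # parallel batch-processing threads
EOF
```

Note that larger batches and deeper queues trade latency for throughput: requests wait longer to be batched or to drain from the queue, which is why options C and D tend to worsen, not improve, per-request latency.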