
Answer-first summary for fast verification
Answer: Recompile TensorFlow Serving from source to enable CPU-specific optimizations and instruct GKE to select a suitable baseline minimum CPU platform for serving nodes.
The question asks how to reduce serving latency without changing the existing infrastructure. Options C and D tune request batching in the wrong direction: dramatically raising max_batch_size forces each request to wait for a larger batch to fill, and raising max_enqueued_batches lets more work pile up in the queue — both tend to increase, not reduce, latency on CPU-only pods. Option B, the tensorflow-model-server-universal binary, is built with minimal optimizations for maximum hardware portability, so it is typically slower than a build tuned for the serving CPUs. Option A directly targets the bottleneck: recompiling TensorFlow Serving with CPU-specific optimizations (for example, modern vector instruction sets) and setting a minimum CPU platform on the GKE nodes guarantees the optimized binary always runs on hardware that supports those instructions. This improves latency while leaving the load balancer, pods, and cluster architecture untouched, consistent with the stated cost, compliance, and scalability constraints.
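As a sketch of what option A involves in practice — the Bazel flag, CPU platform, and cluster/pool names below are illustrative assumptions, not commands prescribed by the question:

```shell
# Build TensorFlow Serving from source, targeting the instruction set
# (e.g. AVX2/FMA) of the CPUs the serving nodes will run on.
# The -march value is an illustrative assumption.
bazel build -c opt --copt=-march=haswell \
  tensorflow_serving/model_servers:tensorflow_model_server

# Pin the GKE node pool to a minimum CPU platform so every node that
# schedules serving pods supports the instructions the binary uses.
# Cluster and pool names are placeholders.
gcloud container node-pools create serving-pool \
  --cluster=my-cluster \
  --min-cpu-platform="Intel Haswell"
```

The two steps must match: compiling for an instruction set without pinning the node platform risks illegal-instruction crashes on older nodes, which is why the answer pairs them.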
Author: LeetQuiz Editorial Team
You are a Machine Learning Engineer at a tech company that has developed a model using Google's AI Platform. The model is now ready to be deployed into production to serve predictions. Currently, the model handles a few thousand queries per second but is experiencing significant latency issues. The infrastructure consists of a load balancer distributing incoming requests across multiple Kubeflow CPU-only pods on Google Kubernetes Engine (GKE). Your team has been tasked with improving the serving latency without making any changes to the existing infrastructure. Considering the constraints of cost, compliance, and scalability, which of the following actions would be the BEST to achieve this goal? (Choose one correct option)
A
Recompile TensorFlow Serving from source to enable CPU-specific optimizations and instruct GKE to select a suitable baseline minimum CPU platform for serving nodes.
B
Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C
Dramatically increase the max_batch_size parameter in TensorFlow Serving to process more requests simultaneously.
D
Significantly raise the max_enqueued_batches parameter in TensorFlow Serving to allow more batches to be queued for processing.
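For context on options C and D: TensorFlow Serving reads these knobs from a text-format batching parameters file passed via --batching_parameters_file. A minimal sketch with illustrative values — oversized values for the first and third fields are exactly what the explanation warns against:

```shell
# Write a TensorFlow Serving batching config (text-format protobuf).
# Modest values keep per-request latency bounded; dramatically raising
# max_batch_size or max_enqueued_batches (options C and D) makes
# requests wait longer for batches to fill or sit longer in the queue.
cat > batching_parameters.txt <<'EOF'
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
EOF

# Illustrative invocation (not run here):
# tensorflow_model_server --enable_batching \
#   --batching_parameters_file=batching_parameters.txt ...
```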