
Answer-first summary for fast verification
Answer: Recompile TensorFlow Serving from source to include CPU-specific optimizations and instruct GKE to select an appropriate baseline minimum CPU platform for the serving nodes to ensure the availability of necessary CPU features.
Recompiling TensorFlow Serving from source enables CPU-specific optimizations, such as Intel MKL-DNN (oneDNN) kernels and instruction-set extensions like AVX2/AVX-512, and tuning TensorFlow's inter-/intra-op parallelism further improves throughput on CPU-only pods. Pinning the GKE serving nodes to a minimum CPU platform guarantees that the nodes actually expose the instruction sets the optimized binary was compiled for. Together, these changes raise CPU utilization and reduce latency without altering the underlying infrastructure. The other options are less effective: the universal TensorFlow Serving build trades performance for broad compatibility, and increasing batch sizes or queue depth tends to raise, not lower, per-request latency.
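As a sketch of what this looks like in practice (the cluster name, bucket path, machine type, and thread-pool values below are illustrative assumptions, not values from the question):

```shell
# Build tensorflow_model_server with CPU-specific optimizations.
# --config=mkl enables the Intel MKL-DNN (oneDNN) kernels;
# -march=native targets the instruction sets of the build machine,
# so build on hardware matching the serving nodes' minimum CPU platform.
bazel build -c opt --config=mkl \
    --copt=-march=native \
    //tensorflow_serving/model_servers:tensorflow_model_server

# Create a GKE node pool whose nodes are guaranteed to support the
# instructions the binary was compiled for.
gcloud container node-pools create serving-pool \
    --cluster=my-cluster \
    --min-cpu-platform="Intel Skylake" \
    --machine-type=n1-standard-16

# At startup, tune TensorFlow's thread pools to match the pod's CPU
# allocation (values depend on the pod's CPU request/limit).
tensorflow_model_server \
    --model_base_path=gs://my-bucket/models/my-model \
    --tensorflow_intra_op_parallelism=8 \
    --tensorflow_inter_op_parallelism=2
```

The key point is that the `--min-cpu-platform` setting and the compile-time flags must agree: an AVX-512 binary scheduled onto an older CPU generation will crash with an illegal-instruction error.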
Author: LeetQuiz Editorial Team
You are a Machine Learning Engineer at a tech company that has developed a model using AI Platform. The model is now in production, serving a few thousand queries per second, but you are encountering latency issues. The current setup uses a load balancer to distribute requests across multiple Kubeflow CPU-only pods on Google Kubernetes Engine (GKE). Given the constraint of not altering the underlying infrastructure, which of the following strategies would be the MOST effective in improving the serving latency? (Choose one correct option)
A
Switch to the tensorflow-model-server-universal version of TensorFlow Serving to leverage its broader compatibility features.
B
Increase the max_enqueued_batches parameter in TensorFlow Serving significantly to allow more batches to be processed simultaneously, hoping to reduce latency.
C
Recompile TensorFlow Serving from source to include CPU-specific optimizations and instruct GKE to select an appropriate baseline minimum CPU platform for the serving nodes to ensure the availability of necessary CPU features.
D
Significantly increase the max_batch_size parameter in TensorFlow Serving to process larger batches at once, aiming to decrease the overall processing time.