Google Professional Machine Learning Engineer

You have developed a machine learning model using AI Platform and are preparing to deploy it to production. Your application handles a few thousand queries per second, and you are currently experiencing latency issues. Incoming requests are processed by a load balancer that distributes them across multiple Kubeflow CPU-only pods managed by Google Kubernetes Engine (GKE). The objective is to enhance the serving latency without altering the existing infrastructure. Which action should you take?

Explanation:

The correct answer is D. Recompiling TensorFlow Serving from source to enable CPU-specific optimizations (for example, instruction sets such as AVX2 and FMA), and instructing GKE to choose an appropriate baseline minimum CPU platform for the serving nodes, can significantly reduce serving latency. This approach exploits the full capabilities of the underlying hardware and tunes performance for the specific CPU type, all without altering the existing infrastructure. Options A and C adjust TensorFlow Serving's batching parameters; batching primarily improves throughput, and it can actually increase per-request latency because individual requests wait for a batch to fill. Option B is less effective for this goal and is not well documented as a latency optimization.
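As a rough sketch of what answer D involves, the commands below build an optimized TensorFlow Serving image from source and pin the GKE serving node pool to a baseline minimum CPU platform. The cluster name, zone, node-pool name, and image tag are placeholders, and the exact `--copt` flags should match the instruction sets your serving CPUs actually support.

```shell
# 1. Build TensorFlow Serving from source with CPU-specific optimizations.
#    The official repo's Dockerfile.devel accepts compiler options via a
#    build argument, so the binary is compiled for the target instruction sets.
git clone https://github.com/tensorflow/serving
cd serving
docker build --pull -t $USER/tensorflow-serving-devel \
  --build-arg TF_SERVING_BUILD_OPTIONS="--copt=-mavx2 --copt=-mfma -c opt" \
  -f tensorflow_serving/tools/docker/Dockerfile.devel .

# 2. Ensure the serving nodes' CPUs support those instructions by setting a
#    baseline minimum CPU platform on the GKE node pool (names are examples).
gcloud container node-pools create serving-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --min-cpu-platform "Intel Skylake"
```

Pairing the two steps matters: compiling with `-mavx2`/`-mfma` produces a binary that will crash or fall back on older CPUs, so `--min-cpu-platform` guarantees GKE only schedules the optimized server onto hardware that supports those instructions.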