You are deploying a custom ML model on Vertex AI for online prediction and need to optimize it for high throughput and low latency using the simplest solution. What should you do?
A
Deploy the model to a Vertex AI endpoint resource to automatically scale the serving backend based on throughput. Configure the endpoint's autoscaling settings to minimize latency.
B
Implement a containerized serving solution using Cloud Run. Configure the concurrency settings to handle multiple requests simultaneously.
C
Apply simplification techniques such as model pruning and quantization to reduce the model's size and complexity. Retrain the model using Vertex AI to improve its latency, memory usage, and throughput.
D
Enable request-response logging for the model hosted in Vertex AI. Use Looker Studio to analyze the logs, identify bottlenecks, and optimize the model accordingly.
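For reference, below is a minimal sketch of the approach described in option A using the Vertex AI Python SDK: deploying an already-uploaded model to an endpoint with autoscaling configured. The project, region, and model ID are placeholder assumptions.

```python
# Minimal sketch of option A, assuming a model is already uploaded to the
# Vertex AI Model Registry. Project, location, and model ID are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Look up the uploaded model by its resource ID (hypothetical value).
model = aiplatform.Model(model_name="1234567890")

# Deploy to an endpoint. Vertex AI scales replicas between min and max
# based on the CPU utilization target, absorbing throughput spikes.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=5,
    autoscaling_target_cpu_utilization=60,  # scale out before saturation to keep latency low
)

# Online prediction request against the autoscaled endpoint.
prediction = endpoint.predict(instances=[[0.1, 0.2, 0.3]])
print(prediction.predictions)
```

Setting a moderate CPU utilization target (60% here) trades a little idle capacity for headroom, so new replicas come online before existing ones saturate and latency degrades.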