
Answer-first summary for fast verification
Answer: Implement a containerized serving solution using Cloud Run. Configure the concurrency settings to handle multiple requests simultaneously.
The question asks for the simplest way to optimize a custom ML model for high-throughput, low-latency online prediction. Option B (Cloud Run with concurrency settings) is the correct choice: it is a serverless, containerized solution that scales automatically and lets you tune concurrency so each instance processes multiple requests simultaneously, improving both throughput and latency with minimal setup. Option A (a Vertex AI endpoint with autoscaling) is also defensible, since it is a managed service purpose-built for ML serving with built-in autoscaling, but the community discussion shows 100% consensus on B, likely because of its simplicity and effectiveness. Option C requires retraining the model with pruning and quantization, which is complex and time-consuming rather than simple. Option D relies on request-response logging and Looker Studio analysis, which is reactive: it identifies bottlenecks after the fact instead of directly optimizing the deployment.
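As a rough sketch of what option B looks like in practice, the deployment below uses `gcloud run deploy` with an explicit concurrency limit. The service name, project, and image path are illustrative placeholders, not taken from the question.

```shell
# Deploy a model-serving container to Cloud Run (option B).
# All names and numeric values here are examples, not prescribed settings.
gcloud run deploy ml-serving \
  --image=us-docker.pkg.dev/my-project/serving/model:latest \
  --region=us-central1 \
  --concurrency=80 \
  --min-instances=1 \
  --max-instances=10 \
  --cpu=2 \
  --memory=4Gi
```

`--concurrency` caps how many requests a single container instance handles at once (throughput), while `--min-instances=1` keeps a warm instance to avoid cold-start latency; `--max-instances` bounds autoscaling. Tuning concurrency against the model's CPU/memory footprint is the balancing act the answer refers to.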
Author: LeetQuiz Editorial Team
You are deploying a custom ML model on Vertex AI for online prediction and need to optimize it for high throughput and low latency using the simplest solution. What should you do?
A. Deploy the model to a Vertex AI endpoint resource to automatically scale the serving backend based on the throughput. Configure the endpoint's autoscaling settings to minimize latency.
B. Implement a containerized serving solution using Cloud Run. Configure the concurrency settings to handle multiple requests simultaneously.
C. Apply simplification techniques such as model pruning and quantization to reduce the model's size and complexity. Retrain the model using Vertex AI to improve its performance, latency, memory, and throughput.
D. Enable request-response logging for the model hosted in Vertex AI. Use Looker Studio to analyze the logs, identify bottlenecks, and optimize the model accordingly.