
Answer-first summary for fast verification
Answer: Reduce the batch size of instances in each prediction request to decrease memory consumption.
Analyzing each option in the context of the given scenario:

- **Request a quota increase for the number of prediction requests to handle more traffic:** This does not address the 'Out of Memory' error. The issue is memory usage per request, not the number of requests the endpoint is allowed to receive.
- **Switch to batch prediction mode for all prediction requests to reduce memory usage:** Batch prediction is asynchronous and introduces latency, which contradicts the requirement for real-time recommendations.
- **Reduce the batch size of instances in each prediction request to decrease memory consumption:** This is the most effective solution. A smaller batch reduces the memory footprint of each request, directly addressing the 'Out of Memory' error while preserving real-time serving.
- **Implement base64 encoding for the input data to reduce the size before sending it for prediction:** Base64 encoding does not compress data; it inflates it by roughly one third, which would worsen the memory issue.

Therefore, the optimal solution is to **reduce the batch size of instances in each prediction request to decrease memory consumption**, as it resolves the error while meeting the requirements of minimizing latency and cost and ensuring scalability.
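In practice, reducing the batch size can be done client-side by splitting a large instance list into smaller per-request chunks. A minimal sketch, assuming a `predict_fn` callable that wraps a single prediction request (for example, a Vertex AI endpoint's `predict()` call) and an illustrative batch size of 32; both names and the size are assumptions, not values from the question:

```python
def predict_in_batches(predict_fn, instances, batch_size=32):
    """Send instances in small chunks so each request stays within memory limits.

    predict_fn: callable taking a list of instances and returning a list of
    predictions (assumed here to wrap one online prediction request).
    """
    predictions = []
    for start in range(0, len(instances), batch_size):
        chunk = instances[start:start + batch_size]
        # One smaller request per chunk: peak memory per request drops,
        # while the endpoint still serves online (real-time) traffic.
        predictions.extend(predict_fn(chunk))
    return predictions
```

Each request now carries at most `batch_size` instances, so the serving container's per-request memory footprint shrinks without switching away from online prediction.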
Author: LeetQuiz Editorial Team
You are working as a Machine Learning Engineer for a retail company that has deployed a high-demand recommendation model on Vertex AI for real-time inference. The model is crucial for providing personalized recommendations to users in real-time. However, during peak hours, you encounter an 'Out of Memory' error during online prediction requests. The company emphasizes minimizing latency and cost while ensuring scalability. Given these constraints, what is the best course of action to resolve the 'Out of Memory' error without compromising the real-time inference capability? Choose the best option.
A. Request a quota increase for the number of prediction requests to handle more traffic.
B. Switch to batch prediction mode for all prediction requests to reduce memory usage.
C. Reduce the batch size of instances in each prediction request to decrease memory consumption.
D. Implement base64 encoding for the input data to reduce the size before sending it for prediction.