
You are working as a Machine Learning Engineer at a retail company that has deployed a high-demand recommendation model on Vertex AI for real-time inference, serving personalized recommendations to users. During peak hours, online prediction requests fail with an 'Out of Memory' error. The company emphasizes minimizing latency and cost while ensuring scalability. Given these constraints, what is the best course of action to resolve the 'Out of Memory' error without compromising real-time inference? Choose the best option.
A. Request a quota increase for the number of prediction requests to handle more traffic.
B. Switch to batch prediction mode for all prediction requests to reduce memory usage.
C. Reduce the batch size of instances in each prediction request to decrease memory consumption.
D. Implement base64 encoding for the input data to reduce the size before sending it for prediction.
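The memory pressure in this scenario comes from the size of each individual request payload, so the approach described in option C amounts to splitting a large list of instances into smaller chunks per call. Below is a minimal sketch of that pattern using the google-cloud-aiplatform SDK; the project ID, endpoint ID, chunk size, and instance schema are placeholder assumptions, not values from the question.

```python
from google.cloud import aiplatform

# Placeholder project, region, and endpoint ID -- substitute your own.
aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(endpoint_name="1234567890")

def predict_in_chunks(instances, chunk_size=8):
    """Send instances in small fixed-size chunks so that no single
    online prediction request exceeds the serving container's memory."""
    predictions = []
    for start in range(0, len(instances), chunk_size):
        chunk = instances[start:start + chunk_size]
        response = endpoint.predict(instances=chunk)
        predictions.extend(response.predictions)
    return predictions

# Hypothetical usage: 100 feature rows split into requests of 8 instances each.
results = predict_in_chunks([{"user_id": i} for i in range(100)])
```

Each call still goes through the online prediction path, so latency per request stays low and the endpoint remains real-time, while the smaller payloads keep per-request memory within the serving container's limits.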