
You are deploying a machine learning model on Vertex AI for real-time inference in a production environment. The model is critical for processing customer transactions, and any downtime or performance degradation could significantly impact business operations. Recently, you have encountered 'Out of Memory' errors during peak load; these errors delay transaction processing and have become a scalability bottleneck. The current setup uses a standard virtual machine configuration with default memory settings. Given the need for high availability, cost efficiency, and minimal latency, which of the following strategies would you implement to mitigate the 'Out of Memory' errors? (Choose the best answer.)
A. Switch the deployment from online to batch prediction mode to reduce memory usage during off-peak hours.
B. Implement data compression techniques, such as base64 encoding, for all incoming prediction requests to decrease the memory footprint.
C. Reduce the batch size of inference requests so that the model processes data within the available memory limits.
D. Upgrade to a higher-memory virtual machine configuration to accommodate larger batch sizes and more complex models.
E. Both C and D are viable strategies to address the 'Out of Memory' errors while maintaining real-time inference capabilities.
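
For illustration, option C can be realized on the client side without leaving Vertex AI's online prediction path. The following is a minimal sketch, assuming the google-cloud-aiplatform Python SDK: incoming instances are split into smaller chunks so that each online prediction request stays within the serving container's memory. The endpoint resource name and the BATCH_SIZE value are hypothetical placeholders to be tuned against the actual workload.

    # Minimal sketch of option C: cap the number of instances per online
    # prediction request. Endpoint name and BATCH_SIZE are hypothetical.
    from google.cloud import aiplatform

    endpoint = aiplatform.Endpoint(
        "projects/my-project/locations/us-central1/endpoints/ENDPOINT_ID"
    )

    BATCH_SIZE = 16  # tune downward until peak-load memory stays within limits

    def predict_in_batches(instances):
        predictions = []
        for start in range(0, len(instances), BATCH_SIZE):
            chunk = instances[start:start + BATCH_SIZE]
            response = endpoint.predict(instances=chunk)
            predictions.extend(response.predictions)
        return predictions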
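
Option D is likewise a deployment-time change rather than a change to the model itself. The sketch below, again assuming the google-cloud-aiplatform SDK, redeploys the model onto a higher-memory machine type with autoscaling replicas for availability; the project, region, model ID, and chosen machine type are hypothetical placeholders.

    # Minimal sketch of option D: redeploy on a higher-memory machine type.
    # Project, region, model ID, and machine type are hypothetical placeholders.
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    model = aiplatform.Model(
        "projects/my-project/locations/us-central1/models/MODEL_ID"
    )

    endpoint = model.deploy(
        machine_type="n1-highmem-8",  # more memory than a default standard VM
        min_replica_count=2,          # keep at least two replicas for availability
        max_replica_count=10,         # autoscale to absorb peak load
    )

Because option E combines C and D, it addresses both the per-request memory footprint (C) and the overall memory ceiling (D) while preserving real-time inference.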