You work for a large bank that offers online services to customers through an application hosted on Google Cloud, with instances running in both the US (us-central1) and Singapore (asia-southeast1) regions. To enhance security, you have developed a PyTorch model designed to classify transactions as potentially fraudulent or not. The model is a three-layer perceptron that takes both numerical and categorical features as input, with hashing operations performed within the model itself. The model has been deployed to the us-central1 region on n1-highcpu-16 machines, providing real-time predictions with a current median response latency of 40 ms. However, customers in Singapore are experiencing higher latencies, and you aim to reduce these delays. What should you do to improve latency, particularly for users in Singapore?

Last updated: January 20, 2026 at 14:03

Attach an NVIDIA T4 GPU to the machines being used for online inference.

7.1%

Change the machines being used for online inference to n1-highcpu-32.

6.1%

Deploy the model to Vertex AI private endpoints in the us-central1 and asia-southeast1 regions, and allow the application to choose the appropriate endpoint.

63.6%

Create another Vertex AI endpoint in the asia-southeast1 region, and allow the application to choose the appropriate endpoint.

23.2%
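
Two of the options above have the application "choose the appropriate endpoint". As a rough illustration of what that selection could look like, the sketch below maps the region an application instance runs in to a model endpoint in the same region and falls back to us-central1 otherwise. The project ID, endpoint IDs, and function names are hypothetical and not part of the question. The URL follows the public regional REST form for Vertex AI (`{region}-aiplatform.googleapis.com`); private endpoints are addressed differently, so treat this as a sketch of the routing idea only.

```python
# Hypothetical identifiers, for illustration only (not from the question).
PROJECT = "my-bank-project"
ENDPOINT_IDS = {
    "us-central1": "1111111111",
    "asia-southeast1": "2222222222",
}
DEFAULT_REGION = "us-central1"


def choose_region(app_region: str) -> str:
    """Return the model region co-located with the app instance, else the default."""
    return app_region if app_region in ENDPOINT_IDS else DEFAULT_REGION


def predict_url(app_region: str) -> str:
    """Build the regional predict URL for the endpoint nearest the app instance."""
    region = choose_region(app_region)
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{PROJECT}/locations/{region}/"
        f"endpoints/{ENDPOINT_IDS[region]}:predict"
    )


if __name__ == "__main__":
    # An app instance in Singapore would call the Singapore endpoint.
    print(predict_url("asia-southeast1"))
```

Keeping the request inside the region where the application instance runs avoids the cross-region network round trip, which machine-size or accelerator changes in us-central1 would not remove.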

Google Professional Machine Learning Engineer
