You work for a large bank that offers online services to customers through an application hosted on Google Cloud, with instances running in both the US (us-central1) and Singapore (asia-southeast1) regions. To enhance security, you have developed a PyTorch model that classifies transactions as potentially fraudulent or not. The model is a three-layer perceptron that takes both numerical and categorical features as input, with hashing performed inside the model itself. The model is deployed in the us-central1 region on n1-highcpu-16 machines and serves real-time predictions with a median response latency of 40 ms. Customers in Singapore, however, are experiencing higher latency. What should you do to reduce latency for users in Singapore?