
You are an ML engineer at a retail company. You have built a machine learning model that predicts which coupon to offer an e-commerce customer at the checkout based on the items in their cart. The model is deployed using Google Cloud services. When a customer proceeds to checkout, your serving pipeline joins the customer's current cart with a row in a BigQuery table containing the customer's historic purchase behavior, and this combined data is used as the model's input to make predictions. Recently, the web development team has reported that the model's predictions are returned too slowly, causing delays in loading the coupon offer along with the rest of the web page. How should you speed up the model's prediction time to resolve this issue?
A
Attach an NVIDIA P100 GPU to your deployed model’s instance.
B
Use a low latency database for the customers’ historic purchase behavior.
C
Deploy your model to more instances behind a load balancer to distribute traffic.
D
Create a materialized view in BigQuery with the necessary data for predictions.
Explanation:
The correct answer is B: Use a low-latency database for the customers' historic purchase behavior. The bottleneck is not the model itself but the serving-time lookup: joining the customer's current cart with historic purchase behavior stored in BigQuery. BigQuery is optimized for analytical queries over large datasets, not for the single-row, millisecond-latency reads that real-time prediction requires. Moving the precomputed historic features to a low-latency store such as Cloud Bigtable or Firestore, keyed by customer ID, makes the lookup a fast point read and reduces end-to-end prediction latency. The other options do not address the root cause: a GPU (A) and additional instances behind a load balancer (C) speed up model compute and throughput, not data retrieval, and a materialized view (D) still leaves the per-request lookup on BigQuery, which is not built for low-latency point queries.
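The serving-time pattern the explanation describes can be sketched as follows. This is a minimal illustration, not production code: a plain dict stands in for the low-latency store (Cloud Bigtable or Firestore), and all names (`fetch_historic_features`, `build_model_input`, the feature keys) are hypothetical.

```python
# Sketch of the serving-time join against a low-latency key-value store.
# A dict stands in for Cloud Bigtable / Firestore; in production the
# lookup would be a single-key point read (typically single-digit ms).

# Precomputed historic purchase features, keyed by customer ID.
HISTORIC_FEATURES = {
    "cust-123": {"avg_order_value": 54.2, "orders_last_90d": 7},
}


def fetch_historic_features(customer_id: str) -> dict:
    """Point read by key -- the fast path that replaces a BigQuery join."""
    return HISTORIC_FEATURES.get(customer_id, {})


def build_model_input(customer_id: str, cart: dict) -> dict:
    """Join the live cart features with the customer's historic features."""
    features = dict(cart)
    features.update(fetch_historic_features(customer_id))
    return features


if __name__ == "__main__":
    cart = {"cart_total": 31.50, "num_items": 3}
    model_input = build_model_input("cust-123", cart)
    print(model_input)
```

The key design point is that the historic features are precomputed and keyed by customer ID, so the checkout request needs only one keyed read plus an in-memory merge before calling the model.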