
As the on-call Site Reliability Engineer (SRE) for an online store running on Google Cloud Platform (GCP), you manage a microservice deployed on a Google Kubernetes Engine (GKE) Autopilot cluster. The microservice subscribes to a Cloud Pub/Sub topic that receives order messages and updates stock information in the warehousing system. During a high-traffic sales event, order volume surged, causing delays in stock updates and resulting in orders being accepted for out-of-stock products. You observe that the microservice's metrics (e.g., Pub/Sub subscription lag, CPU usage, and processing latency) are significantly higher than typical levels. How should you scale the microservice to ensure the warehousing system reflects accurate inventory in real time and to minimize customer impact, using GCP tools and SRE best practices?
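For context, below is a minimal sketch of the kind of subscriber loop such a microservice might run; the scaling options that follow decide how many pods run this loop. The project ID, subscription name, and the `update_stock` helper are illustrative assumptions, not details from the question.

```python
import json

from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

PROJECT_ID = "my-project"        # assumed project ID
SUBSCRIPTION_ID = "orders-sub"   # assumed subscription name


def update_stock(order: dict) -> None:
    """Placeholder for the call that updates the warehousing system."""
    print(f"Updating stock for order {order.get('order_id')}")


def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    order = json.loads(message.data.decode("utf-8"))
    update_stock(order)
    message.ack()  # acknowledge only after the stock update succeeds


subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

# Cap in-flight messages per pod; adding replicas adds more consumers
# pulling from the same subscription, which is what drains the backlog.
flow_control = pubsub_v1.types.FlowControl(max_messages=50)
future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)

with subscriber:
    try:
        future.result()
    except KeyboardInterrupt:
        future.cancel()
        future.result()
```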
A
Increase the number of replicas for the microservice deployment in GKE Autopilot by updating the replicas field in the deployment configuration. Configure Horizontal Pod Autoscaling (HPA) on the Pub/Sub subscription's backlog (undelivered messages), surfaced to the cluster as a metric from Cloud Monitoring. Set up a Cloud Monitoring alert to notify the on-call team if the backlog exceeds a threshold.
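As a rough illustration of the autoscaling piece of this option, the sketch below generates an autoscaling/v2 HPA manifest that scales on Pub/Sub backlog. It assumes the cluster runs Google's Custom Metrics Stackdriver Adapter so the `pubsub.googleapis.com|subscription|num_undelivered_messages` metric is available through the external metrics API; the deployment name, subscription ID, and target of 100 undelivered messages per replica are placeholders.

```python
import json

# Assumed names; replace with the real deployment and subscription.
DEPLOYMENT = "order-consumer"
SUBSCRIPTION_ID = "orders-sub"

hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": f"{DEPLOYMENT}-hpa"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": DEPLOYMENT,
        },
        "minReplicas": 3,
        "maxReplicas": 50,
        "metrics": [
            {
                # Pub/Sub backlog exposed from Cloud Monitoring via the
                # Custom Metrics Stackdriver Adapter (external metrics API).
                "type": "External",
                "external": {
                    "metric": {
                        "name": "pubsub.googleapis.com|subscription|num_undelivered_messages",
                        "selector": {
                            "matchLabels": {
                                "resource.labels.subscription_id": SUBSCRIPTION_ID
                            }
                        },
                    },
                    # Scale out until backlog is ~100 messages per replica.
                    "target": {"type": "AverageValue", "averageValue": "100"},
                },
            }
        ],
    },
}

# Emit a JSON manifest that kubectl can apply directly:
#   python gen_hpa.py > hpa.json && kubectl apply -f hpa.json
print(json.dumps(hpa, indent=2))
```

The Cloud Monitoring alert in this option would typically watch the same backlog metric, covering the case where autoscaling alone cannot keep the subscription drained.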
B
Manually increase the CPU and memory resources for the microservice’s pods in GKE Autopilot. Export Pub/Sub logs to BigQuery and analyze processing delays daily to identify bottlenecks. Notify customers via email if their orders are affected by stock discrepancies.
C
Create a new GKE Standard cluster and redeploy the microservice with higher resource limits. Use Cloud Functions to process Pub/Sub messages in parallel and update the warehousing system. Implement a manual review process to cancel out-of-stock orders.
D
Switch the microservice to use Cloud Run instead of GKE Autopilot. Configure Cloud Run to scale based on HTTP request volume. Use Cloud Scheduler to periodically check Pub/Sub message backlog and trigger additional instances if needed.