
You manage a widely-used mobile game application running on Google Kubernetes Engine (GKE) across multiple Google Cloud regions, with each region containing several Kubernetes clusters. A report indicates that users in a specific region cannot connect to the application. Following Site Reliability Engineering (SRE) principles, what is the first action you should take to resolve this incident?
A
Reroute the user traffic from the affected region to other regions that don't report issues.
B
Use Stackdriver Monitoring to check for a spike in CPU or memory usage for the affected region.
C
Add an extra node pool consisting of high-memory, high-CPU machine types to the cluster.
D
Use Stackdriver Logging to filter on the clusters in the affected region, and inspect error messages in the logs.