You manage a widely-used mobile game application running on Google Kubernetes Engine (GKE) across multiple Google Cloud regions, with each region containing several Kubernetes clusters. A report indicates that users in a specific region cannot connect to the application. Following Site Reliability Engineering (SRE) principles, what is the first action you should take to resolve this incident?
Explanation:
In Site Reliability Engineering (SRE) practice, the first step during an incident is to diagnose the problem before taking corrective action. Here, the report that users in one specific region cannot connect points to a failure scoped to that region rather than a global outage.
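As a rough illustration of confirming that a failure is regional, the sketch below groups connectivity probe results by region and classifies the scope of the outage. It is a minimal sketch under stated assumptions: the probe tuples, the cluster names, the 0.5 threshold, and the function names `summarize_by_region` and `classify_scope` are all hypothetical and are not part of any Google Cloud API. In practice the inputs would come from whatever monitoring the team already has.

```python
from collections import defaultdict

# Hypothetical probe results: (region, cluster, probe_succeeded).
# The cluster names and outcomes are made up for illustration.
SAMPLE_PROBES = [
    ("us-central1", "cluster-a", True),
    ("us-central1", "cluster-b", True),
    ("europe-west1", "cluster-a", False),
    ("europe-west1", "cluster-b", False),
    ("asia-east1", "cluster-a", True),
    ("asia-east1", "cluster-b", True),
]


def summarize_by_region(probe_results):
    """Return the fraction of failed probes for each region."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for region, _cluster, succeeded in probe_results:
        totals[region] += 1
        if not succeeded:
            failures[region] += 1
    return {region: failures[region] / totals[region] for region in totals}


def classify_scope(failure_ratios, threshold=0.5):
    """Classify the outage as 'none', 'regional', or 'global'.

    A region counts as failing when its failure ratio is at or above
    the (assumed) threshold.
    """
    failing = sorted(
        region for region, ratio in failure_ratios.items() if ratio >= threshold
    )
    if not failing:
        return "none", failing
    if len(failing) == len(failure_ratios):
        return "global", failing
    return "regional", failing


if __name__ == "__main__":
    ratios = summarize_by_region(SAMPLE_PROBES)
    scope, failing_regions = classify_scope(ratios)
    print(scope, failing_regions)
```

With the sample data above, only `europe-west1` crosses the threshold, so the outage is classified as regional, which matches the situation described in the question.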