
Google Professional Cloud DevOps Engineer
As a member of an on-call Site Reliability Engineering team overseeing a web application in production, you encounter a situation where users from a specific region are reporting errors and failed requests following a recent update. After declaring an incident and assessing the impact, which action should you prioritize?
Explanation:
The correct course of action is to first mitigate the impact on users (for example, by rolling back the recent update), because this restores service while further investigation and a permanent fix are underway. Options A, B, and D are steps that follow once the service is stable. Mitigation comes first in the immediate response to an incident because it minimizes user disruption. Reference: Google SRE Workbook on Incident Response (Case Study 2).