
Answer-first summary for fast verification
Answer: Develop a post-mortem to be distributed to stakeholders.
Following Site Reliability Engineering (SRE) practices, the first step after resolving a major service outage is to develop a post-mortem. This document records what happened, why it happened, and how similar incidents can be prevented. SRE post-mortems are blameless: they focus on learning and improvement rather than on assigning blame. Calling individual stakeholders (A) may be appropriate for a few critical stakeholders, but it is not the first step SRE practices recommend. The Incident State Document (C) tracks the state of the incident while it is being handled, so it is not a summary written after resolution. Requiring an apology email from the responsible engineer (D) contradicts SRE's blameless post-mortem culture.
Author: LeetQuiz Editorial Team
After resolving a major service outage that impacted all users for several hours, you need to prepare an incident summary for stakeholders following Site Reliability Engineering best practices. What is the first step you should take?
A
Call individual stakeholders to explain what happened.
B
Develop a post-mortem to be distributed to stakeholders.
C
Send the Incident State Document to all the stakeholders.
D
Require the engineer responsible to write an apology email to all stakeholders.