
Answer-first summary for fast verification
Answer: Focus on identifying the contributing causes of the incident rather than the individual responsible for the cause.
In Site Reliability Engineering (SRE) practice, post-mortems are blameless. Their purpose is to learn from the incident and prevent it from recurring, so they focus on systemic issues and contributing factors rather than on individual fault. Option B matches this approach. It identifies contributing causes without targeting individuals. Examples include insufficient testing, gaps in resource monitoring, or weaknesses in the deployment process. Option A contradicts SRE principles, because preventing recurrence is a central goal of a post-mortem. Options C and D assign blame by singling out individuals or barring an engineer from production work. That undermines trust and discourages people from reporting problems openly. SRE favors collective responsibility and process improvements.
Author: LeetQuiz Editorial Team
You are responsible for conducting a post-mortem for a service outage caused by a new release that consumed excessive memory resources. The release was rolled back successfully to minimize user impact. Following Site Reliability Engineering (SRE) best practices, what steps should you take in the post-mortem process?
A
Focus on developing new features rather than preventing the outage from recurring.
B
Focus on identifying the contributing causes of the incident rather than the individual responsible for the cause.
C
Plan individual meetings with all the engineers involved. Determine who approved and pushed the new release to production.
D
Use the Git history to find the related code commit. Prevent the engineer who made that commit from working on production services.