The goal is to reduce Mean Time to Recovery (MTTR) after a production failure. The scenario involved an extended outage due to a faulty release, which required rollback and a fix. To minimize recovery time, focus on strategies that enable rapid rollback and early issue detection in the release process.
- Option B (Blue/Green Deployment): This strategy maintains two identical production environments (blue for the current release, green for the new one). If the new release fails, traffic can be switched back to the stable environment (blue) almost instantly. Because rollback is a traffic switch rather than a redeployment, it directly reduces MTTR.
- Option E (CI Server with Unit Tests): Configuring a CI server to run automated unit tests on every commit catches code defects early in the development cycle. This is primarily a preventive measure: it lowers the likelihood that a defect reaches production at all. If a failure still occurs, the pipeline also supports recovery: per-commit results help pinpoint the change that introduced the defect, and the fix can be validated by the same automated tests before it ships.
Other options are less effective for MTTR reduction:
- Option A (Peer Code Reviews): Focuses on prevention by improving code quality but does not speed up recovery once a failure occurs in production.
- Option C (Code Linting): Enforces coding standards, which helps prevention, but does not catch functional failures or speed up recovery.
- Option D (Local Integration Tests): Relies on developers to manually run tests in inconsistent local environments, which is unreliable and does not ensure production-like validation. It lacks automation for rapid feedback.
Thus, B and E are the best choices: B for immediate rollback capability, and E for early defect detection, which reduces how often failures occur and speeds up validating a fix when one does.