The application has consistently used only 5% of its error budget over six months, indicating higher reliability than the current SLO requires. Two options bring the SLO and the observed reliability back in line while balancing velocity, reliability, and business needs:
- Option B (Have more frequent or potentially risky application releases): This leverages the surplus error budget to increase deployment velocity. Since the error budget is underutilized, teams can afford riskier releases (e.g., faster rollouts or experimental features) without breaching the SLO. This balances velocity and reliability by using the budget as intended, allowing innovation while maintaining user trust.
- Option C (Tighten the SLO to match the application's observed reliability): Tightening the SLO (e.g., increasing the target reliability percentage) directly aligns the SLO with the observed high performance. This sets a higher bar for future operations and ensures the SLO accurately reflects the system's capabilities. Because the SLO was previously deemed appropriate, the change should be made collaboratively with stakeholders to confirm that a tighter target still matches business needs; six months of data justify revisiting it.
Other options are less effective:
- A (Add more serving capacity): Unnecessary because reliability is already high; it increases costs and does nothing to bring the SLO closer to the observed reliability.
- D (Implement additional SLIs): Adds complexity but doesn't adjust the SLO to match observed reliability; it only provides more data, which isn't the primary goal here.
- E (Announce planned downtime): Artificially consumes error budget but doesn't make the SLO reflect actual reliability; it risks user dissatisfaction and is not a sustainable or balanced approach.
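
The arithmetic behind options B and C can be sketched as follows. This is a minimal illustration, not part of the original scenario: the 99.9% SLO target is a hypothetical value (the scenario does not state the actual target), and the function names `error_budget` and `observed_reliability` are made up for this example. Only the 5% consumption figure comes from the scenario.

```python
def error_budget(slo_target: float) -> float:
    """Return the allowed failure fraction for an SLO target such as 0.999."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def observed_reliability(slo_target: float, budget_consumed: float) -> float:
    """Return the reliability implied by consuming a fraction of the error budget.

    budget_consumed is the share of the budget actually used (0.05 means 5%).
    """
    if budget_consumed < 0.0:
        raise ValueError("budget_consumed must not be negative")
    return 1.0 - budget_consumed * error_budget(slo_target)


if __name__ == "__main__":
    current_slo = 0.999  # hypothetical target; the scenario does not state it
    consumed = 0.05      # 5% of the error budget, as given in the scenario

    achieved = observed_reliability(current_slo, consumed)
    print(f"Error budget at current SLO: {error_budget(current_slo):.4%}")
    print(f"Observed reliability:        {achieved:.4%}")
    print(f"Unused error budget:         {1 - consumed:.0%}")
```

Under these assumptions, the unused 95% of the budget is the headroom that option B spends on faster or riskier releases, while the observed reliability is the upper bound for a tightened target under option C. In practice the new target would usually be set somewhat below the observed figure so that some error budget remains.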