
Explanation:
An availability Service Level Indicator (SLI) quantifies the reliability of a service from the user's perspective, typically defined as the proportion of requests that result in a successful outcome. In this scenario, the issue involved HTTP 500 errors indicating server failures, which were correlated with high I/O wait times and resolved by resizing the persistent disk. This underscores that availability should be measured based on user-visible outcomes, not internal system metrics.
Therefore, Option B is the most appropriate SLI because it is user-centric, measurable, and directly tied to the service's functional reliability.
Ultimate access to all questions.
No comments yet.
You are responsible for the reliability of a high-volume enterprise application. Users report that a critical data-intensive reporting feature consistently fails with HTTP 500 errors. Investigation reveals a strong correlation between failures and the size of an internal queue used for report generation, traced to a reporting backend with high I/O wait times. After resolving the issue by resizing the backend's persistent disk (PD), you need to define an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?
A
As the I/O wait times aggregated across all report generation backends
B
As the proportion of report generation requests that result in a successful response
C
As the application's report generation queue size compared to a known-good threshold
D
As the reporting backend PD throughout capacity compared to a known-good threshold