Google Professional Cloud DevOps Engineer

Get started today

Ultimate access to all questions.

Quick Answer

Answer-first summary for fast verification

Answer: As the proportion of report generation requests that result in a successful response

An availability Service Level Indicator (SLI) quantifies the reliability of a service from the user's perspective, typically defined as the proportion of requests that result in a successful outcome. In this scenario, the issue involved HTTP 500 errors indicating server failures, which were correlated with high I/O wait times and resolved by resizing the persistent disk. This underscores that availability should be measured based on user-visible outcomes, not internal system metrics. - **Option A (I/O wait times)**: This measures a system-level performance metric, not availability. While it correlated with failures in this case, it does not directly reflect whether users are experiencing successful requests and could be misleading if other factors cause high I/O without failures. - **Option B (Proportion of successful responses)**: This is the correct choice. It directly defines availability as the ratio of report generation requests that succeed (e.g., returning HTTP 2xx codes) to total requests. This aligns with SRE best practices, as it captures the user experience and can detect issues like the HTTP 500 errors described. - **Option C (Queue size comparison)**: This focuses on an internal queue metric, which is a leading indicator of potential problems but not a direct measure of availability. Queue size might grow without causing failures (e.g., during temporary spikes), and it doesn't quantify actual user success or failure rates. - **Option D (PD throughput capacity comparison)**: This assesses resource capacity, not availability. While insufficient throughput contributed to the issue, this metric is about system limits rather than the observable service reliability for users. It could be used for capacity planning but not as an SLI for availability. Therefore, Option B is the most appropriate SLI because it is user-centric, measurable, and directly tied to the service's functional reliability.

Author: LeetQuiz Editorial Team

Quick Answer

Answer-first summary for fast verification

Answer: As the proportion of report generation requests that result in a successful response

Author: LeetQuiz Editorial Team

Comments (0)

No comments yet.

You are responsible for the reliability of a high-volume enterprise application. Users report that a critical data-intensive reporting feature consistently fails with HTTP 500 errors. Investigation reveals a strong correlation between failures and the size of an internal queue used for report generation, traced to a reporting backend with high I/O wait times. After resolving the issue by resizing the backend's persistent disk (PD), you need to define an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?

Exam-Like

Last updated: December 13, 2025 at 14:03

As the I/O wait times aggregated across all report generation backends

33.3%

As the proportion of report generation requests that result in a successful response

33.3%

As the application's report generation queue size compared to a known-good threshold