2. Average or Percentile?
● 100 requests: 99 take 1s and 1 takes 100s
● Average = total time / total requests = 199 / 100 = ~2s
● Median (or p50) = 1s
● p75, p90, p95, p99 = 1s
● p100 = 100s
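The numbers on this slide can be reproduced with a short sketch. It assumes the nearest-rank definition of a percentile (the smallest value with at least p% of the samples at or below it); the `percentile` helper is illustrative, and other definitions interpolate between samples and can give slightly different results.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (an assumed definition, see lead-in)."""
    ordered = sorted(values)
    # 1-based rank of the smallest value covering at least p% of the samples
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# 100 requests: 99 take 1s and 1 takes 100s
latencies = [1] * 99 + [100]

average = sum(latencies) / len(latencies)
print(f"average = {average:.2f}s")

for p in (50, 75, 90, 95, 99, 100):
    print(f"p{p} = {percentile(latencies, p)}s")
```

The single 100s request doubles the average but is invisible to every percentile below p100, which is why the two views are worth reading side by side.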
3. SLIs, SLOs, SLAs
● Service Level Indicators
○ p95 of homepage < 300ms in last 5 minutes
○ < 1% of total requests are 5xx status code in last 5 minutes
● Service Level Objectives
○ p95 of homepage < 300ms in every 5-minute period for the last month
○ < 1% of total requests are 5xx status code in every 5-minute period for the last month
● Service Level Agreements
○ p95 of homepage < 350ms in every 5-minute period for the last month, or I give you $$$
○ < 1% of total requests are 5xx status code in every 5-minute period for the last month, or FULL REFUND
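As a rough sketch of how the indicators and objectives on this slide relate: an SLI is evaluated over one 5-minute window, and the SLO asks that the SLI hold for every window in the month. The `Request` type, the function names, and the sample windows are hypothetical, and the nearest-rank p95 is an assumed percentile definition.

```python
import math
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float
    status: int

def p95(latencies):
    """Nearest-rank 95th percentile (assumed definition)."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]

def latency_sli_ok(window, threshold_ms=300):
    """SLI: p95 latency of one 5-minute window is under the threshold."""
    if not window:  # assumption: a window with no traffic counts as passing
        return True
    return p95([r.latency_ms for r in window]) < threshold_ms

def error_sli_ok(window, max_error_rate=0.01):
    """SLI: fewer than 1% of requests in the window returned a 5xx status."""
    if not window:  # same assumption as above
        return True
    errors = sum(1 for r in window if 500 <= r.status <= 599)
    return errors / len(window) < max_error_rate

def slo_met(windows, sli_check):
    """SLO: the SLI holds in every 5-minute window of the period."""
    return all(sli_check(w) for w in windows)

# Hypothetical sample windows
good = [Request(120, 200)] * 100
slow = [Request(120, 200)] * 90 + [Request(900, 200)] * 10
failing = [Request(120, 200)] * 98 + [Request(120, 503)] * 2

print(slo_met([good, good], latency_sli_ok))
print(slo_met([good, slow], latency_sli_ok))
print(slo_met([good, failing], error_sli_ok))
```

An SLA would reuse the same checks with its own (here looser, 350ms) threshold and attach a penalty when `slo_met` is false.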
6. Error Budgets = 1 - SLO
● Allow up to 1% of homepage requests to return a 5xx response code over the last month
● Allow up to 1% of 5-minute periods over the last month to miss p95 homepage latency < 300ms
[Figure: Engineering vs. Product: "Mo' features!" / "But the debt!"]
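The error budget arithmetic can be sketched as follows, assuming a 99% SLO and a 30-day month, which contains 30 × 24 × 12 = 8,640 five-minute periods. The function names and the count of missed periods are hypothetical, for illustration only.

```python
def error_budget(slo):
    """Error budget = 1 - SLO, with the SLO given as a fraction (0.99 = 99%)."""
    return 1 - slo

def remaining_budget(slo, total_units, bad_units):
    """Budget left, in the same units as total_units (requests or periods)."""
    allowed = error_budget(slo) * total_units
    return allowed - bad_units

# Assumption: a 30-day month split into 5-minute periods
WINDOWS_PER_MONTH = 30 * 24 * 12

allowed_bad_windows = error_budget(0.99) * WINDOWS_PER_MONTH
print(f"allowed bad periods: {allowed_bad_windows:.1f}")

# Hypothetical month so far: 50 periods missed the latency target
left = remaining_budget(0.99, WINDOWS_PER_MONTH, 50)
print(f"budget remaining: {left:.1f} periods")
```

A positive remainder suggests room to ship features; a remainder at or below zero suggests spending the time on reliability and debt instead.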