This document outlines best practices for establishing effective on-call teams, including formalizing on-call schedules and ensuring team members have the proper equipment, access, and training. It emphasizes building an empathetic on-call culture through practices like shadow rotations, avoiding burnout, and setting clear responsibilities and expectations for on-call staff.
Proprietary & Confidential
On-Call
A formalized process and schedule for responding to unplanned incidents, alerts, and/or system, service, or application issues
Why On-Call?
Bring Subject Matter Experts (SMEs) in at the beginning
Reduce the chaos of responding to alerts and incidents
Minimize time to acknowledge and resolve
Minimize handoffs, context switching, and burnout
Every Business is a Digital Business
Make payments
Shop online
Be entertained
Order food
Be connected
Get around
Do work
Buy anything
Stay healthy
Benefits of On-Call to Your Team
Know Exactly:
When to be available
Who to call
What you might be called for
Choose schedules that meet service needs
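As an illustration of how a formal schedule answers "when to be available" and "who to call", here is a minimal sketch of a round-robin rotation. The function names, responder names, and shift length are hypothetical and not taken from any particular scheduling tool.

```python
from datetime import date, timedelta
from itertools import cycle


def build_rotation(responders, first_day, shift_days, num_shifts):
    """Return a list of (start, end, responder) shifts, assigned round-robin.

    Each shift covers start <= day < end. All names here are illustrative.
    """
    schedule = []
    people = cycle(responders)
    start = first_day
    for _ in range(num_shifts):
        end = start + timedelta(days=shift_days)
        schedule.append((start, end, next(people)))
        start = end
    return schedule


def who_is_on_call(schedule, day):
    """Return the responder covering `day`, or None if no shift covers it."""
    for start, end, responder in schedule:
        if start <= day < end:
            return responder
    return None


# Hypothetical three-person team on week-long shifts.
schedule = build_rotation(["ana", "ben", "chi"], date(2024, 1, 1), 7, 4)
```

With a schedule like this published in advance, each person can look up their own shifts, and anyone raising an issue can look up the current responder by date.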
Handoff Meetings
A formal handoff to the new on-call responder to ensure they have all the context they need for their shift
Accounts and Access
Prepare a checklist for your team
❏ Working local copy of repos
❏ Configured environments
❏ Current credentials for third-party services
❏ VPN access
❏ Passwords and permissions to environments
❏ Access to monitoring and dashboards
Onboarding
Create Shadow Rotations:
New folks join in “listen-only mode”
Allows people to learn new ways of operating
Creates a low-stress environment
Builds confidence
Escalating Beyond Your Team
Complexity complicates diagnoses
Don’t keep the wrong people involved
Focus on folks who can fix
Keep other stakeholders informed out of band
Never Hesitate to Escalate
Humane On-Call
Allow rescheduling
Pre-emptive backup notifications
Sleep is necessary
Watch for burnout
Sleep. Is. Necessary.
Team Participation
Equal responsibilities
Holiday coverage
Monitor hours on call and the number of sleep-time alerts
Talk about good behaviors
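One way to monitor these numbers is to tally them per responder from shift and alert records. The sketch below is only illustrative: the record types, the 22:00 to 06:00 sleep window, and the use of naive datetimes in each responder's local time are all assumptions, not part of any specific tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, time

# Assumed sleep window in the responder's local time (hypothetical values).
SLEEP_START = time(22, 0)
SLEEP_END = time(6, 0)


@dataclass
class Shift:
    responder: str
    start: datetime
    end: datetime


@dataclass
class Alert:
    responder: str
    fired_at: datetime


def is_sleep_time(moment):
    """True if the alert fired inside the overnight sleep window."""
    t = moment.time()
    return t >= SLEEP_START or t < SLEEP_END


def summarize(shifts, alerts):
    """Total hours on call and count of sleep-time alerts per responder."""
    summary = defaultdict(lambda: {"hours_on_call": 0.0, "sleep_time_alerts": 0})
    for shift in shifts:
        hours = (shift.end - shift.start).total_seconds() / 3600
        summary[shift.responder]["hours_on_call"] += hours
    for alert in alerts:
        if is_sleep_time(alert.fired_at):
            summary[alert.responder]["sleep_time_alerts"] += 1
    return dict(summary)


report = summarize(
    shifts=[
        Shift("alice", datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 9)),
        Shift("bob", datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 21)),
    ],
    alerts=[
        Alert("alice", datetime(2024, 1, 2, 2, 30)),  # overnight page
        Alert("alice", datetime(2024, 1, 1, 14, 0)),  # daytime page
        Alert("bob", datetime(2024, 1, 2, 10, 0)),    # daytime page
    ],
)
```

Comparing these totals across the team makes uneven load visible, which supports the goal of equal responsibilities.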
Taking Stock of Your Alerts
Manage alert fatigue
Ensure all alerts are actionable
Complete and current docs
Manage external dependencies
Clear the noise
Disable junk alerts
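A periodic alert review along these lines could be sketched as follows. The `AlertStats` fields and the 50% action-rate threshold are hypothetical; a real review would draw on whatever history your alerting tool exports.

```python
from dataclasses import dataclass


@dataclass
class AlertStats:
    name: str
    times_fired: int
    times_actioned: int  # how often a responder actually had to do something
    has_runbook: bool    # stands in for "complete and current docs"


def review_alerts(stats, min_action_rate=0.5):
    """Map each problematic alert name to the reasons it needs attention."""
    findings = {}
    for s in stats:
        reasons = []
        if s.times_fired > 0 and s.times_actioned / s.times_fired < min_action_rate:
            reasons.append("rarely actionable")
        if not s.has_runbook:
            reasons.append("missing docs")
        if reasons:
            findings[s.name] = reasons
    return findings


findings = review_alerts([
    AlertStats("disk_full", times_fired=10, times_actioned=9, has_runbook=True),
    AlertStats("cpu_spike", times_fired=40, times_actioned=2, has_runbook=False),
    AlertStats("cert_expiry", times_fired=0, times_actioned=0, has_runbook=False),
])
```

Alerts flagged as rarely actionable are candidates for tuning or disabling; alerts flagged for missing docs need a runbook before the next shift.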
Flexible Models
Experiment with shift length
Utilize follow-the-sun and sleep/wake when possible
24x7, 24x5, 8x5 as appropriate for services
Partner with other teams
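To make the follow-the-sun idea concrete, here is a minimal sketch in which regional teams each cover a window of UTC hours, plus a helper that reports uncovered hours. The team names and windows are hypothetical, and real schedules would also need to handle weekends and holidays.

```python
from datetime import datetime, timezone

# Hypothetical regional teams, each covering [start_hour, end_hour) in UTC.
FOLLOW_THE_SUN = [
    ("APAC", 0, 8),
    ("EMEA", 8, 16),
    ("AMER", 16, 24),
]


def team_on_call(moment, windows=FOLLOW_THE_SUN):
    """Return the team whose window contains `moment` (must be timezone-aware)."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    hour = moment.astimezone(timezone.utc).hour
    for team, start, end in windows:
        if start <= hour < end:
            return team
    return None


def coverage_gaps(windows):
    """Return the sorted list of UTC hours that no team covers."""
    covered = set()
    for _, start, end in windows:
        covered.update(range(start, end))
    return sorted(set(range(24)) - covered)


noon_team = team_on_call(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
gaps_full = coverage_gaps(FOLLOW_THE_SUN)
gaps_single = coverage_gaps([("EMEA", 8, 16)])
```

A single team on an 8-hour window leaves 16 hours uncovered, which may be acceptable for an 8x5 service but not for a 24x7 one; partnering with teams in other regions is one way to close the gaps without overnight pages.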
PagerDuty Resources
For step-by-step instructions on setting up your team in PagerDuty, see the On-Call Rotations and Schedules resources page
How to Get Notified Before You Go On-Call in PagerDuty
Sign up for our e-book
Keep an eye on our events page https://www.pagerduty.com/events/ for meetups, webinars, PagerDuty Connects, and other opportunities
For in-depth training, check out PagerDuty University: https://www.pagerduty.com/university/
Join the PagerDuty Community at https://forums.pagerduty.com
Industry Resources
Increment, a magazine published by Stripe, devoted its very first issue to on-call: https://increment.com/on-call/
Alice Goldfuss’s open source on-call handbook: https://github.com/alicegoldfuss/oncall-handbook
New Relic shares some of their best practices for on-call, as well as their incident response workflows: https://blog.newrelic.com/engineering/on-call-and-incident-response-new-relic-best-practices/
In "Mean Time to Sleep," a classic session from the Velocity conference, Etsy’s team talks about how they worked to quantify their on-call experience:
https://www.youtube.com/watch?v=FLqucVb_et0&feature=youtu.be&ab_channel=LaurieDenness
More resources at https://goingoncall.pagerduty.com/resources