DevopsHQ #2017
SLAs & SREs
SLA - Service Level Agreement
is a contract between a service provider (either internal or external)
and the end user that defines the level of service expected from the
service provider. SLAs are output-based in that their purpose is
specifically to define what the customer will receive.
SRE - Site Reliability Engineering
Google’s mastermind behind SRE, Ben Treynor, still hasn’t published
a single-sentence definition, but describes site reliability as “what
happens when a software engineer is tasked with what used to be
called operations.”
THINK!
THINK!
THINK!
What to take note?
- People are not generally evil, but busy
- Consider time
- Consider stress
- Consider benefits
- Consider rewards
- Help people be great on their job
Improvement
- Post-Mortems
- Communication
- Collaboration
- Ownership
- Standardization
- Policy
- ISMS (Optional)
>>> LOOP
MONITOR!
MONITOR!
MONITOR!
UP/RESPONSE
OPS METRICS
- CPU Utilization
- Memory Utilization
- Disk Utilization
- Process Monitoring
- Webserver Processes
- DB server Processes
- Custom Script Processes
- UPTIME
HOW DO WE KNOW IF EVERYTHING WORKS?
500/502/504
200/206/
404/403
302/304
To the rescue: CDN
HOW DO WE KNOW IF EVERYTHING WORKS?
HOW DO WE KNOW IF EVERYTHING WORKS?
500/502/504
200/206/
404/403
302/304
Working as SRE, everything on your metric/stats count.
SLA is hard to maintain, it takes a badass SRE (not just one but a team) to get
things rolling.
- Always be proactive
- Always improve
- Always be ready
@NeilUpbeta01 | upbeta01@gmail.com

DevopsHQ #2017