1© ITSM Academy unless otherwise stated© ITSM Academy unless otherwise stated
SRE Roundtable with 4 DevOps Institute Ambassadors
Helen Beal Niladri Choudhuri Donna Knapp Craig Pearson
@BealHelen @niladrimc @ITSM_Donna @craigpearson004
2© ITSM Academy unless otherwise stated 2© ITSM Academy unless otherwise stated
Agenda • DevOps and SRE
• SRE and ITIL
• SRE and Security
• Benefits of SRE
Part One
Part Two
Part Three
• Panel Discussion
• Your Questions!
3© ITSM Academy unless otherwise stated
POLL How familiar are you with SRE?
4© ITSM Academy unless otherwise stated 4© ITSM Academy unless otherwise stated
Origins and Comparisons
DevOps SRE
Started as a movement from DevOps Days Started at Google
2009 2003 - but the first book published 2016
Started as ‘agile system administration’ Started as a need to scale
Dev led Ops led
Key concept: CICD Key concept: Wisdom of Production
Focus on speed Focus on stability
Scales to the enterprise Designed for the enterprise
@BealHelen
5© ITSM Academy unless otherwise stated 5© ITSM Academy unless otherwise stated
● In general, an SRE team is responsible for availability, latency,
performance, efficiency, change management, monitoring,
emergency response, and capacity planning (Google)
● Numerous ITIL practices contribute to these activities
● SRE principles and approaches can be used to adapt ITIL practices
● SRE is particularly relevant to organizations looking to achieve
higher levels of velocity
SRE and ITIL
@ITSM_Donna
6© ITSM Academy unless otherwise stated
Leveraging SRE Principles and Approaches
• Service level management
✓ Define service level objectives (SLOs)
✓ Measure using service level indicators
(SLIs)
✓ Service level agreements (SLAs) include
the consequences of not meeting SLOs
• Change enablement
✓ Form an error budget (one minus the
availability target) and set
consequences
✓ Error budgets are the tool SRE uses to
balance service reliability with the pace
of innovation
✓ Ideally, spend the error budget taking
risks
• Incident management
✓ Leverage swarming; particularly for high-
impact incidents
✓ Draw on concepts from the Incident
Command System (ICS)
✓ The Incident Commander role concentrates
on the 3Cs (coordination, communication,
control)
• Problem management
✓ Postmortem culture: learning from failure
✓ Blameless postmortems are a tenet of SRE
culture
• Continual improvement
✓ Foster a culture of continuous
experimentation, learning and
improvement
7© ITSM Academy unless otherwise stated
POLL How do you use error budgets?
8© ITSM Academy unless otherwise stated 8© ITSM Academy unless otherwise stated
• As per Google, Security and Privacy are closely related concepts.
• In designing for reliability and security, you must consider different risks.
Primary reliability risks are non-malicious in nature like a bad software
update or a physical device failure. Security risks comes from adversaries
who are actively trying to exploit system vulnerabilities.
• Reliability and Security Tradeoff – Redundancy
• Reliability and Security Tradeoff – Incident Management – Special small
team with skills to handle security related incidents
• Security and Reliability is about Confidentiality, Integrity and Availability
SRE and Security
@niladrimc
9© ITSM Academy unless otherwise stated 9© ITSM Academy unless otherwise stated
• Intersection of Reliability and Security – Effects of Insiders
○ Design of Security and Reliability
○ Design for least privilege
○ Design for understandability
○ Design for resilience
○ Design for recovery
○ Mitigating Denial of Service (DOS) attacks
SRE and Security
@niladrimc
10© ITSM Academy unless otherwise stated 10© ITSM Academy unless otherwise stated
• Enhanced stability and reliability of services - SRE is about reliability
• Better understanding of how production services work - through observability
and shared “wisdom of production”
• A better balance between the investment in customer experience and
technical reliability - SLO’s are business objectives
• A greater appreciation of the operational impact of services in development
teams - shift left, SRE’s in teams, designing for operations
• Improvements in staff morale and retention - HOW? LET’S CONSIDER THE
INDIVIDUAL FIRST….
Benefits of SRE - for the Organization
@craigpearson004
11© ITSM Academy unless otherwise stated 11© ITSM Academy unless otherwise stated
• A better work balance with ring fenced time for improvement - 50% split,
operational v improvement work, a clear focus on toil reduction
• Less stressful on-call experiences and a reduction in overall call-out volumes -
automation of incident response, “chaos engineering”, blameless post mortems
• Broader skills-based opportunities that leverage the latest in automation - a
future-proof “toolbox”
• An improvement in workplace culture - Westrum model
• Opportunities for “shifting left” and helping to ensure development teams deliver
more reliable services - getting the wisdom of production into teams
• WHICH SHOULD LEAD TO - Improvements in morale and staff retention
Benefits of SRE - for the Individual
@craigpearson004
12© ITSM Academy unless otherwise stated
POLL What % of time do you spend on
toil?
13© ITSM Academy unless otherwise stated 13© ITSM Academy unless otherwise stated
Agenda
PANEL
DISCUSSION
14© ITSM Academy unless otherwise stated© ITSM Academy unless otherwise stated
Helen Beal Niladri Choudhuri Donna Knapp Craig Pearson
@BealHelen @niladrimc @ITSM_Donna @craigpearson004
15© ITSM Academy unless otherwise stated 15© ITSM Academy unless otherwise stated
Agenda
Q&A
16© ITSM Academy unless otherwise stated© ITSM Academy unless otherwise stated
Helen Beal Niladri Choudhuri Donna Knapp Craig Pearson
@BealHelen @niladrimc @ITSM_Donna @craigpearson004
THANK YOU!

SRE Roundtable with 4 DevOps Ambassadors

  • 1.
    1© ITSM Academyunless otherwise stated© ITSM Academy unless otherwise stated SRE Roundtable with 4 DevOps Institute Ambassadors Helen Beal Niladri Choudhuri Donna Knapp Craig Pearson @BealHelen @niladrimc @ITSM_Donna @craigpearson004
  • 2.
    2© ITSM Academyunless otherwise stated 2© ITSM Academy unless otherwise stated Agenda • DevOps and SRE • SRE and ITIL • SRE and Security • Benefits of SRE Part One Part Two Part Three • Panel Discussion • Your Questions!
  • 3.
    3© ITSM Academyunless otherwise stated POLL How familiar are you with SRE?
  • 4.
    4© ITSM Academyunless otherwise stated 4© ITSM Academy unless otherwise stated Origins and Comparisons DevOps SRE Started as a movement from DevOps Days Started at Google 2009 2003 - but the first book published 2016 Started as ‘agile system administration’ Started as a need to scale Dev led Ops led Key concept: CICD Key concept: Wisdom of Production Focus on speed Focus on stability Scales to the enterprise Designed for the enterprise @BealHelen
  • 5.
    5© ITSM Academyunless otherwise stated 5© ITSM Academy unless otherwise stated ● In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning (Google) ● Numerous ITIL practices contribute to these activities ● SRE principles and approaches can be used to adapt ITIL practices ● SRE is particularly relevant to organizations looking to achieve higher levels of velocity SRE and ITIL @ITSM_Donna
  • 6.
    6© ITSM Academyunless otherwise stated Leveraging SRE Principles and Approaches • Service level management ✓ Define service level objectives (SLOs) ✓ Measure using service level indicators (SLIs) ✓ Service level agreements (SLAs) include the consequences of not meeting SLOs • Change enablement ✓ Form an error budget (one minus the availability target) and set consequences ✓ Error budgets are the tool SRE uses to balance service reliability with the pace of innovation ✓ Ideally, spend the error budget taking risks • Incident management ✓ Leverage swarming; particularly for high- impact incidents ✓ Draw on concepts from the Incident Command System (ICS) ✓ The Incident Commander role concentrates on the 3Cs (coordination, communication, control) • Problem management ✓ Postmortem culture: learning from failure ✓ Blameless postmortems are a tenet of SRE culture • Continual improvement ✓ Foster a culture of continuous experimentation, learning and improvement
  • 7.
    7© ITSM Academyunless otherwise stated POLL How do you use error budgets?
  • 8.
    8© ITSM Academyunless otherwise stated 8© ITSM Academy unless otherwise stated • As per Google, Security and Privacy are closely related concepts. • In designing for reliability and security, you must consider different risks. Primary reliability risks are non-malicious in nature like a bad software update or a physical device failure. Security risks comes from adversaries who are actively trying to exploit system vulnerabilities. • Reliability and Security Tradeoff – Redundancy • Reliability and Security Tradeoff – Incident Management – Special small team with skills to handle security related incidents • Security and Reliability is about Confidentiality, Integrity and Availability SRE and Security @niladrimc
  • 9.
    9© ITSM Academyunless otherwise stated 9© ITSM Academy unless otherwise stated • Intersection of Reliability and Security – Effects of Insiders ○ Design of Security and Reliability ○ Design for least privilege ○ Design for understandability ○ Design for resilience ○ Design for recovery ○ Mitigating Denial of Service (DOS) attacks SRE and Security @niladrimc
  • 10.
    10© ITSM Academyunless otherwise stated 10© ITSM Academy unless otherwise stated • Enhanced stability and reliability of services - SRE is about reliability • Better understanding of how production services work - through observability and shared “wisdom of production” • A better balance between the investment in customer experience and technical reliability - SLO’s are business objectives • A greater appreciation of the operational impact of services in development teams - shift left, SRE’s in teams, designing for operations • Improvements in staff morale and retention - HOW? LET’S CONSIDER THE INDIVIDUAL FIRST…. Benefits of SRE - for the Organization @craigpearson004
  • 11.
    11© ITSM Academyunless otherwise stated 11© ITSM Academy unless otherwise stated • A better work balance with ring fenced time for improvement - 50% split, operational v improvement work, a clear focus on toil reduction • Less stressful on-call experiences and a reduction in overall call-out volumes - automation of incident response, “chaos engineering”, blameless post mortems • Broader skills-based opportunities that leverage the latest in automation - a future-proof “toolbox” • An improvement in workplace culture - Westrum model • Opportunities for “shifting left” and helping to ensure development teams deliver more reliable services - getting the wisdom of production into teams • WHICH SHOULD LEAD TO - Improvements in morale and staff retention Benefits of SRE - for the Individual @craigpearson004
  • 12.
    12© ITSM Academyunless otherwise stated POLL What % of time do you spend on toil?
  • 13.
    13© ITSM Academyunless otherwise stated 13© ITSM Academy unless otherwise stated Agenda PANEL DISCUSSION
  • 14.
    14© ITSM Academyunless otherwise stated© ITSM Academy unless otherwise stated Helen Beal Niladri Choudhuri Donna Knapp Craig Pearson @BealHelen @niladrimc @ITSM_Donna @craigpearson004
  • 15.
    15© ITSM Academyunless otherwise stated 15© ITSM Academy unless otherwise stated Agenda Q&A
  • 16.
    16© ITSM Academyunless otherwise stated© ITSM Academy unless otherwise stated Helen Beal Niladri Choudhuri Donna Knapp Craig Pearson @BealHelen @niladrimc @ITSM_Donna @craigpearson004 THANK YOU!