DevOps Enterprise Summit 2015 presentation with Kevina Finn-Braun, Director of SRE Management at Salesforce: this is the story of my months-long journey with Kevina and her team to identify the specifics of what made reliability retrospectives difficult to have, why actionable takeaways were often lacking, and how the feedback loops within the company’s operations organization weren’t serving Salesforce’s needs.
We then ran a series of experiments together, putting the SRE team on a road to improving their ability to respond, react, remediate, and reincorporate learnings from failure into the organization.
My talk with Jim Kimball on the tyranny of the SLA; in it, we:
- Deconstruct the purpose of the service level agreement
- Discuss pitfalls of aspects of common SLA clauses, including how current SLAs inhibit the development of resilient systems and the cultivation of a DevOps culture
- Explore other potential SLA models that could foster healthier organizational behaviors and dynamics, and ultimately result in better technical outcomes and therefore business outcomes.
Webinar - Data driven postmortems - Jason Yee Codemotion
The DevOps movement has not only influenced the tools we use in modern development and operations engineering, but also how we work. As part of how we work, DevOps has changed how we respond when systems inevitably stop working or don't work as expected. This presentation will provide methods and techniques for gathering information and effectively using that information to avoid and mitigate failure in the future. I'll cover best practices for gathering systems-related data, including monitoring and logging. This presentation will also cover practices for gathering and recording people-related data, including methods we can adopt from police, accident investigators, and other safety management professions to learn the most from incidents. After discussing how to gather data, I'll discuss how we can use the data to formulate actionable response plans and how to adjust existing organizational practices to avoid repeating failure.
I plan to keep the technical portions of this talk at a novice level so that it's accessible to both developers/engineers and those in non-technical roles who will be involved in incident response.
Presented at Monitorama 2017, this talk discusses how to make humans more effective "monitors" in the complex sociotechnical systems in which they work.
There are terms in our domain, terms that are fundamental to our work, terms like quality, bug, and even testing itself, that many testers would struggle to define. I’d say it’s an open secret within testing, but would it surprise our colleagues?
From CEWT #7, https://cewtblog.blogspot.com/search/label/CEWT%237
Version 2. Argues that the perception that software projects live in the simple/complicated domain is outdated, and that agile recognises them as complicated/complex problems. Also argues that the adoption of agile over the previous 13 years has been treated as a simple/complicated problem, and that Kanban helps us manage it as a complex problem.
DOES SFO 2016 - Kevina Finn-Braun & J. Paul Reed - Beyond the Retrospective: ... - Gene Kim
At DOES15, we presented the work we'd done at Salesforce to take their SRE teams to the "blameless cloud." We worked with various roles in the SRE teams so they could start asking the right questions about failure, and through the postmortem and retrospective process, begin to make lasting changes in _how_ Salesforce worked with and remediated identified failures.
But DevOps espouses less siloed thinking and more shared responsibilities, so we found postmortems within the SRE organization weren't enough. As Salesforce was moving toward a model of "service ownership," teams along the entire software delivery value stream needed to start to understand their roadblocks to remediation and what aspects of the complex system they worked in were impeding their ability to "own their service."
We'll discuss the second phase of our work in helping these operations _and product_ teams gain a deeper understanding of service ownership, and why just "DevOps'ing it up" wasn't quite enough on its own to help. Plus, we'll introduce an expanded model from last year's talk that incorporates human factors and complexity theory. These additions helped prime the teams to more effectively grapple with the challenges facing them on the road to true service ownership.
Switching horses midstream - From Waterfall to Agile - Doc Norton
You’ve been working for several months on a key software initiative for the company and leadership has decided they want it faster than projected, so the team has been told they’re getting “the agile” installed next week.
“Great,” you think. “Right in the middle of the project. Nothing like changing horses in midstream. One way or another, this will go swimmingly.”
Sarcasm and puns aside, you’ve got a point. It isn’t easy to switch methodologies in the middle of a project. Doc shares some stories from his own experiences helping teams make this change and provides a few pointers that can help you do the same.
While this talk is focused on testing, it involves the whole team, as agile methods usually do.
Slides from a workshop at the 2019 Service Design in Government conference, Edinburgh, March 2019.
The workshop challenged participants to consider:
what happens after you've done some user research for your service? Decisions made, do you move on and forget it? Or do you preserve that research for re-use and future team members? The session was an opportunity for user researchers in government to describe, compare and improve ResearchOps activities.
The Unfortunate Triumph of Process over Purpose - TechWell
As a test manager, James Christie experienced two divergent views of a single project. The official version claimed that planning and documentation were excellent, with problems discovered during test execution being managed effectively. In fact, the project had no useful plans, so testers improvised test execution. Creating standardized documentation took priority over preparing for the specific problems testers would actually face during testing. The required documentation standards didn't assist testing; they actually hindered by distracting from relevant, detailed preparation. It was a triumph of process over purpose. James shows that this is a problem that testing shares with other complex disciplines. Devotion to processes and standards inhibits creativity and innovation. They provide a comfort blanket and a smokescreen of “professionalism” where following the ritual becomes more important than accomplishing the goals. Unless we address this issue, organizations will question whether testers really add value. Testers must respond by challenging unhelpful processes and the culture that encourages them. Purpose must come before process!
Data modelling is an important tool in a developer's toolbox. By building and communicating a shared understanding of the domain they're working with, developers make their applications and APIs more usable and maintainable. However, as you scale up your technical teams, how do you keep these benefits whilst avoiding time-consuming meetings every time something new comes along? This talk reminds us of key data modelling techniques and how our use of Kafka changes and informs them. It then examines how these patterns change as more teams join your organisation and how Kafka comes into its own in this world.
Taking a Selfie - Just Try to Resist! Doing Forensics the DevSecOps Way - Sonatype
Brandon Sherman, Twilio
You can’t physically touch your computing environment anymore, so how do you capture a forensic image? In this talk, learn how to take a selfie of an EC2 instance. Selfie is a tool that can jump in with an incident responder type role, trigger snapshots of a suspect instance, and copy those snapshots to a safe place. Of course, this can be automated. Did you even have to ask?
Continuing to utilize the Stream Approach to Prosper Instance Planning to Derive a Product in a Digital Portal Pipe with a competent engineering methodology. We are using an emulation of six layer glass to do this. Here we study complex products such as Appliance Ware and Pipe Methodology. In V2 we look into the ability to trace glass through the process. In V3 we study the Build and the relational value to Industry.
From Content Strategy to Drupal Site Building - Connecting the dots - Ronald Ashri
Content strategy is, undoubtedly, a hot topic these days. A lot is being said that spans the range from concerns regarding the ability to display content on any device to the ability to drive engagement and increase traffic through better content creation and social media strategies. In this presentation we will connect the dots between these issues and practical Drupal site-building concerns with tools that are readily available now.
We will show, through specific examples and references to available modules, how different approaches to content strategy can be practically implemented on Drupal sites. The aim is to equip Drupal site-builders with a handy toolkit that will allow them to both implement a content strategy for their sites as well as better exchange information with content strategists.
The examples will include:
- Different approaches to building content types so as to empower content creators to create a range of different structures.
- Best practices in using vocabularies (fixed, open, user-generated, moderated, etc) or where alternative categorization methods may be relevant.
We will also discuss:
- Editorial calendars and scheduling.
- The true benefit of workflows (and how, sometimes, they can be a disadvantage).
- Analytics and how the ability to measure the effects of any strategy is as important as defining the strategy itself.
Attendees will go away with practical examples and techniques that they can apply to their sites as well as a better understanding of what content strategy really is and how they can use it to improve their sites.
The examples are a result of our own experiences in helping both clients develop their content strategy as well as applying it on italymagazine.com, an in-house product of ours. We grew italymagazine.com to a relevant online digital brand with a strong community by expressing our content strategy ideas through the tools that Drupal 7 made available to us. The resulting ~250% increase in traffic over 3 months is a testament to both the value of a content strategy as well as the power of Drupal to allow you to flexibly and iteratively support it.
From Content Strategy to Drupal Site Building - Connecting the Dots - Ronald Ashri
The actual presentation is available on YouTube here:
https://www.youtube.com/watch?v=agcQsQfCFow
Content strategy is, undoubtedly, a hot topic these days. A lot is being said that spans the range from concerns regarding the ability to display content on any device to the ability to drive engagement and increase traffic through better content creation and social media strategies. In this presentation we will connect the dots between these issues and practical Drupal site-building concerns with tools that are readily available now.
We will show, through specific examples and references to available modules, how different approaches to content strategy can be practically implemented on Drupal sites. The aim is to equip Drupal site-builders with a handy toolkit that will allow them to both implement a content strategy for their sites as well as better exchange information with content strategists.
The examples will include:
- Different approaches to building content types so as to empower content creators to create a range of different structures.
- Best practices in using vocabularies (fixed, open, user-generated, moderated, etc) or where alternative categorization methods may be relevant.
- Building menus and navigation.
We will also discuss:
- Editorial calendars and scheduling.
- The true benefit of workflows (and how, sometimes, they can be a disadvantage).
- Analytics and how the ability to measure the effects of any strategy is as important as defining the strategy itself.
Attendees will go away with practical examples and techniques that they can apply to their sites as well as a better understanding of what content strategy really is and how they can use it to improve their sites.
The examples are a result of our own experiences in helping both clients develop their content strategy as well as applying it on italymagazine.com, an in-house product of ours. We grew italymagazine.com to a relevant online digital brand with a strong community by expressing our content strategy ideas through the tools that Drupal 7 made available to us. The resulting ~250% increase in traffic over 3 months is a testament to both the value of a content strategy as well as the power of Drupal to allow you to flexibly and iteratively support it.
Tools, Culture, and Aesthetics: The Art of DevOps - J. Paul Reed
My DevOps Days Tel Aviv keynote: In this talk, we will examine why these now school-aged ideals remain so difficult to implement, explore why DevOps is often described as "the movement that refuses to identify itself," and what your team can do to confront the dichotomies they are likely to face as they transform how they, their colleagues, and their company go about their daily work.
Has “DevOps” jumped the shark?
Some say yes; others say 2014 will be the year DevOps dons its Fonz-esque leather jacket. Whichever you believe, the marketing feeding frenzy has begun and the dilution of the “DevOps” concept to include everything (and simultaneously mean nothing) is palpable.
This talk deconstructs the meta-elements of DevOps that made it resonate so strongly with so many and allowed those familiar DevOps poster children (Netflix, Etsy, and others) to deploy the methodology with such success in their businesses. We’ll go beyond DevOps’ classical CAMS (culture, automation, metrics, and sharing) definition to discover what exactly made DevOps relevant, and what about it is so timeless and foundational that it will make whatever-follows-DevOps relevant, too.
Is Your Team Instrument Rated? (Or Deploying 125,000 Times a Day) - J. Paul Reed
J. Paul Reed's DevOpsDays Silicon Valley 2013 presentation "Is Your Team Instrument Rated?"
The presentation discusses the operational model similarities between the National Airspace System and a well-run software development shop that employs DevOps methodologies.
# Internet Security: Safeguarding Your Digital World
In the contemporary digital age, the internet is a cornerstone of our daily lives. It connects us to vast amounts of information, provides platforms for communication, enables commerce, and offers endless entertainment. However, with these conveniences come significant security challenges. Internet security is essential to protect our digital identities, sensitive data, and overall online experience. This comprehensive guide explores the multifaceted world of internet security, providing insights into its importance, common threats, and effective strategies to safeguard your digital world.
## Understanding Internet Security
Internet security encompasses the measures and protocols used to protect information, devices, and networks from unauthorized access, attacks, and damage. It involves a wide range of practices designed to safeguard data confidentiality, integrity, and availability. Effective internet security is crucial for individuals, businesses, and governments alike, as cyber threats continue to evolve in complexity and scale.
### Key Components of Internet Security
1. **Confidentiality**: Ensuring that information is accessible only to those authorized to access it.
2. **Integrity**: Protecting information from being altered or tampered with by unauthorized parties.
3. **Availability**: Ensuring that authorized users have reliable access to information and resources when needed.
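To make the integrity property above concrete, here is a minimal Python sketch. It assumes a shared secret key and uses HMAC-SHA256 as one possible tamper-detection mechanism; neither is specified in the text, and the helper names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the message (illustrative helper)."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, key: bytes, tag: str) -> bool:
    """Compare tags in constant time; False means the message or tag changed."""
    return hmac.compare_digest(sign(message, key), tag)


# Assumption: sender and receiver already share this key securely.
key = b"example-shared-secret"
original = b"transfer 100 to account 42"
tag = sign(original, key)

# The unmodified message verifies; an altered one does not.
untouched_ok = verify(original, key, tag)
tampered_ok = verify(b"transfer 900 to account 42", key, tag)
```

A check like this only detects alteration. It does not provide confidentiality or availability, which need separate measures such as encryption and redundancy.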
## Common Internet Security Threats
Cyber threats are numerous and constantly evolving. Understanding these threats is the first step in protecting against them. Some of the most common internet security threats include:
### Malware
Malware, or malicious software, is designed to harm, exploit, or otherwise compromise a device, network, or service. Common types of malware include:
- **Viruses**: Programs that attach themselves to legitimate software and replicate, spreading to other programs and files.
- **Worms**: Standalone malware that replicates itself to spread to other computers.
- **Trojan Horses**: Malicious software disguised as legitimate software.
- **Ransomware**: Malware that encrypts a user's files and demands a ransom for the decryption key.
- **Spyware**: Software that secretly monitors and collects user information.
### Phishing
Phishing is a social engineering attack that aims to steal sensitive information such as usernames, passwords, and credit card details. Attackers often masquerade as trusted entities in email or other communication channels, tricking victims into providing their information.
### Man-in-the-Middle (MitM) Attacks
MitM attacks occur when an attacker intercepts and potentially alters communication between two parties without their knowledge. This can lead to the unauthorized acquisition of sensitive information.
### Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) Attacks
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx - Brad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024 - APNIC
Ellisha Heppner, Grant Management Lead, presented an update on APNIC Foundation to the PNG DNS Forum held from 6 to 10 May, 2024 in Port Moresby, Papua New Guinea.
Multi-cluster Kubernetes Networking - Patterns, Projects and Guidelines - Sanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/deploying these solutions.
The Blameless Cloud: Bringing Actionable Retrospectives to Salesforce
1. Kevina Finn-Braun, Salesforce
J. Paul Reed, Release Engineering Approaches
DevOps Enterprise Summit, 2015
The Blameless Cloud: Bringing Actionable Retrospectives to Salesforce
2. Kevina Finn-Braun
• Director of Site Reliability Service Management at Salesforce
• Business Continuity at Yahoo
• Geeks out on Group Dynamics and Behavior
• @kfinnbraun
• Prepping for the zombie apocalypse
@kfinnbraun @jpaulreed #DOES15
3. J. Paul Reed
• @jpaulreed
• Host of The Ship Show, @shipshowpodcast
• Principal Consultant, Release Engineering Approaches
• Spend my days talking to organizations about “The DevOps™”
4. “Site Reliability” at Salesforce
• Primary operational team supporting availability
• Acceptance and validation activities
• Develop and implement operational improvements for SFDC
• “Game days”
5. Service Reliability Hurdles at SFDC
• Inconsistent application of process, leading to inconsistent information collection
• Incident handling/remediation crossing silo boundaries
• Confusion over service ownership, due to restructured responsibilities
• Disjointed, “heavyweight” meetings
• Postmortems centered around “The Old View” of human error
6. Language of the “Old View”
• “5 whys”
• “Root cause” analysis
• “Why didn’t you[r team]…”
• “You[r team] should have…”
• “Best practices”
9. The Timeline
• October 2014: First Meeting
• January 2015: “Blow up” HA Forum
• April 2015: Status Check, including assessment shared with senior leaders
• May 2015: Service ownership roles shift
• July 2015: Initial Workshop on “The New View”
• August 2015: Identified first group for coaching
• August 2015 to today: Continued focus and deep-dive on WSRR
• August 2015 to today: Weekly sessions with the initial group
10. Root Cause Analysis Workflow
• Designed & implemented two years ago
• Anchored the process around the weekly “HA Forum”
• Intended to apply to all incidents…
• In practice, focused on high profile incidents
[Flowchart: root cause analysis workflow. Steps and decision points, listed in extraction order (connections between them are not recoverable): Incident, Event, Bug; Initial Analysis; RC Known?; Facilitator opens investigations and schedules post mortem meeting; Request RCA/Failure Analysis; RC Identified?; Identify corrective actions and implementation plans, assign actions to scrum teams; RCM Needed?; RCM Process; Unable to ascertain root cause, update record with “KE Status”; Engage scrum teams as required; HA Forum; Corrective Actions complete?; Weekly meetings to follow up with scrum master on progress; Review @HA?; Additional work items from HA are assigned; Update record and set status to “resolved”; Impact to Customer/Production or ability to release?; Tier 3 support communicate RCM to customer(s); END]
HA? Incident Guidelines:
• Severity 0, 1: Yes
• Severity 2: Maybe (instance & incident length?)
• Functional Regression: Maybe
• Incorrect/Incomplete Release: Yes
• Deployment Delayed or Rolled Back: Maybe
11. [Flowchart repeated from slide 10, with the added annotation “from incident resolution.”]
12. Root Cause Analysis Workflow in Reality
• Silo transition boundaries evident in the workflow
• Some had little/no contact, via the process, with other teams required to perform their job
• Sampling of incident reports uncovered consistent inconsistencies
• The “Bermuda Blob”
[Flowchart repeated from slide 10]
13. Getting a Feel for the Weather
15. Head First into the Storm
16. Language: Matters
• “HA Forum” ➡ “WSRR”
• “WAR” (What is it good for?)
• Postmortem versus Retrospective
• Problem Team versus Solution Team
• Root Cause versus Proximate Cause
17. Behavior: Matters
• Intra-team behavior
• Inter-team behavior
• This is not “#NAFB”
• “People in complex systems create safety. … The occasional human contribution to failure occurs because complex systems need an overwhelming human contribution for safety.” (Sidney Dekker)
18. Structure: Matters
19. Structure: Matters
20. “Blameless” “Postmortems”?
• Brené Brown, research sociologist, on vulnerability
• “Blame is a way to discharge pain and discomfort”
• Postmortem has a heavy connotation
• “Awesome postmortems?” Really?!
26. Language and Behaviors by skill level
Novice
• Language: “Incidents are bad; my job is on the line”; “I’m getting sent to the principal’s office because of this outage”
• Behaviors: Completes the post-incident “paperwork”; no formal retrospective/hallway retrospectives
Beginner
• Language: “Let’s fix this as fast as possible”; “What’s the correct fix to avoid this specific issue in the future?”
• Behaviors: Some information (inconsistently) recorded; jump to a focus on why
Competent
• Language: “Let’s review the timeline/incident report to answer that”; “We need to find the root cause of this incident”
• Behaviors: Follows the prescribed format for retrospectives; have and incorporate complete dataset for the incident into the retrospective
Proficient
• Language: “Now that we’ve established what happened, how did it happen?”; “How did these multiple factors influence our complex system?”
• Behaviors: Identifies inherent bias in self and others; perspectives solicited from all involved team members/functional groups
Expert
• Language: “How does our team/system contribute to our successes?”; “What can we incorporate from this incident to better respond next time?”
• Behaviors: Able to facilitate retrospectives by healthily helping others address tendency to blame/personal & systemic bias; retrospective outcomes are fed back into the system and prioritized
27. Retrospectives facilitate the service (and development!) improvement process
28. Being “too busy” to learn or improve means you are in a downward spiral, by definition
29. It’s not about the outcome. It’s about the response.
30. Why + How is more important than What
31. You are never done.
32. You. Are. Never. Done.
33. Our Forecast for the Future
• Evolving the concept of Service Ownership
• Salesforce-specific Retrospective Guides
• Global “live-site” coaching
• Refocus on getting the business what it wants
34. Avenues for Collaboration
• How does the described Dreyfus model apply in other organizations?
• Would love to hear stories from other enterprises about their retrospective process, who does them, and where they live within the organization
36. Photo Credits
• Slide 1: https://en.wikipedia.org/wiki/File:Golden_Fog,_San_Francisco.jpg
• Slide 4: Courtesy Kevina Finn-Braun/Salesforce
• Slide 6: https://www.flickr.com/photos/hannaneh/6464986121
• Slide 7: https://www.youtube.com/watch?v=_DEToXsgrPc#t=1h5m50s
• Slide 13: http://kathmajp.weebly.com/all-movie-reviews/movie-review-twister
• Slide 14: http://thevane.gawker.com/heres-everything-they-got-wrong-and-right-in-the-movi-1609968202
• Slide 15: https://www.flickr.com/photos/ravedelay/17761863929
37. Photo Credits
• Slide 16: Screenshot of aviationweather.gov
• Slide 17: https://www.flickr.com/photos/ravedelay/17534032771/
• Slide 18: https://www.youtube.com/watch?v=8veT5QspylE#t=15m30s
• Slide 19: https://www.flickr.com/photos/jkirkhart35/4984385396
• Slide 20: https://www.youtube.com/watch?v=iCvmsMzlF7o
• Slide 33: https://commons.wikimedia.org/wiki/File:Rainbow_background.jpg
• Slide 35: https://en.wikipedia.org/wiki/File:Clouds_spilling_over_San_Francisco.jpg