2. WHO I AM
• Name: Jon Hildebrand
• Contact: @snoopj123 on Twitter
• Location: Kansas City, Missouri, USA
• Day Job: Formerly a Cloud Architect for a US
Midwest-based managed service provider
• Free Time: Trying to find a baseball game to
watch, raising a four-year-old, blogging, and being
a student of DevOps
• Accolades include: VMware vExpert (Five years),
Cisco Champion (Four years), NetApp United (Two
years), recurring Tech Field Day delegate
3. WHAT WENT SO WRONG WITH
IT AND CULTURE?
• Ever hear of Friedrich Nietzsche?
• “The Birth of Tragedy from the Spirit of Music”
• The dichotomy of Apollonian and Dionysian thought with regard
to ancient philosophers
• Apollonian thought being rational and reasoned
• Dionysian thought being more irrational and chaotic
• Over time, in IT, we’ve applied ever more rationality
and reason to our processes
• Unfortunately, we forget about the chaotic nature of the
system we try to control
4. THE GOLDEN TRIANGLE
• The constant struggle for IT
organizations worldwide is managing
this triangle effectively
• Pay too much attention to one of the
areas and the other two will suffer
• Too often, struggling or failing
IT organizations are too focused
on the technology angle
5. THE TRUTH OF THE MATTER,
BROUGHT TO YOU BY OPENSTACK SUMMIT 2016
<insert tech name>
6. CAMS
• The four main pillars of DevOps
• Culture
• Automation
• Measurement
• Sharing
• Sometimes extended with an L to
represent Lean software
development practices (CALMS)
7. THE PHOENIX PROJECT AND THE
THREE WAYS
• The First Way – Systems Thinking
• The Second Way – Amplify Feedback Loops
• The Third Way – Create a culture of continuous
learning and experimentation
8. CULTURE IN DEVOPS!
• Often overlooked by those trying to make the initial transition
• Focus more on the business goals rather than on technology fiefdoms
• Requires many behavior changes, down to an individual level
• Without changes to behaviors, a culture may never be properly established, dooming the DevOps
movement to failure before even getting to the technology
“You can’t directly change culture. But you can change behavior, and behavior becomes culture.”
- Lloyd Taylor, VP Infrastructure, Ngmoco
9. ORGANIZATIONAL CONTROLS
Organizations tend to put methods of control over various aspects of what we call our day-to-day work.
Most organizations will fall under one of three categories:
• Pathological
• Bureaucratic
• Generative
10. YOUR CURRENT SITUATION
| | Pathological Organizations | Bureaucratic Organizations | Generative Organizations |
| Information Handling | Information is hidden | Information may be ignored | Information is actively sought |
| Messenger Handling | “Shoot the messenger” | Messengers are tolerated | Messengers are trained |
| Responsibilities | Shirking of responsibilities | Compartmentalized responsibilities | Responsibilities are shared |
| Team Bridging | Highly discouraged | Allowed, but generally discouraged | Rewarded |
| Failure Handling | Failure is covered up | Treated with either mercy or punishment | Causes inquiry |
| New Idea Handling | New ideas are crushed | New ideas create problems | New ideas welcomed |
Reflect on your own career…
I’ll bet that many have experienced the first two types of organizations…
Source: Dr. Ron Westrum, “A typology of organisational cultures”, BMJ Quality and Safety 13
11. TO ERR IS HUMAN
• We all make mistakes...
• However, most of the time, our
mistakes in our day-to-day work
can likely be traced back not to
the last action that led to failure,
but to the system that led to the
action
“Human error is not our cause for troubles; instead, human error is a
consequence of the design of the tools that we gave them.” - Dr. Sidney Dekker
12. THE BLAME
CULTURE EPIDEMIC
• One of the worst tools an
organization can use when handling
failure is resorting to “Name, Blame,
and Shame”
• Human nature is to blame
immediately; however, blame is the least
productive reaction in a crisis
• Unfortunately, chronic use of this as a
practice usually leads to fear
13.
“Our work is almost always performed within a complex system and how management
chooses to react to failures and accidents leads to a culture of fear, which then makes it
unlikely that problems and failure signals are ever reported. The result is that problems
remain hidden until a catastrophe occurs.” – Excerpt from “The DevOps Handbook”
“When response to incidents and accidents are seen as unjust, it can impede safety
investigations, promoting fear rather than mindfulness in people who do safety-critical
work, making organizations more bureaucratic rather than more careful, and
cultivating professional secrecy, evasion, and self-protection.” – Excerpt from “Just
Culture: Restoring Trust and Accountability in Your Organization”, by Dr. Sidney Dekker
14. WHY FAILURE IS A GOOD THING
“By removing blame, you remove fear; by removing fear, you enable honesty; and honesty enables prevention.” –
Bethany Macri, Engineer @ Etsy
• DevOps thrives on experimenting; experimentation means failures WILL occur
• Failures are used to make better processes
• This leads to the concept of organizational learning
• Organizational learning is a process of creating, retaining, and transferring information within an organization
• Doing so leads to personal gains that, in turn, greatly benefit the organization
• You become more self-diagnosing and self-improving
• You become more skilled at detecting and solving problems
15. DEVOPS PRACTICES TO CREATE A LEARNING-BASED CULTURE
• Blameless post-mortems
• Honest question: how many of you are actually doing post-mortems after events?
• The idea here is to get to the inquiry about the failure, rather than cast the “name, blame, shame” net upon
whoever did the work in question
• Also a great opportunity to gather useful information for the entire organization and publish it for all to see
• These published reports tend to help reestablish trust and accountability with customers impacted by the event
• Controlled introduction of failures
• Creates plenty of opportunity for practice/mastery (a major core DevOps principle)
“Coping, fire fighting, and making do were gradually replaced through the organization by a dynamic of
identifying opportunities for process and product improvement. As those opportunities were identified and
the problems were investigated, the pockets of ignorance that they reflected were converted into nuggets
of knowledge.” – Dr. Steven Spear
16. AMAZON S3 “OUTAGE” – A POST-MORTEM EXAMPLE
• Interesting usage of terms
• Amazon “team member” only mentioned a single time
• The term “process” was used four times
• Half of the post-mortem was devoted to upcoming changes to
operational procedures
• A quarter of the document was devoted to describing
the growing pains of S3, its high-level resiliency
design, and why its recovery took longer than
expected
• The last line of this document states: “We will do
everything we can to learn from this event and use it
to improve our availability even further.”
17. BRITISH AIRWAYS – PROVIDING AN EXAMPLE OF WHAT NOT TO DO
• On May 27th, 2017, British Airways suffered a major systems
outage resulting in many grounded flights during a holiday
weekend
• The outage is estimated to have cost BA nearly
US$112 million
• BA seems intent on placing the blame for the outage on a
single engineer
• That single engineer, while authorized to be in the position
they were, is being reported as “not authorized to do what he
did.”
• BA’s handling of the blame has drawn criticism, given
the open question of how a single engineer could cause this much
of a problem with a single event
• While there is a partial amount of blame to be placed on the
engineer, BA is not faultless here
• Backup systems failed miserably. This is a design and testing
failure of a major system.
18. DEATH TO THE HERO
• Everyone loves the hero...
• You can easily identify the “hero” of the
organization by finding one of the most burned-out
people
• They generally work extremely long hours, are
usually single-handedly troubleshooting issues,
and fire fighting to keep services up and
running.
• Generally doing this solo
• No shared burden (a core DevOps principle is
shared responsibility)
• Promotes a ton of personal gain at the expense
of team effectiveness
“A culture that rewards firefighting breeds arsonists.”
19. THE TEAM
• It can’t be stressed enough that DevOps
culture also promotes many intra-team and
inter-team interactions
• Collaboration and affinity are big keys to
highly successful DevOps cultures
• Affinity – Process of building inter-team
relationships, navigating differing goals and
metrics while keeping in mind shared
organizational goals and fostering empathy
and learning between different groups of
people
• Do not forget that you still have to be able
to work with your own team first before
working with others
“That’s a team, gentlemen, and either we heal,
now, as a team, or we will die as individuals.” –
Tony D’Amato, Any Given Sunday
20. KEY TAKEAWAYS
• Before you start with DevOps and technology, realize that an entire organization needs to
change its culture, all the way down to how team members interact
• Without a culture change, most DevOps initiatives are doomed to immediate failure
• While culture change needs to be an organizational imperative, the individuals are ultimately
responsible for the behavioral changes needed for cultural success
• Also, just because DevOps instills this sort of culture doesn’t mean non-development IT
organizations can’t adopt it too. Many of these points are rather universal across most practices
21. RECOMMENDED
PUBLICATIONS
• Highly recommend “The Goal” as a lead-in
to “The Phoenix Project”. You’ll
recognize many points from “The Goal”
in “The Phoenix Project”
• Also, it goes without saying, since I’m
following Gene Kim, that “The DevOps
Handbook” is another great read about
things leading up to the technical side of
DevOps
• Lastly, I really enjoyed “Effective DevOps”
as it had plenty of focus on the
managerial side of DevOps teams. You
can find plenty more information
on topics not discussed here, plus deeper
discussion of silos, effective
communication, and developing team
trust