In Area 1 we defined the scope of our system by creating an end-to-end process that reaches from requirements all the way to running services. Now in Area 2 we are going to focus on giving everyone feedback on and visibility into that system so we can improve upon it.
What’s the goal of this area? Well, it’s pretty obvious. The goal is to provide feedback and visibility.
But why? To what end? Is it just data for the sake of curiosity? Like all of DevOps, we have to look at it through the lens of “Why?”
The whole point of feedback and visibility is to align your organization’s improvement efforts. “Align”... that’s the most interesting word in that phrase. Let’s think about where DevOps problems come from. I’m going to assume that your organization is full of smart people with good intentions (if not, you have bigger problems). If everyone is smart and wants the company to succeed, then why do DevOps problems exist? Because individuals and groups become misaligned to the point of becoming silos. Think about the classic examples... Dev ends up seeing the world one way and takes dozens of daily actions according to that world view. Ops sees the world another way and takes a whole different series of actions during the day. Both are right from their own perspective. Both are wrong from the organization’s perspective. Misalignment ensues.
How do you align your organization? You must do two things. #1, clear goals and operating instructions, is pretty straightforward and outside of the scope of this presentation. #2 is shared situational awareness... (definition)
Allspaw isn’t going to like this photo... he’s a much more handsome devil in real life... but pay no attention to him... look at what’s on the wall behind him. Those screens are one example of how you create situational awareness. They are there to radiate situational awareness.
The first step towards shared situational awareness is to give everyone the same visibility into the 4 key types of data that fuel alignment... Application Data, Infrastructure Data, Business Data, and People and Process Data. Let’s quickly look at how to do that.
Step #1... Make all infrastructure data visible... these are your classic operations metrics... Network, Disk I/O, Memory, Utilization, etc. But don’t assume everyone is a hardcore sysadmin... so provide the metrics in an application context. Standardize collection and analysis across all shared environments... if the first time a developer sees feedback is in production, don’t expect them to be able to make heads or tails of it... collect the same data and present the same feedback in all preproduction environments. Focus on deviations... avoid burying people with data... use things like Statistical Process Control charts to point out deviations from the norm.
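To make the "focus on deviations" idea concrete, here is a minimal sketch of a control-chart style check. It is only an illustration: the sample numbers are made up, the function names are hypothetical, and it uses a plain mean plus or minus three standard deviations rather than the moving-range estimate a full Statistical Process Control chart would normally use.

```python
from statistics import mean, stdev

def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits computed from a baseline sample."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center, center + sigmas * spread

def deviations(samples, limits):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, _, upper = limits
    return [(i, v) for i, v in enumerate(samples) if v < lower or v > upper]

# Hypothetical disk I/O wait percentages from a quiet baseline period.
baseline = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0]
limits = control_limits(baseline)

# New readings: only the value outside the limits is surfaced.
recent = [5.0, 6.0, 4.0, 19.0, 5.0]
flagged = deviations(recent, limits)
print(flagged)
```

The point of the sketch is the shape of the output: viewers get a short list of out-of-norm readings instead of the full stream of data.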
Step #2... Make all application data visible... you probably already have a lot of this data as well... Performance, faults, availability, logs, etc. Focus on making this a shared effort between dev and ops... dev defines, ops enables, everybody can view. Focus on easy self-service... if adding a metric feels like a schema change then people will avoid it. If it’s as simple as adding a single line of code then people will do it. Get your org addicted to meaningful data... not just “all data”... teach everyone to keep the noise down and keep application and system output clean.
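As a rough illustration of what "a single line of code" can look like, here is a small sketch. The Metrics class is a hypothetical in-memory stand-in, not any particular metrics library, and the metric names and register_user function are invented for the example.

```python
from collections import Counter

class Metrics:
    """Tiny in-memory counter registry; a stand-in for a real metrics client."""

    def __init__(self):
        self._counts = Counter()

    def incr(self, name, amount=1):
        # No registration step: the first use of a name creates the metric.
        self._counts[name] += amount

    def value(self, name):
        return self._counts[name]

metrics = Metrics()

def register_user(email):
    """Hypothetical application function instrumented by its developer."""
    metrics.incr("signup.attempted")  # the "one line of code"
    if "@" not in email:
        metrics.incr("signup.rejected")
        return False
    metrics.incr("signup.completed")
    return True

register_user("a@example.com")
register_user("not-an-email")
print(metrics.value("signup.attempted"), metrics.value("signup.completed"))
```

The design choice being sketched is that a new metric needs no schema, ticket, or config change: the developer names it at the point of use and anyone can read it back.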
Business data... you probably also have a lot of this data... Sales, signups, churn, clickstream, etc. The problem is usually that this data is sitting in silos without operational context. Focus on linking technical and process metrics with KPIs set by the business. (Amazon order rate... everything else is keyed off of that.) Your goal is to have everyone in the organization understand the direct links between their day-to-day activity and the goals set by their executives. For example, I’m a developer and my decisions affect how long a handoff or promotion of a release to a new environment takes... which negatively impacts cycle time... which negatively impacts the business goal of shorter time to market for feature requests.
While you likely have lots of application, infrastructure, and business data... very few organizations have much data about the human activity that goes on inside their organization. If your goal is to improve your service delivery capabilities and solve your DevOps problems... it makes sense to have visibility into and metrics about those processes, right? Change activity, quality, cycle time, effectiveness, etc... capture it, store it, graph it. I don’t mean time tracking or individual productivity... I’m talking about the performance and effectiveness of the organization and its critical processes. Start with two straightforward things...
1. Visualize flow across the entire lifecycle... tools like Kanban or a delivery pipeline visualization are easy wins and provide powerful effects. They make the problems obvious to all, and you can swarm the troops to fix them.
2. Record change events and overlay them on every graph you have... Change is the root of outages (at a minimum, a change introduced the thing that caused the outage)... raise everyone’s awareness of what changed and when it changed.
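The change-overlay idea can be sketched in a few lines. Everything here is hypothetical (the change log entries, the timestamps, the one-hour window, and the changes_before helper); it only shows the kind of lookup an overlay makes possible when a graph spikes.

```python
from datetime import datetime, timedelta

# Hypothetical change log: (timestamp, description).
changes = [
    (datetime(2012, 10, 2, 9, 15), "deploy web v41"),
    (datetime(2012, 10, 2, 13, 40), "db config: pool size 50 -> 20"),
    (datetime(2012, 10, 2, 16, 5), "deploy web v42"),
]

def changes_before(event_time, change_log, window=timedelta(hours=1)):
    """Return descriptions of changes made in the window leading up to event_time."""
    start = event_time - window
    return [desc for ts, desc in change_log if start <= ts <= event_time]

# Hypothetical moment when a latency graph spikes.
spike = datetime(2012, 10, 2, 14, 10)
suspects = changes_before(spike, changes)
print(suspects)
```

On a real dashboard the same data would be drawn as vertical markers on the graph; the lookup above is just the textual equivalent of "what changed right before this?"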
Combine situational awareness with clear operating goals and you get a platform for driving continuous improvement. Self-organizing behavior... people know how to tell if they are doing the right thing. Consistent/predictable behavior... when aligned, I know what actions to expect from my colleagues since we are working with the same operating rules while looking at the same data. You enable “swappable parts”... new people, or people new to a role, know what is expected, and their actions are guided by an understanding of what the current state is and which direction they are supposed to move it.
Buy big monitors and paint the walls with them... it’s cheap and has a psychological effect. Let’s look at what John has on the walls behind him... OK, you are now ready to kick your DevOps improvement program into high gear. On to Area 3...
What are we looking at in DevOps? Alignment... between dev and ops... but also overall organizational alignment. How do we get alignment? One hack is embedding ops knowledge into dev... There are many examples that we can draw from lean, e.g., poka-yoke (error proofing), move the pain forward.
In the next 10 minutes I am going to focus on a specific hack that has been used by quite a few companies: embedding an Ops guy into a development organization...
#I have joked that this is called the homeless mode.
#Simply put, an organization moves an individual from the operations org to the dev org for some period of time.
#A few days, more likely 3 months to a year... some even longer or permanent.
#Empathy: feel the pain in the tribe, influence and change the outcome through relationships
#Tribal language: names of config files, conventions, standards, directory names and structures
#Getting Dev used to patterns of fault tolerance
#Andon cord... stop the line early... bring the pain forward...
Notes:
#Simply put, an organization moves an individual from the operations org to the dev org for some period of time.
#Some examples might be a few days, more likely 3 months to a year... some even longer or permanent. There’s a great story where a guy went into dev from ops, and three years later when he came back to ops they thought he was a dev embedded into ops, because of turnover.
#Systems thinking end to end... flattening the knowledge delivery chain, value chain as one system.
#Empathy: feel the pain in the tribe and do something about it, pair and be involved in things, influence and change the outcome through relationships. Pragmatic solutions: “Not logging enough? Logging too much?” Each can cause work. Get rid of the “you guys” syndrome... stop the blame game... It’s hard to yell at those “idiots” when one lives amongst you.
#Tribal language: names of config files, conventions, standards, directory names and structures. Jargon: common terminology. Dev in London and Ops in NYC... “You guys have all sorts of crazy names for things, like you call soccer football.”
#Kata: educating Dev so they can think like Ops... through repetition it becomes part of your routine... examples like feature flags, metrics collection within the application, measurement of resource usage, etc. Consider that an app will always have finite user resources: refactoring user registration shouldn’t take more CPU than it currently does.
#Getting Dev used to patterns of fault tolerance (not evenly distributed activity amongst development teams).
#We want everyone to think in a new way... no fear of failure...
#Help prioritize the backlog to manage technical debt (and non-functional requirements). Allocate 20% of Dev cycles to non-functional requirements (Product Management).
#Andon cord... stop the line early... bring the pain forward... How many times does something have to wait until it gets into production to find out it wasn’t operationally sound? Tell Sales no... “never make an offer like that, because it wasn’t nearly as easy as we thought, and here’s a better proposal”... negotiate other terms.
#Two large teams (Dev and Ops). Take a senior ops guy and put him/her in dev for 6 months to a year. Dotted line authority. (Silverpop, Dan Nemic)
#When building a new project team (i.e., dev), put an ops guy on the team as a cross-functional team member. A lot of startups do this because they have no choice; however, National Instruments (Ernest Muller) did this on a new “devops” project.
#Mercenaries - Build a team that can be used as a pool of resources when needed... a few days, 3 months, a year... These teams typically have a strong group of cross-functional experts. DRW Trading (Chris Read)
#Specialized Teams - As its maturity level grows, an organization might start having specialty pools of mercenaries (e.g., DBAs, security experts, ...).
#I had to throw NoOps in here as a pattern, in that they run one big Dev org that “they say does not have operations”, when in fact they do have operations; it is just embedded in the dev org. Netflix runs like this and it works very well for them. However, there are some issues... culture and cost. Culture: this works at Netflix because they hire really well. Cost: this type of operation means a lot of commitment to automation.
Example... Subject Matter Experts
Example... Automation Specialist
Example... Big Picture
#In the mercenaries example you need plumbers. A team might be hurting in a specific area, and either the team realizes this itself or someone external to the team suggests it. They need an expert in Linux, network, database: typically driven by a backlog or a skillset deficit in dev (e.g., performance issues due to the database, but that’s all we know; or Linux discovery work). Sometimes an exec may say, “you’re dying in networking; get an ops person who knows what they’re doing”; ops does a review and recommends that they get an SRE.
#One of the ways to tackle technical debt is to put an automation specialist into the dev organization. Focus on automation for the business unit (e.g., we’ve lost control of configurations and deployment, it’s all a mess, and we have no idea what’s in production). Automation in deployment: don’t create work for Dev; work at the edges, minimal impact, create standardization models without impact. After it’s all done, then explain to Dev what you’ve done... responsibility for automation code... Slack capacity issues... Jumpstart... prioritize backlog... slack time 20% to manage technical debt... pairing... touching keyboard... make a dev work with the ops... embedding knowledge where we need it...
#Big picture person: talk to the management team, paint the vision, the big picture, liaise between Ops, Dev and business management (management boundary issue).
A local bloke... John is a poster child for Area 2... Chris is a poster child for Area 4... Jez Humble calls him his mentor... He wears the coolest shirts.
Explain chaos monkey ...compliance monkey...
#Don’t embed social misfits... The guys who just like to get things done on their own without anybody else’s help are usually not good candidates for embedding. Embedding is a social experiment to change organizational behavior.
#Understand the motivation of an individual who wants to be embedded. Don’t look for hero motivation; look for sense-of-accomplishment motivation. This job can be somewhat thankless from an external viewpoint. Think of a lineman in American football... he gets his motivation from winning the game and knowing he was a big part of it. Motivation comes not from being a hero but from accomplishment (he doesn’t need personal credit) vs. hero castle and worship.
#It’s important to maintain the previous relationships with Ops. Attending standups. External night-out get-togethers. You don’t want to lose the tribal connections and thought process. Plus it can also mend bridges... as in “he ain’t that bad once you get to know him”... if they have to be reminded a year later, then the experiment didn’t work... At the end of the day you are trying to break down silos... Socialize, go out to the bars with them; they remain part of the original (virtual) team.
2012 Velocity London: DevOps Patterns Distilled
DevOps Patterns Distilled
Patrick Debois (@patrickdebois), Damon Edwards (@damonedwards), Gene Kim (@realgenekim), John Willis (@botchagalupe)
Velocity Europe 2012
Every Company Is An IT Company… 95% of all capital projects have an IT component… 50% of all capital spending is technology-related Where we need to be… IT is always in the way (again…) We are here…
The DevOps Cookbook (Coming H1 2013)
John Allspaw (@allspaw), Patrick Debois (@patrickdebois), Damon Edwards (@damonedwards), Gene Kim (@realgenekim), Mike Orzen (@mikeorzen_leanit), John Willis (@botchagalupe)
The First Way: Systems Thinking (Left To Right)
• Understand the flow of work
• Always seek to increase flow
• Never unconsciously pass defects downstream
• Never allow local optimization to cause global degradation
• Achieve profound understanding of the system
The Second Way: Amplify Feedback Loops (Right to Left)
• Understand and respond to the needs of all customers, internal and external
• Shorten and amplify all feedback loops: stop the line when necessary
• Create quality at the source
• Create and embed knowledge where we need it
The Third Way: Culture Of Continual Experimentation & Learning
“DevOps Areas”: a way to ‘codify’ problems/solutions
Area 1: Extend delivery to production (think “Jez Humble”)
Area 2: Extend operations feedback to project (think “John Allspaw”)
Area 3: Embed Dev into Ops (think “Adrian Cockcroft”)
Area 4: Embed Ops into Dev (think “Chris Read”)
Summary diagram (DEV / OPS):
• Area 1: Extend delivery to production
• Area 2: Extend operations feedback to project
• Area 3: Embed Project knowledge into Operations
• Area 4: Embed Operations knowledge into Project
The Third Way: Culture Of Continual Experimentation & Learning
Foster a culture that rewards:
• Experimentation (taking risks) and learning from failure
• Repetition is the prerequisite to mastery
Why?
• You need a culture that keeps pushing into the danger zone
• And have the habits that enable you to survive in the danger zone
Area 1: Extend Continuous Delivery Into Production
Patrick Debois @patrickdebois
GOALS
Big Goal: Refocus on Business, View End-To-End
Practical Goal: Get conversation started, Bring the pain forward
[Diagram labels: Business, DEV, QA, OPS]
Step #1 - Re-Establish Trust Co-location Teams Face to Face meetings IRC, Chat, Group feeling Align Management Goals HR Policies
Anti-Pattern #6 - Organizational Inertia
Group experiment:
• > 80% of people invest: investors $80, others $0
• < 80% of people invest: investors -$10, others $0
Convergence to invest or not invest depends on the initial group decision
Nash Equilibrium - Game Theory
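The group experiment can be checked with a few lines of code. This is only a sketch under stated assumptions: a group of 10 players (the slide gives no group size), the payoff and is_stable helper names are hypothetical, and a share of exactly 80% is treated as "not enough" because the slide leaves that case open.

```python
def payoff(invests, share_investing, threshold=0.8):
    """Payoff for one player, following the slide's numbers."""
    if not invests:
        return 0
    # Assumption: exactly 80% counts as "not enough" (the slide leaves it open).
    return 80 if share_investing > threshold else -10

def is_stable(choices):
    """True if no single player gains by switching only their own choice."""
    n = len(choices)
    for i, current in enumerate(choices):
        flipped = choices[:i] + [not current] + choices[i + 1:]
        now = payoff(current, sum(choices) / n)
        alt = payoff(not current, sum(flipped) / n)
        if alt > now:
            return False
    return True

everyone = [True] * 10
nobody = [False] * 10
half = [True] * 5 + [False] * 5

print(is_stable(everyone), is_stable(nobody), is_stable(half))
```

Under these assumptions both "everyone invests" and "nobody invests" are stable while the half-and-half group is not, which is consistent with the slide's point that where the group ends up depends on where it starts.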
Outcomes Business Goal(s) Shared Process Trust People Robust Technology
Area 2: Provide Feedback and Visibility
Damon Edwards @damonedwards
GOAL: Provide feedback and visibility ...but why?
GOAL: Provide feedback and visibility to align your organization’s improvement efforts
HOW DO YOU ALIGN YOUR ORGANIZATION?
1. Clear goals and operating instructions
2. Shared situational awareness
HOW DO YOU CREATE SHARED SITUATIONAL AWARENESS?
FOUR TYPES OF DATA YOU NEED
[Diagram: Situational Awareness, surrounded by Application Data, Infrastructure Data, Business Data, and People & Process Data]
STEP 1: MAKE ALL INFRASTRUCTURE DATA VISIBLE
• Network, Disk I/O, Memory, Utilization, etc...
• Present data in context of the application
• Standardize and extend to all environments
• Create awareness of deviations from norm
STEP 2: MAKE ALL APPLICATION DATA VISIBLE
• Performance, faults, availability, logs, etc...
• Dev takes ownership of instrumenting their applications, but anyone can view or extend
• Enable self-service metric creation (“one line of code”)
• Increase signal, decrease noise
STEP 3: BREAK BUSINESS DATA OUT OF ITS SILO
• Sales, signups, churn, clickstream, etc...
• Make goals explicit (KPIs, one metric that matters)
• Link all other metrics to business metrics
• Empower improvement by showing cause and effect
[Diagram: Key Business Metric, Secondary Business Metric, Technical/Process Metric, My activity]
STEP 4: COLLECT AND VISUALIZE ORGANIZATION & PROCESS DATA
• Change activity, quality, cycle time, effectiveness, etc...
• Focus on effectiveness, not efficiency
• Visualize the flow across the entire lifecycle
• Capture change data and enable overlays on any graph
USE TO DRIVE CONTINUOUS IMPROVEMENT
[Diagram: Situational Awareness, surrounded by Application Data, Infrastructure Data, Business Data, and Organization & Process Data]
Goals
• Shorten and amplify feedback loops
• Create knowledge and capabilities where we need it
• Ensure that we’re optimizing for the entire system
“We found that when we woke up developers at 2am, defects got fixed faster than ever”
Patrick Lightbody, Founder/CEO, BrowserMob
IT Operations As The Developers’ Best Friend
Tom Limoncelli, Patrick Debois, Adrian Cockcroft
Require That Dev Initially Maintain Their Own Service
Source: Tom Limoncelli, Google (Usenix 2012)
Test Whether Developers Qualify For IT Operations Resources
• Types/frequency of pager alerts
• Maturity of monitoring
• System architecture review
• Release process
• Defect counts and severity
• Production hygiene
Source: Tom Limoncelli, Google (Usenix 2012)
Return Fragile Services Back To Dev
Source: Tom Limoncelli, Google (Usenix 2012)
Integrate Dev Into IT Operations
• Integrate Dev into IT Operations escalation processes
• Have Dev cross-train IT Operations staff
• Have Dev improve the environment
Why
• Seeing End to End
• Sharing the Pain
• Operations Andon Cord
• Create a Common Language
• Educate Dev to Think Like Ops
• Flattening Knowledge Chain
• Create Patterns of Fault Tolerance
• Manage Technical Debt
Engagement Models for Embedding
• One Off
• Cross Functional Teams
• Mercenaries
• Specialized Teams
• NoOps
Design for Operations
• Improve Application
• Config files
• Instrumentation
• logging
• Improve Environment
• Configuration Management
• Immune system (BDD)
@cread
• DevOps Facilitator at DRW
• London, United Kingdom
• www.chris-read.net
Institutionalize IT Operations Knowledge
• Building Reusable IT Operations
• Embedded Operations
• Design
• Architecture
• Controls
• Monitoring
• Deployment
Break Things Early And Often
“Do painful things more frequently, so you can make it less painful… We don’t get pushback from Dev, because they know it makes rollouts smoother.” -- Adrian Cockcroft, Architect, Netflix
You Don’t Choose Chaos Monkey… Chaos Monkey Chooses You
When IT Fails: A Business Novel and The DevOps Cookbook
Coming January 15, 2013 and Q1 2013
“The lessons in When IT Fails might just save your business if IT fails for you. Every IT executive should share this book with their business peers.” - James Turnbull, VP Operations, Puppet Labs and author of “Pro Puppet”
“The greatest IT management book of our generation.” - Branden Williams, CTO Marketing, RSA
“This book will have a profound effect on IT, just as The Goal did for manufacturing.” - Jez Humble, co-author of the Jolt award-winning book Continuous Delivery, and Principal at ThoughtWorks Studios
Our Mission: Positively Impact The Lives Of One Million IT Workers By 2017
For these slides, the “Top 10 Things You Need To Know About DevOps,” Rugged DevOps resources, and updates on the books: Or text “[email_address] 75271” to +1 (858) 598-3980 Or signup at: http://www.instantcustomer.com/go/75271 Or email email@example.com