A DevOps Discussion


Discussion on the reasoning behind and approach to eliminating the gap between Development and Operations

Published in: Technology, Business

  • Talk about this “DevOps movement” and see if there is interest in going further. This is something that has been on my mind a lot lately, and it has shown up in a number of blogs and podcasts. The actual movement has been growing for a number of years. Polaris is organizing this ALM room for the user group, and I feel this is an area of ALM that is underserved. Is there interest in this?
  • Who are you? Where do you work? What do you do? How do you work? What is your development style? (Agile, traditional, etc.)
  • This is a visualization I found in a Microsoft whitepaper a few years back. What is ALM? It isn’t just a buzzword or synonym for SDLC. The lifecycle starts from the first idea and ends when the plug is pulled, data archived, and code mothballed. What parts of the lifecycle do you plan for? Do you plan for release? End of life? When? Who is involved? Has anyone here ever been involved in an app end-of-life? I’ve only been involved in replacements. What about the deployment? When do you plan for that? Who is involved? The type and scope of an application will determine who the players are and how they contribute. Tonight, we’re going to be generally talking about websites, but this is applicable in one form or another to many different application types. Generally speaking, there are three aspects to ALM, in order of appearance: Governance, Development, and Operations. People tend to be assigned in this fashion: SME/PM = Governance, Dev = Development, Sysadmin = Operations. What do your groups look like? Does this line up with your company? That little dotted line is what gets us in trouble.
  • This is a more recent Microsoft graphic on ALM, and I think the differences from the previous slide really show the evolution we’re seeing in the industry. For one thing, it is a cycle, not a line.
  • Code is written. Time to get this out into the wild.
  • Does your release look like this?
  • Fire in the hole?
  • What do your releases look like? Who is involved? What do you track? How often do you release? How often COULD you release? What are the bottlenecks? Change control. http://www.flickr.com/photos/69214385@N04/9243391443/
  • Business requirement to release 10+ times per day. How would your current release process work with 10/day? http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • Part of our problem with Operations is we treat it as a transition “from” this “to” that, but a release shouldn’t be a one-way street. How do you know if it is working? Is anyone using it? What errors are happening? We’ll talk mostly about the output tonight, but the return trip is important too.
  • Traditional error notification system.
  • So we have this Wall of Confusion… On one side of the wall are the developers. On the other side are the operations folks. In the middle is very little at all. In many companies they might as well be on different planets: different departments, different locations, different management, etc. Ops people rarely have a heads-up that changes are coming. Dev people rarely have any clue how production is doing unless it is on fire. The Wall is the product of conflicting motivations, incompatible toolsets, and different processes. Stability vs. change: Ops (rightfully) sees change as a synonym for outage. Ops tools vs. Dev tools: Integrations? Environments? Dev process for release in test, QA, etc. vs. Ops process for production release. All of this manifests as a lack of communication, botched releases (rollbacks), finger pointing, and general unhappiness. These manifestations result in
  • Software will always have bugs, so long as humans are writing it. We find bugs by testing. The later in the dev cycle a bug is found, the more expensive it is to fix: already released, impacting customers, harder to reproduce. How quickly can you turn around a fix for a bug that is found in production? How much of your process goes out the window when a prod error is found? Does that help or hurt? How many devs have edited files directly on the production servers?
  • Releases are hard, thus infrequent. I saw a presentation where they said we have “Release PTSD”. We don’t like doing things that are painful, so we put them off as long as we can. Our releases become larger with the backlogged work, which increases the risk of the release, which makes it more painful, so we put it off.
  • Step back and think about this from the perspective of Governance. The code is written, tested, user approved, and ready to go… and then we trip over deployment. That’s like the cyclist who celebrated prematurely and crashed a few feet before the finish line.
  • Is anyone using what you’ve released? Are they using it the way you expected? What is informing the backlog of work for the next release?
  • So how do we fix it? The transitions between Development and Operations. What can we do to better transition from one to the other? How can we get data back from ops to inform our dev? What are the pain points? What is never used?
  • “DevOps” is just a portmanteau of “Development” and “Operations”. What does it mean? An Agile solution to the “Wall of Confusion”. Agile doesn’t believe in walls. Everyone works on the same team towards the goal.
  • When Agile first came out, almost everyone thought it was crazy. Now it is normal. Why? What did it get us? Agile brought the stakeholder into the dev process. It eliminated the wall between business and devs. Business reps work with devs through the process instead of firing a set of requirements over the wall and waiting 6-18 months to see the results. Smaller, frequent releasable increments let the business see and assess progress and re-prioritize, if needed. So what would that look like if we applied it to release?
  • The Wall is the result of IT not practicing what we preached to the business. We told the business they needed to get more involved in the development, while we stayed away from operations. https://commons.wikimedia.org/wiki/File:Agile-vs-iterative-flow.jpg
  • But if we are afraid of releases, how do we get past this? History has taught us we can’t be confident our changes won’t result in outages. So how do we build confidence in our release infrastructure? Like ALM, DevOps has three aspects: metrics, coordination, and automation.
  • Measurement is how we build confidence in anything. Outages suck. Failures suck. Nobody likes failures. But are they really as bad as you remember? Our memories aren’t great with this stuff. Not all incidents are change/release related, but we will link them together in our heads. How many releases resulted in some kind of incident? How bad was that incident? How do you know? Track change frequency/size. Track incident frequency/size/severity/root cause/Time-to-Detect/Time-to-Resolve. Track the change : incident ratio. Track over time: Mean TTD/Mean TTR, change success/failure rates. SHARE THESE METRICS. Communicate them. Make them easy to access. Gathering these metrics will cross disciplines/areas, which brings us to the next point, which is that we need to coordinate.
  • Respect and recognize one another as meaningful team members, not enemies. Make use of one another’s expertise. Provide meaningful feedback about pain points. Devs tell ops what the upcoming changes will entail. What will change, what are the risks, what are the symptoms that it has gone wrong? What are the contingency plans? Plan for failure. If you don’t plan to fail (or believe you can avoid failure), you aren’t planning your recovery. Fire drills, Chaos Monkey, escalation plans.
  • Automate releases so they are consistent, testable, and quick. With automation, you control when releases happen, instead of conforming to when they can happen. One-click build, one-click deploy (with logging). Enables small, frequent changes.
  • DevOps doesn’t necessarily mean 10+ releases/day and continuous deployment. It gives you that option, but it may not be the right choice. When Agile first came out, many people approached it as “Great, now we don’t have to write anything down”. DevOps can give that impression too. If you automatically release to production after every check-in, you will have bugs. Maybe that works for your organization, maybe it won’t. Flickr had bugs, but they were able to fix them quickly. Banks probably ought not go that route. The automation might be scripts to spin up additional virtual machines, apply configuration, publish changes across farms, etc.
  • Do you see the problem with the older diagram now? Operations isn’t involved until just barely before deployment. And in many organizations, there isn’t any overlap at all.
  • Infrastructure: Will it run on your infrastructure? Did you ask before you built the latest hawtness? Does the architecture meet the company models for deployment, security, storage, etc.? Non-functional requirements: bring non-functional requirements about releases, the prod environment, etc. into the dev process early, rather than after the fact.
  • Feature flags = private betas, soft/dark launches. Feature flags are like source control branches built into code. Metrics/reporting baked into the product and designed from the outset, rather than as an afterthought. Source control/branching structure (e.g. “always ship trunk”). Shared source control, so everyone knows where to look. All that automation will likely result in scripts, which are code, which should be versioned.
  • Output: Infrastructure automation (spinning up virtual machines): Chef/Puppet/Cobbler. Release management/process tracking: InRelease/BuildMaster. Build: TeamCity/TFS/CruiseControl. Input: Analytics (e.g. Google). Debug/Diagnosis: ELMAH, ETW. Usage/errors/performance: PreEmptive Analytics, New Relic.
  • Thanks to Polaris for making this possible and letting me take time to do these. Thanks to my ladies, who make it all worthwhile.
  • Did this resonate at all? Is this interesting? What more would you like to hear about? What are you interested in?
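
The tracking described in the Metrics notes above (change frequency, change failure rate, mean Time-to-Detect and mean Time-to-Resolve) could be sketched roughly as follows. This is only an illustration, not a tool from the talk: the `Change` and `Incident` records, the `caused_by` link, and the `release_metrics` function are hypothetical names, and Time-to-Resolve is assumed here to run from detection to resolution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Change:
    change_id: str          # hypothetical release identifier
    deployed: datetime


@dataclass
class Incident:
    started: datetime
    detected: datetime
    resolved: datetime
    # Hypothetical link to the change named as root cause; None if unrelated.
    caused_by: Optional[str] = None


def release_metrics(changes, incidents):
    """Summarize change and incident records for one reporting period."""
    change_ids = {c.change_id for c in changes}
    # Distinct changes in this period that caused at least one incident.
    failed = {i.caused_by for i in incidents if i.caused_by in change_ids}

    def mean_minutes(deltas):
        deltas = list(deltas)
        if not deltas:
            return None  # nothing to average in this period
        return mean(d.total_seconds() for d in deltas) / 60

    return {
        "changes": len(changes),
        "incidents": len(incidents),
        "change_failure_rate": len(failed) / len(changes) if changes else None,
        # Assumption: TTD = started -> detected, TTR = detected -> resolved.
        "mean_ttd_minutes": mean_minutes(i.detected - i.started for i in incidents),
        "mean_ttr_minutes": mean_minutes(i.resolved - i.detected for i in incidents),
    }


if __name__ == "__main__":
    t0 = datetime(2013, 1, 1, 12, 0)
    changes = [Change("r1", t0), Change("r2", t0 + timedelta(days=1))]
    incidents = [
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=45), caused_by="r1"),
    ]
    for name, value in release_metrics(changes, incidents).items():
        print(f"{name}: {value}")
```

Keeping incidents that are not linked to any change in the same report is deliberate: it makes visible how many outages were not release related at all, which is the memory bias the notes warn about.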
  • A DevOps Discussion

    1. A DevOps Discussion We wrote some code… now what?
    2. http://www.flickr.com/photos/buddawiggi/5987710858 Introductions
    3. ALM Visualized
    4. Application Lifecycle Management
    5. http://www.flickr.com/photos/22077905@N00/3455858227/ Transition to Operations
    6. http://www.flickr.com/photos/amagill/129804585/ Release?
    7. http://www.flickr.com/photos/dvids/3716920088/ Release?
    8. http://www.flickr.com/photos/69214385@N04/9243391443/ Release
    9. Flickr
    10. http://www.flickr.com/photos/90475107@N00/6097925485/ Feedback
    11. http://www.flickr.com/photos/36989019@N08/4349003896 “The website is down.”
    12. http://www.flickr.com/photos/59263064@N00/4561665366 The “Wall of Confusion”
    13. http://www.flickr.com/photos/31031835@N08/8061697361 Bugs are expensive
    14. http://www.flickr.com/photos/42729630@N06/4356879478/ Infrequent releases
    15. http://www.flickr.com/photos/gabaus/5770042200/ Disappointed Governance
    16. Developing blind http://www.flickr.com/photos/joshwept/5357377072/
    17. http://commons.wikimedia.org/wiki/File:Trebuchet_Castelnaud.jpg Removing the Wall
    18. http://www.flickr.com/photos/bohman/207181171/ Enter “DevOps”
    19. http://www.flickr.com/photos/usfwsmtnprairie/8594464975/ What was the point of Agile?
    20. What can Agile do for the Wall of Confusion
    21. http://www.flickr.com/photos/86979666@N00/7623744452/ How?
    22. http://www.flickr.com/photos/aussiegall/286709039/ Metrics
    23. Coordination http://www.flickr.com/photos/23212428@N00/4302079406/
    24. http://www.flickr.com/photos/10506540@N07/3517227492 Automation
    25. http://www1.assumption.edu/users/bniece/spectra/HiResolution/Ws.jpg The Spectrum of DevOps
    26. ALM (Re)Visualized
    27. Architecture http://www.flickr.com/photos/42302958@N00/161864682
    28. http://www.flickr.com/photos/bradmontgomery/8007012137/ Code
    29. http://www.flickr.com/photos/florianric/7263382550/ Tools
    30. http://www.flickr.com/photos/10506540@N07/3517227492 Special Thanks
    31. Thank you! Josh Gillespie Sr. Consultant Josh.Gillespie@polarissolutions.com