With the move to virtualization and cloud-like IT architectures, we create and destroy computers instantly. The rate of architectural change is so fast, it must be automated to be workable. A new discipline -- dubbed DevOps -- is rising to the challenge. It's a cultural and technological shift in how IT systems are managed from creation to decommissioning. Because it gives development teams far greater control and involvement in operational functions, DevOps is tearing down the walls between development and operations, leading to a more collaborative, automated, agile approach to IT.
In this session, operations expert Jesse Robbins will look at how DevOps promises to change many operations fundamentals as we move to more agile IT atop elastic computing environments.
Speaker - Jesse Robbins, CEO and Co-Founder, Opscode
3. “DevOps is the ability to create
and deploy reliable software to
an unreliable platform that
scales horizontally.”
http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html
4. DevOps
Culture
Slide Modified from John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
5. This isn’t new
‣ Theory of Constraints
‣ Lean / JIT
‣ Six Sigma
‣ Toyota Production System
‣ Agile
‣ etc...
8. Types of Constraints
‣ Equipment
The way equipment is currently used limits the
ability of the system.
‣ People
Lack of skilled people limits the system.
‣ Policy
A written or unwritten policy hinders or prevents
the system from working effectively.
9. Process for Ongoing Improvement
1) Identify the constraint
Find the resource or policy that prevents the organization from
obtaining more of the goal
2) Decide how to exploit the constraint
Make sure the constraint's time is not wasted doing things that
it should not do
3) Subordinate all other processes to above decision
Align the whole system or organization to support the decision
made above
4) Elevate the constraint
If required or possible, increase capacity of the constraint; "buy
more"
5) If, as a result of these steps, the constraint has moved, return to
Step 1.
Don't let inertia become the constraint.
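The improvement loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage names, capacities, target and step size are hypothetical, and only steps 1, 4 and 5 are modelled (steps 2 and 3 are organisational decisions that a toy model cannot capture).

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    capacity: float  # units of work per hour (hypothetical numbers)


def identify_constraint(stages):
    """Step 1: the stage with the lowest capacity limits the whole system."""
    return min(stages, key=lambda stage: stage.capacity)


def system_throughput(stages):
    """A serial pipeline can go no faster than its constraint."""
    return identify_constraint(stages).capacity


def improve(stages, target, step=5.0, max_rounds=20):
    """Repeat the loop until the target is met; return each round's constraint."""
    history = []
    for _ in range(max_rounds):
        if system_throughput(stages) >= target:
            break
        constraint = identify_constraint(stages)  # Step 1 (and Step 5: re-check)
        history.append(constraint.name)
        constraint.capacity += step  # Step 4: elevate the constraint ("buy more")
    return history


# Hypothetical delivery pipeline; none of these figures come from the talk.
pipeline = [
    Stage("develop", 30.0),
    Stage("test", 12.0),
    Stage("deploy", 8.0),
    Stage("operate", 20.0),
]
history = improve(pipeline, target=15.0)
print(history)
print(system_throughput(pipeline))
```

In this toy run the constraint moves from one stage to another and back again, which is why step 5 returns to step 1 rather than continuing to invest in the stage that was the bottleneck at the start.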
11. Papers of the Research Society of Commerce and Economics, Vol. XXXXVII No. 2
Figure 4. Comparing coercive systems/procedures with enabling ones (Table 2 in Adler
1999, http://www.ob.shudo-u.ac.jp/jimuhp/souken/web/magazine/pdf/com/shou47-2austenfeld.pdf
p. 44)
14. “It’s not my code, it’s your machines!”
Spock
‣ Little bit weird
‣ Sits closer to the boss
‣ Thinks too hard
Scotty
‣ Pulls levers & turns knobs
‣ Easily excited
‣ Yells a lot in emergencies
Slide Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
15. No fingerpointing
Slide Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
http://www.flickr.com/photos/rocketjim54/2955889085/
16. Fingerpointyness
[Timeline diagram: a time axis running from “problem!!!” to “fixed”. Labels: argggh!; freaking out, not talking, finding fault; blaming, covering ass; hurt egos; whining, hiding; figuring it out; fixing things.]
Slide Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
17. Being productive
[Timeline diagram: a time axis starting at “problem!!!”. Labels: argggh!; figuring it out; fixing things; fixed; feeling guilty; move on with life.]
Slide Courtesy of John Allspaw - http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
Artur’s example: every 2 weeks WoW goes down for 2 hours, and their customers accept this.
Theory of Constraints (TOC) is an overall management philosophy introduced by Dr. Eliyahu M. Goldratt in his 1984 book titled The Goal, that is geared to help organizations continually achieve their goal. [1] The title comes from the contention that any manageable system is limited in achieving more of its goal by a very small number of constraints, and that there is always at least one constraint. The TOC process seeks to identify the constraint and restructure the rest of the organization around it.
A Romance Set in a Factory about Process Innovation. It’s the story of Development (Engineering) & Operations (Manufacturing). Ironically, and perhaps telling for me as the CEO of a Tools Company, this book was written to promote software that Mr. Goldratt’s company created. However, once people understood the fundamental changes in CULTURE... they no longer needed the software.
Standing on the Shoulders of Giants
Goldratt published an article [citation needed] and gave talks [15] with the title "Standing on the Shoulders of Giants" in which he gives credit for many of the core ideas of Theory of Constraints. Goldratt has sought many times to show the correlation between various improvement methods. However, many Goldratt adherents often denigrate other methodologies as inferior to TOC [citation needed].
friendster: is cache just warming up, or is this a code problem?
flickr engineers and ops just assume it was their code that failed. Parallelizing the figuring it out followed by coordinated response. “hey, it was me, here’s how we’re going to fix it”
just because you worked with a cowboy 5 years ago doesn’t mean you need to treat everyone that way. don’t assume the worst in everyone
just saying “no” is another way of saying “i don’t care about your problem”. What problem are they trying to solve? How can you help them? If i don’t like their proposal, what is my alternative? If someone is saying “no”, why are they saying “no”? Can you come up with a solution that meets their nee
Not hiding stuff just because you’re afraid someone will say no or fuck with it. if john is going to say “no” it’s probably for a good reason
1 is about new features - ops need to be involved early and often as new features are specced and implemented. 2 - devs need to trust that infrastructure isn’t going to change underneath them, and will continue to perform and support their app. Otherwise, you’re just a bunch of fucking cowboys. And cowboys are losers.
Ops people - make sure developers can see what’s happening on the system without going through you. Giving someone a read-only shell account on production hardware is low risk, and means they can see what’s happening. Make sure they can access all your metrics and monitoring systems. Let them see what’s going on
My prior work on something we called “GameDay” Fire Drills: created an iterative program to improve company-wide availability through fault-injection into Amazon’s most critical systems. This program continues to be effective in three ways:
‣ Preparation for GameDay drives the identification and mitigation of risks and impact from failure, reducing both the frequency of failure (MTBF) and duration of recovery (MTTR).
‣ Participation in GameDay builds individual confidence in recovering systems after failure and under stress. This strengthens individual and cultural ability to anticipate, mitigate, respond to, and recover from failures of all types.
‣ The GameDay exercises trigger and expose “latent defects”, allowing the organization to decide when it will discover them, instead of letting that be determined by the next real disaster.
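To make the MTBF and MTTR point concrete, here is a minimal sketch using the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The before and after figures are hypothetical and are not data from the GameDay program; they only show how failing less often (a longer MTBF) and recovering faster (a shorter MTTR) both raise availability.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(mtbf_hours, mttr_hours, period_hours=HOURS_PER_YEAR):
    """Expected hours of downtime over a period (one year by default)."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * period_hours


# Hypothetical numbers: before the drills, a failure every 200 hours that
# takes 4 hours to recover from; afterwards, a failure every 400 hours that
# takes 1 hour to recover from.
before = availability(mtbf_hours=200.0, mttr_hours=4.0)
after = availability(mtbf_hours=400.0, mttr_hours=1.0)
print(f"before: {before:.4f}, after: {after:.4f}")
# prints "before: 0.9804, after: 0.9975"
```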
The OODA loop (for observe, orient, decide, and act) is a concept originally applied to the combat operations process, often at the strategic level in military operations. It is now also often applied to understand commercial operations and learning processes. The concept was developed by military strategist and USAF Colonel John Boyd. The OODA loop has become an important concept in both business and military strategy. According to Boyd, decision-making occurs in a recurring cycle of observe-orient-decide-act. An entity (whether an individual or an organization) that can process this cycle quickly, observing and reacting to unfolding events more rapidly than an opponent, can thereby "get inside" the opponent's decision cycle and gain the advantage. Frans Osinga argues that Boyd's own views on the OODA loop are much deeper, richer, and more comprehensive than the common interpretation of the 'rapid OODA loop' idea. [2]
Boyd developed the concept to explain how to direct one's energies to defeat an adversary and survive. Boyd emphasized that "the loop" is actually a set of interacting loops that are to be kept in continuous operation during combat. He also indicated that the phase of the battle has an important bearing on the ideal allocation of one's energies. Boyd’s diagram shows that all decisions are based on observations of the evolving situation tempered with implicit filtering of the problem being addressed. These observations are the raw information on which decisions and actions are based. The observed information must be processed to orient it for decision making. In notes from his talk “Organic Design for Command and Control”, Boyd said: “The second O, orientation – as the repository of our genetic heritage, cultural tradition, and previous experiences – is the most important part of the O-O-D-A loop since it shapes the way we observe, the way we decide, the way we act.” [3]
As stated by Boyd and shown in the “Orient” box, there is much filtering of the information through our culture, genetics, ability to analyze and synthesize, and previous experience. Since the OODA Loop was designed to describe a single decision maker, the situation is usually much worse than shown as most business and technical decisions have a team of people observing and orienting, each bringing their own cultural traditions, genetics, experience and other information. It is here that decisions often get stuck, [4] which does not lead to winning, since: “In order to win, we should operate at a faster tempo or rhythm than our adversaries--or, better yet, get inside [the] adversary's Observation-Orientation-Decision-Action time cycle or loop. ... Such activity will make us appear ambiguous (unpredictable) thereby generate confusion and disorder among our adversaries--since our adversaries will be unable to generate mental images or pictures that agree with the menacing as well as faster transient rhythm or patterns they are competing against.” [3]
The OODA loop, which focuses on strategic military requirements, was adapted for business and public sector operational continuity planning. Compare it with the Plan Do Check Act (PDCA) cycle or Shewhart cycle, which focuses on the operational or tactical level of projects. [5]
As one of Boyd's colleagues, Harry Hillaker, put it in "John Boyd, USAF Retired, Father of the F16" [6]: “The key is to obscure your intentions and make them unpredictable to your opponent while you simultaneously clarify his intentions. That is, operate at a faster tempo to generate rapidly changing conditions that inhibit your opponent from adapting or reacting to those changes and that suppress or destroy his awareness. Thus, a hodgepodge of confusion and disorder occur to cause him to over- or under-react to conditions or activities that appear to be uncertain, ambiguous, or incomprehensible.”
Writer Robert Greene wrote in an article called OODA and You [7] that the proper mindset is to let go a little, to allow some of the chaos to become part of his mental system, and to use it to his advantage by simply creating more chaos and confusion for the opponent. He funnels the inevitable chaos of the battlefield in the direction of the enemy.
Many good open source tools, and three common ones: CFengine is most widely deployed, we used Puppet, then Opscode created Chef and we’re very proud of it.
Puppet has been around for a while and there is good documentation & a book about it. Uses its own language for configuration, which some people seem to really like. Strong sysadmin community adoption. You can buy direct support for Puppet software from PuppetLabs.
Chef is a year and a half old now. Over 130 contributors, thousands of sites, partners building an ecosystem. Chef Cookbooks site to help get you started using over 100 cookbooks for every part of your infrastructure. Very rapid uptake among developers because “it’s just ruby” and can be embedded into your code as part of your app. Opscode provides a hosted platform for a small fee per node (free while in Alpha!)
important point here is that devs not only know where this is and have access, but watch it as obsessively as operations. Sometimes more. (paul took this screenshot)
not just cpu, memory, disk and network, but also application-level stuff
devs make these because they want to. ops make it easy to make these - we have a framework where applications write a stats file and ganglia will slurp it up
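As a rough illustration of the "applications write a stats file" idea, here is a minimal Python sketch. The file name, the one-"name value"-per-line format and the metric names are all assumptions made for this example; the notes do not describe the actual framework's format, and the collector side (ganglia picking the file up) is not shown.

```python
import os
import tempfile
import time


def write_stats(path, stats):
    """Atomically write one "name value" line per metric to path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it into
    # place, so a collector never reads a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".stats-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(f"timestamp {int(time.time())}\n")
            for name in sorted(stats):
                handle.write(f"{name} {stats[name]}\n")
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_stats(path):
    """Parse the file back into a dict, as a collector script might."""
    result = {}
    with open(path) as handle:
        for line in handle:
            name, value = line.split(None, 1)
            result[name] = value.strip()
    return result


# Hypothetical application counters, written to a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    stats_path = os.path.join(workdir, "photo_uploads.stats")
    write_stats(stats_path, {"uploads_total": 1532, "upload_errors": 3})
    collected = read_stats(stats_path)
    print(collected["uploads_total"], collected["upload_errors"])
```

The design point is that the application only has to dump its counters to a file; whatever collects them can change without touching application code.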
Velocity Web Performance and Operations Conference: Fast by Default
Web performance and operations is an emerging discipline which requires incredible breadth, focusing less on specific technologies and more on how the entire system works together. While people often specialize in particular components, great engineers and developers understand web performance and operations in relation to the whole. The best are able to fly to the 50,000 foot view and see the entire system in motion and then zoom in to microscopic levels and examine the tiny movements of an individual part.
Now in its third year, Velocity, the Web Performance and Operations conference from O'Reilly Media, is the premier conference that:
‣ Provides your web ops and dev teams direct access to the training, technologies and skills that will have the most immediate impact on your bottom line
‣ Gives you the keys to the "fast by default" toolkit for automating your infrastructure
‣ Throws aside the silos and redefines your team's understanding of the whole industry ecosystem
‣ Gives you direct face time with the biggest rock stars in the industry and access to the biggest network of your peers
‣ Shows off the biggest gallery of "show me now" case studies
Topics and Themes for Velocity 2010
Velocity attendees are a unique tribe of disaster gurus, special ops professionals, software developers, academics, sysadmins, developers, engineers, and more. They're the hidden heroes that keep the trains running on time, the people you call at 3am when your site is down. They're the most skilled at repeatedly rising to the challenge of finding solutions to complex problems and are always on the leading edge of emerging technologies and solutions.
Velocity gives attendees access to speakers and the in-depth, technical content that will move the dial furthest for Web Ops and Development professionals most responsible for the health of their company's IT infrastructure. Topics and themes we're focusing on in 2010 include:
‣ Mobile Web Performance
‣ Multiple Data Center Management
‣ Network Latency
‣ Cloud Computing
‣ Scalable Video and Social Gaming
‣ HTML5
‣ Configuration Management
‣ Webcaching and Memcaching
‣ Green Architecture
‣ Metrics