DevOps: Solving the engineering productivity challenge (whole issue)
Large enterprises today are often full of good ideas, but take too long to deliver the infrastructure and application changes needed to support process changes. A major culprit is how IT manages change and the risks that system changes create. Contrast that approach with how things are done now at web-based companies, even very large ones such as Facebook, Google, and Netflix. These companies strive for continuity of operations by embracing change. As a result, they are much better aligned to what the business needs today and more responsive to what the business needs tomorrow.
The evolution from lean and agile to antifragile (Business feature)
The idea that businesses might be experiencing an unprecedented amount of “stress and disorder” should come as no surprise. So how can enterprises turn stress and disorder into a positive? One answer may lie in how some web-scale companies are making frequent daily changes a routine part of their cloud-based services.
The tools and techniques behind continuous delivery and deployment (Tech feature)
In continuous delivery, frequent – even daily or hourly – incremental code releases replace what used to transpire over weeks or months. Enterprises in established industries are now moving toward more frequent code releases with many of the same tools and techniques in use at web-scale companies.
A CIO’s DevOps-style approach to resolving the agility-stability paradox (CIO feature)
For decades, CIOs have applied methods and adopted tools that lead to at least some form of stability. But their preoccupation with stability has meant a lack of responsiveness, particularly when considering a fast-paced business environment. IT can transform itself by adopting agility without giving up stability, but CIOs will have to change the IT mindset first.
Hi folks, and thanks for taking the time to join us today. This issue of the Tech Forecast is focused on DevOps. DevOps is a working style designed to encourage closer collaboration between developers and operations people. (Dev+Ops=DevOps.) It’s an outgrowth of agile development, and agile has been around since 2001, before the advent of cloud computing in the late 2000s. Since that time, a native cloud development style has emerged that enterprises are finding quite compelling. DevOps encourages extensive automation and workflow redesign that’s inspired by this native cloud development so that developers can release small bits of code frequently (in a more or less continuous delivery cycle) and yet not disrupt the operational environment.
The idea to focus our research efforts on DevOps began when Bo heard LinkedIn’s CTO speak at a Big Data conference at UC Berkeley earlier this year. The CTO talked about the toolchain LinkedIn had been using to accelerate its development workflow. LinkedIn was one of the cloud native web companies that were using novel tools and forms of automation that emerged in the public cloud to manage their online operations, and they were also creating their own tools and donating them to open source.
Another example we came across was Instagram’s toolchain. Instagram’s story is worth retelling to give you a sense of what these new capabilities make possible and why enterprises might be interested.
In Instagram’s case, Mike Krieger and Kevin Systrom, two software developers with little knowledge of back-end systems, built and launched Instagram in 2010 on a single server from one of their homes. On the first day, 25,000 users signed up for the mobile photo-sharing service that many of us use or are familiar with. Within two years, Instagram had 2 million users. It acquired a million users in just 12 hours after it began offering an Android version.
By May 2013, nine months after Facebook acquired Instagram for $1 billion, the photo-sharing service had more than 100 million active users per month. So the question becomes, how did such a tiny effort scale that quickly and maintain operational stability?
Krieger and Systrom recognized that trying to achieve the absolute stability of a traditional enterprise system was unrealistic given the exponential growth they were experiencing. Instead, they designed the system to take advantage of the uncertainty even as the growth itself forced fundamental change to their system. With this approach, they solved a succession of scaling problems one at a time, as they arose, using open-source databases and various data handling and other techniques identified with the help of open developer communities.
In April 2012, Krieger acknowledged that they had learned the real meaning of scaling: “Replacing all components of a car while driving it at 100 mph.”
What resources were Krieger and Systrom able to tap into on a shoestring? A lot of the resources they used are related to what’s happened in the public cloud. A prime example of how the cloud has changed software development is GitHub, the commercial hosting service built on Git, the open source version control system developed by Linux inventor Linus Torvalds. GitHub is now the hottest home for open source projects. Part of GitHub’s appeal is its social features, which promote collaboration. That kind of social network effect caused GitHub to grow to more than 2.7 million users and 4.6 million repositories by the end of 2012, as shown in this chart.
Charles Oppenheimer, CEO of Prizzm, which offers the MightBuy shopping app, says that companies doing development now have to pay attention to what gets posted on GitHub. “This is the language of open source now, and it’s all about source control and how you manage changes to code. If you aren’t coding this way, you aren’t speaking the new language. I don’t hire anybody who doesn’t have GitHub contributions. It’s so I can see their code.” There’s a visibility there that the community is starting to depend on.
GitHub Inc. has also launched GitHub Enterprise, a version of GitHub for an organization’s internal network that’s protected by the customer’s existing authentication. Many developer teams are finding GitHub and other tools helpful in improving efficiency, and enterprise software vendors are beginning to emulate or build on top of what the open source community has created. Much of the code in open source has been donated by LinkedIn, Facebook, Twitter and other social networking companies that want to see what other developers will do with their code. They’ve decided to release the code as open source because they consider data their core competency, not their code.
So how does an environment like GitHub fit in with what has come before? Before, software took its cue from manufacturing. A hundred years ago with the assembly line, companies tried to eliminate product variability. Taylor's "scientific management" transformed craft production into mass production with documentation, structured processes and tools. Soon command and control management techniques refined during World War II imposed standards from the top down, epitomized by cars that were very uniform in their appearance. Today product variability is intentional and value creating. It’s enabled by more and more specific flows of information, represented by customization as shown in this diagram.
W. Edwards Deming, who pioneered total quality management, recognized that quality and value were compromised when information flowed in only one direction, from the top down. He taught the Japanese and then others about quality circles and plan-do-check-act (PDCA), an iterative continuous improvement cycle. In Deming’s vision, feedback from shop floor employees is key, but still in service to mass production of standard products. Mass personalization at the product level via flexibly programmable robotics now results in one-off car purchasing options, creating the appearance of a car built by hand "just for you," somewhat similar to what a craftsperson might have done in 1890. The feedback-response loops used when companies run software in the cloud and adopt DevOps methods echo, articulate and accelerate Deming’s two-way information flow, and those methods are influencing what’s happening in 3D printing, for example.
As I mentioned earlier, DevOps is a working style designed to encourage closer collaboration between developers and operations people. Traditionally those groups have been working more or less at cross purposes. DevOps collaboration seeks to reduce the friction between the two groups by addressing the root causes of the friction, making it possible to increase the volume and flow of production code, improve the efficiency of developer teams and keep developers from alienating the ops people who must guard the stability of the system. One of those root causes of friction is poor workflow design. DevOps workflow includes buffers, compartmentalization and extensive monitoring and testing: a very extensive and well designed pipeline, but also a rich feedback loop, as we’re showing in this diagram. It's a test-driven environment. When the small bits of code get deployed, the individual changes to the user experience tend to be minor or limited to a small subaudience initially.
But this sort of testing becomes part and parcel of the delivery and deployment cycle. Rather than performing lengthy tests of every bit in a massive code release, organizations are rolling small batches of code into production and quickly rolling them back if the infrastructure or users report problems. The changes are often so small that this becomes a very manageable technique. One CIO told us that instead of measuring twice and cutting once, they measure once and cut three times. He said, “I’m going to spend half my time writing tests for the things I deliver, rather than just delivering the feature itself.”
Instead of trying to avoid failure, some continuous delivery practitioners intentionally crash parts of their infrastructure to learn if essential services survive, and then use what they learned to improve the services.
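To make the small-batch release and rollback pattern concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any company's actual pipeline: the names Release, canary_rollout and fake_error_rate, the rollout stages and the 2 percent error threshold are all hypothetical, and the monitoring is simulated rather than read from a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Release:
    """Hypothetical record of which version is live and what was tried."""
    live_version: str
    history: List[str] = field(default_factory=list)


def canary_rollout(
    release: Release,
    new_version: str,
    error_rate: Callable[[str, float], float],
    stages=(0.01, 0.10, 0.50, 1.0),   # assumed shares of users per stage
    max_error_rate: float = 0.02,     # assumed tolerance before rolling back
) -> bool:
    """Expose new_version to a growing share of users; roll back on trouble.

    error_rate(version, share) stands in for real monitoring: it returns the
    error rate observed while `share` of the traffic runs `version`.
    """
    previous = release.live_version
    for share in stages:
        observed = error_rate(new_version, share)
        release.history.append(f"{new_version}@{share:.0%}: {observed:.3f}")
        if observed > max_error_rate:
            # Infrastructure or users report problems: restore the old version.
            release.history.append(f"rollback to {previous}")
            release.live_version = previous
            return False
    release.live_version = new_version
    return True


def fake_error_rate(version: str, share: float) -> float:
    """Simulated monitoring: version 1.2 misbehaves once it reaches 10% of users."""
    if version == "1.2" and share >= 0.10:
        return 0.08
    return 0.005


release = Release(live_version="1.0")
ok_1_1 = canary_rollout(release, "1.1", fake_error_rate)
ok_1_2 = canary_rollout(release, "1.2", fake_error_rate)
print(release.live_version, ok_1_1, ok_1_2)
```

In this simulated run, version 1.1 passes every stage and goes live, while version 1.2 is pulled back at the second stage, so only a small subaudience ever sees the faulty change.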
That’s a pattern that many companies with DevOps teams are very familiar with. Rebel Labs did a study that revealed how DevOps-style IT allocates its time differently, with much more time spent on automation, testing, infrastructure improvements and education, as you see here in this graphic. Essentially, the teams are spending lots of time automating aspects of the infrastructure and creating tests that help them deliver more resilient software. Each DevOps team member has a higher awareness of the whole system and how parts of the system work together.
So DevOps emphasizes malleability as a way of keeping pace with change. But many companies don’t have the ability to adopt DevOps quickly and instantly reap the benefits. Contrast the way established enterprises adopt a DevOps approach with what cloud native Instagram did from the start or an Ancestry.com (a pre-cloud enterprise) did within a couple of years, as we’re showing in this illustration. Instagram is a couple of years old, and Ancestry is a couple of decades old.
By contrast, Nationwide’s been around since 1926 and has presumably had computer systems for over fifty years. In spite of that legacy, they’ve adopted some agile development practices and are looking towards DevOps. Vijay Gopal acknowledges that Nationwide has “some antiquated systems that have not been developed with the degree of consistency or using the principles that we would like to have. We’re investing significantly to move toward that goal. But we also need to keep in mind that we operate in a regulated industry, and we can’t release code that could compromise customer information or some of the financial and sensitive data that we have. We would need to prove development in some of the noncritical areas of our business first.”
An older and larger enterprise just has to deal with many, many issues when trying to make changes.
So now that you have some background on DevOps, we can talk a bit about antifragility and how DevOps and antifragility are related. We make reference throughout the issue to Nassim Taleb's antifragility theory. Antifragility is the subject of a popular business book, Antifragile, and an organizational management theory that appeals to some of the leading DevOps thinkers we've interviewed or read. Taleb, as you know, is the author of The Black Swan and a former Wall Street trader. While black swans are the rare catastrophes that could cripple an industry (such as the financial services industry encountered with credit default swaps), antifragility is a rethinking of how organizations should operate to survive assuming the eventuality of black swans. Conceptually, antifragile organizations actually thrive in disrupted environments.
This illustration draws the comparison between an ideal antifragile enterprise state and earlier phases of enterprise evolution. We’ve represented the fragile organization as a dinosaur that can’t respond to unexpected threats such as asteroids that damage large portions of the ecosystem. Robust companies have armor like a rhinoceros but can’t survive catastrophic events either. Agile organizations have some chance of survival, and antifragile organizations actually thrive when shocked. The metaphor we’re using here depicts the bacteria that survived Chernobyl.
So in this sense antifragile organizations embrace change to a greater extent than agile organizations do, and the cloud can enable these organizations to turn on a dime. It’s not only possible to change rapidly; rapid adaptation can be their natural state.
An antifragile approach anticipates and builds in responses to unpredictable failures in components within a cloud infrastructure, deliberately causing failures in order to detect and remove vulnerabilities. Some of the tools that cause these failures are now available as open source software. These include Netflix Chaos Monkey and Stelligent Havoc, which randomly shut down Amazon Web Services instances to help developers plan for such disruptions and fix weak points in their infrastructures. This kind of shock testing of operational code underscores how at least some businesses are trying to survive in the 21st century. In this illustration, we’re showing how the antifragile approach builds on what agile development is known for: rapid development cycles, scrums and sprints. The antifragile approach embraces more and more frequent and aggressive testing. It’s an evolution of agile that the cloud and DevOps methods make possible for the first time.
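The idea behind this kind of shock testing can be sketched in a few lines of Python. This is a toy simulation only: it does not use or imitate the actual Chaos Monkey or Havoc interfaces, and the Service class, the kill_random_instance and chaos_test functions and the "checkout" and "search" services are hypothetical names made up for the illustration.

```python
import random
from typing import Dict, List


class Service:
    """Hypothetical service backed by several interchangeable instances."""

    def __init__(self, name: str, instances: int) -> None:
        self.name = name
        # Instance id -> whether that instance is currently running.
        self.up: Dict[str, bool] = {f"{name}-{i}": True for i in range(instances)}

    def handle_request(self) -> bool:
        # The service answers as long as at least one instance is running.
        return any(self.up.values())


def kill_random_instance(service: Service, rng: random.Random) -> str:
    """Shut down one running instance chosen at random and return its id."""
    running = [iid for iid, alive in service.up.items() if alive]
    victim = rng.choice(running)
    service.up[victim] = False
    return victim


def chaos_test(services: List[Service], rng: random.Random) -> Dict[str, bool]:
    """Kill one instance per service and report which services still respond."""
    report = {}
    for service in services:
        kill_random_instance(service, rng)
        report[service.name] = service.handle_request()
    return report


rng = random.Random(42)  # fixed seed so the run is repeatable
services = [Service("checkout", instances=3), Service("search", instances=1)]
report = chaos_test(services, rng)
print(report)
```

Whichever instances the random choice picks, the three-instance service keeps answering and the single-instance service does not, which is exactly the kind of weak point this testing is meant to expose before a real outage does.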
In our opinion, antifragility describes an ideal business state that a DevOps methodology can help enterprises work towards, an evolution that takes us from Deming forward to a state beyond agile, a state of continual responsiveness and ability to grow in environments that see frequent and sometimes radical shifts in the marketplace. But our current work in agile helps lay the groundwork. As this illustration shows, it’s an evolutionary path to antifragility, not a revolutionary one.
DevOps: Solving the engineering productivity challenge (Slides from PwC External Webcast)
PwC Technology Forecast
July 25, 2013
Tom DeGarmo, US Technology Consulting Leader
Bo Parker, Center for Technology & Innovation
Alan Morrison, Center for Technology & Innovation
Guest: John Esser, Director of Engineering Productivity and Agile Development, Ancestry.com
Technology forecast quarterly: Emerging trends in enterprise IT
Planned for Oct/Nov release: Future of enterprise apps
Tech Forecast 2013, Issue 2
The evolution from lean and agile to antifragile (Business feature)
The tools and techniques behind continuous delivery and deployment (Tech feature)
A CIO’s DevOps-style approach to resolving the agility-stability paradox (CIO feature)
Lothar Schubert and Laurence Sweeney
How rapid social network growth at Instagram relates to enterprise agile development
GitHub (a social network and code repository for developers) growth, 2008–2012
Before, manufacturing techniques informed software development; now, software development leads
Cloud computing enables companies to embrace change and deliver services on demand.
Main characteristics of the DevOps philosophy
DevOps=agile, automated development and deployment in the era of the cloud
Overall work week for DevOps vs. traditional IT
Enterprise DevOps teams must navigate the maze of legacy systems and processes
The fragile versus the robust, agile and antifragile
Continual testing and improvement anticipate and
How antifragility fits in with other change
DevOps, continuous improvement and antifragility have their roots in Deming, TQM, Lean, Kaizen….
Conversation with John Esser of Ancestry.com