What do the "Cool Kids" know about DevOps?

© 2014 IBM Corporation
Session: 2427
What the Cool Kids are Doing
with DevOps
Bill Holtshouser
Senior Strategist, Mobile, DevOps, Cloud
IBM Rational

Please note…
IBM’s statements regarding its plans, directions, and intent are subject to change
or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment,
promise, or legal obligation to deliver any material, code or functionality.
Information about potential future products may not be incorporated into any
contract. The development, release, and timing of any future features or
functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM
benchmarks in a controlled environment. The actual throughput or performance
that any user will experience will vary depending upon many factors, including
considerations such as the amount of multiprogramming in the user’s job stream,
the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results
similar to those stated here.

Introduction
• This session is based on an examination of a series of “born on the
web” companies to see what common patterns and other learnings can
be derived from their DevOps journeys, with the goal of extracting
guidance for IBM’s clients
• We used only publicly available information such as published
conference presentations, company blogs, videos, news stories and
white papers
• Important: Everything here is strictly our opinion; none of the
companies mentioned reviewed or endorsed these opinions in any way!

Key Takeaways
• “Born on the Web” startups like Etsy, Netflix and others have been
leaders in applying a DevOps approach to SW development and delivery
– but they are essentially built from the ground up to do so
• These companies display numerous common DevOps-related traits in
the areas of Culture, Organization, Practices, Automation and
Measurements
• Although your enterprise won’t be able to replicate all aspects of these
“cool kid” companies and how they have applied DevOps (nor should
you even try), there are some important learnings from them that
can inform your own DevOps approach

Does this story sound familiar?

One way to address the issue…

Believe it or not, Dev and Ops weren’t always separate
“Back in the dawn of the computer
age, there was no distinction between
dev and ops. If you developed, you
operated. You mounted the tapes, you
flipped the switches on the front panel,
you rebooted when things crashed, and
possibly even replaced the burned out
vacuum tubes. And you got to wear a
geeky white lab coat…”
“Dev and ops started to separate in the
‘60s, when programmers dumped boxes of
punch cards into readers and “computer
operators” scurried around mounting tapes
in response to IBM JCL. The operators also
pulled printouts from line printers and put
them in labeled cubbyholes, where you got
your output filed under your last name.”
– John Allspaw, Etsy

So…just who are these “Cool Kids” anyway?

Sidebar: Continuous Delivery is more than just “fast Continuous Integration”
Continuous Delivery
• Websites, SaaS offerings
• Multiple pushes to production per day
• Highly decoupled, independent feature sets
• Single image/single stream
• New practices and patterns
Continuous Integration
• Traditional applications, appliances, mobile apps, Web APIs
• Delivery to production every few days to weeks
• Coordinated releases, multiple version streams
• Established Agile practices
Continuous Engineering
• Complex embedded systems
• Complex product release and update cycles
• Management of variants and versions
• Engineering practices

Five essential elements of “Cool Kids” DevOps success
• Culture
• Organization
• Practices
• Automation
• Measurement

Cool Kids and Culture - key learnings
• Trust leads to an acceptance of “reasonable” risk
– Organization, tools, automation, instrumentation can all reduce risk
• Risk = PROBABILITY of Error x COST of Error
– Not all risks are created equal; zero risk is unattainable
– Cost depends on Time to Fix
• Learning from mistakes > blame
– …but there is still Karma: repeated mistakes may lead to loss of privilege
• ALL exhibit a high degree of delegation
– …which leads to velocity
• In order to delegate, the Cool Kids trust… but verify
– E.g. via instrumentation, measurement
At Etsy, employees have a high degree of creative freedom and, when things go wrong, accountability without blame. “We actually trust people,” CTO Chad Dickerson says. He calls the approach a “radical decentralization of authority.” – Inc. Magazine, 12/13
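The risk relation on the Culture slide (Risk = probability of error x cost of error, with cost depending on time to fix) can be made concrete with a small worked example. This is a minimal sketch only: the `Change` class, the field names and every number below are illustrative assumptions, not figures from any of the companies discussed.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A hypothetical change going to production (all values are assumptions)."""
    name: str
    p_error: float        # estimated probability the change causes an error (0..1)
    cost_per_hour: float  # estimated cost of that error per hour it stays unfixed
    hours_to_fix: float   # expected time to detect and fix or roll back


def expected_risk(change: Change) -> float:
    """Risk = probability of error x cost of error; cost scales with time to fix."""
    return change.p_error * change.cost_per_hour * change.hours_to_fix


# A large, infrequent release versus a small push that can be rolled back quickly.
big_release = Change("large quarterly release", p_error=0.30, cost_per_hour=10_000, hours_to_fix=8.0)
small_push = Change("single small feature", p_error=0.02, cost_per_hour=10_000, hours_to_fix=0.25)

for change in (big_release, small_push):
    print(f"{change.name}: expected risk ~ {expected_risk(change):,.0f}")
```

Under these assumed numbers the small push carries far less risk, both because it is less likely to fail and because a short time to fix lowers the cost term.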

Re-defining the attitude towards “failure”
• Netflix allows failure to happen continuously and wants its software to be able to deal with it; in fact, it takes steps to encourage errors (Simian Army)
• In reality, Netflix looks at “failure” as simply another STEP in the software development process
http://techblog.netflix.com/2011/07/netflix-simian-army.html
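The idea of deliberately encouraging errors so that software learns to cope with them can be sketched as a failure-injection wrapper. This is an illustration of the general technique only, not Netflix's Simian Army code: `flaky`, `with_fallback`, `recommendations` and the failure rate are all hypothetical names and values.

```python
import random


class InjectedFailure(Exception):
    """Raised on purpose to simulate a dependency outage."""


def flaky(call, failure_rate, rng):
    """Wrap `call` so that it fails on purpose with probability `failure_rate`."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure("simulated outage")
        return call(*args, **kwargs)
    return wrapped


def with_fallback(call, fallback):
    """Wrap `call` so that an injected failure degrades to `fallback`."""
    def wrapped(*args, **kwargs):
        try:
            return call(*args, **kwargs)
        except InjectedFailure:
            return fallback
    return wrapped


def recommendations(user_id):
    """Stand-in for a call to a downstream service."""
    return [f"personalised-title-for-{user_id}"]


rng = random.Random(42)  # seeded so a run is repeatable
unreliable = flaky(recommendations, failure_rate=0.5, rng=rng)
resilient = with_fallback(unreliable, fallback=["popular-title"])

# Every call returns something usable, whether or not a failure was injected.
results = [resilient(7) for _ in range(100)]
print(sum(r == ["popular-title"] for r in results), "of 100 calls used the fallback")
```

The point of the design is that the caller is exercised against failure all the time, so a missing fallback shows up early rather than during a real outage.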

Cool Kids and Culture – more learnings
• Adopt an “Ops First” design mentality
– Don’t build what you can’t manage
• Recognize the importance of build
– They don’t just give the build system to the “worst programmer” or newest hire, but establish a focused role

Bottom line: a culture of trust is required
Rapid delivery requires low risk (Risk = Probability of error x Cost of error):
• Small feature sets
• Independent services
• Progressive exposure
• Rapid feedback
• Reliable rollback
• High delegation & trust

Adrian Cockcroft of Netflix on Culture
“Culture is very hard to create or modify but easy to destroy.
This is because everyone has to buy into it for it to be effective,
and then every manager has to hire only people who are
compatible with the culture, and also get rid of people who turn
out not to fit in, even if they are doing good work.
So the short answer is: start a new company from scratch
with the culture you want, and pay a lot of attention to who
you hire. I don't think it is possible to do a culture shift if
there are more than a roomful of people involved.
Even with a roadmap and a guide, you probably won't be able
to follow this path if you are in a large established company.
Your existing culture won't let you.”
http://perfcap.blogspot.com/2012/03/ops-devops-and-noops-at-netflix.html

Organization follows Culture
Traditional Culture
– “My priority is to deliver code… fast.”
– “My priority is to keep the site up and running.”
DevOps Culture
– “We’re all on the same team! Want some pizza?”

Organization follows Culture
• Conway’s Law (you build what you are) applies
– …also applies to how you’re organized
• Feature teams, not platform teams
– Small teams: “two pizza” rule
• Organize for an “end-to-end” responsibility for delivery
– Positive approach to fixing mistakes – learning, not “blame and shame”
• Many common patterns are seen in QA…
– Shared responsibility across a team, everybody does QA, or co-located QA
– Small Quality Engineering CoE team provides common tools/practices
– But NOT a separate/antagonistic QA org (“clean up your own mess”)
• Small DevOps “toolsmith” teams
– A.K.A. Systems Release Engineering
– Provide common tools & processes for automation, logging, monitoring…
– There to help, NOT to do it for you
• Finally - no “throwing it over the wall”…

…basically, you need to be getting away from this

Practices that “make perfect” for the Cool Kids
• “Light” planning and specs
– Etsy does high-level planning in 60-day chunks and two-week periods; specs are kept very light, with no more than what is required
• Cut the cord with the traditional release process
– Developers coordinate and drive the release of their own code without the need for a centralized release cycle
– Netflix goes farther than most: “NoOps”
• Speed, speed, speed
– It’s all about rapid deployment; some deploy updates to their site 25x per day
• Progressive rollout of new features, “dark” releases
– Concept of “config flags”: new features are there but not yet enabled, then launched with a simple switch in the code
• They talk about it…a LOT
– Lots of internal and external forums / blogs among the Cool Kids
– Example: Etsy “Code as Craft” site www.codeascraft.com

Practices: a single image simplifies things
• Most of these companies manage a single production image that they completely control
– They don’t have to worry about shipping releases to customers who might or might not install those releases
• …therefore there are no branches in their version control – everything is checked into the trunk

Practices: “Continuous Delivery Heresy” (Yes, you can do too much testing)
• Testing everything on every check-in is good…but it isn’t the endgame
– LinkedIn has only a few thousand unit tests
• Testing in a non-production environment can reach a point of diminishing returns
– Ever-growing lists of unit tests, often testing very obscure scenarios, often overlapping and redundant
– Limited by your ability to predict real world scenarios
• LinkedIn practice: get to production environment as soon as practical
– Progressive rollout minimizes the risk when deploying to production…

Practices: Progressive Rollout
• Progressive rollout of new features, “dark” releases:
– Deploy to one server with all features disabled to ensure no performance or resource regressions (also known as “canarying”)
– Turn on features for a small population, and measure (“smoke test”)
– Turn it on for up to 1% of users, and measure
– Progressively roll out to all servers, continuing to measure
– Config Flags (also known as feature flags or gatekeepers [LinkedIn]) control which users see which features
• In order to successfully do Progressive Rollout, you’ll need two more of our five essential elements:
– Automation, both to progressively roll out and to roll back if a problem is discovered
– Measurement (tied to Instrumentation), in order to be able to rapidly measure the impact
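The progressive rollout steps on this slide can be sketched as a percentage-based config flag plus a loop that measures after each step and rolls back on trouble. This is a minimal sketch under stated assumptions, not any company's actual gatekeeper: the hashing scheme, the `STAGES` percentages, the 1% error threshold and the `measure_error_rate` callback are all hypothetical.

```python
import hashlib


def bucket(user_id: str, flag: str) -> float:
    """Map a user to a stable position in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return (int(digest[:8], 16) % 10_000) / 100.0


def is_enabled(user_id: str, flag: str, percent: float) -> bool:
    """True if the flag is on for this user at the current exposure percentage."""
    return bucket(user_id, flag) < percent


# Exposure steps loosely following the slide: dark (0%), a small population,
# 1% of users, then progressively everyone. The exact values are assumptions.
STAGES = [0.0, 0.1, 1.0, 10.0, 50.0, 100.0]


def roll_out(flag, measure_error_rate, max_error_rate=0.01):
    """Walk through STAGES, measuring after each step.

    Returns the final exposure percentage: 100.0 on success, or 0.0 if a
    measurement exceeds the threshold and the flag is rolled back.
    """
    for percent in STAGES:
        if measure_error_rate(flag, percent) > max_error_rate:
            return 0.0  # automated rollback: switch the feature off for everyone
    return 100.0
```

Because the bucket is derived from a hash of the flag and user, a user who sees the feature at 1% keeps seeing it at 10%, which keeps the experience stable while exposure grows. In practice `measure_error_rate` would read from whatever instrumentation is already in place.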

Progressive Rollout console at Facebook

Practices: Fire When Ready!
• These companies tend to avoid “release-defining features” that can hold up the entire release
• Cool Kids pattern: release features when they are ready - the release train waits for nobody
– Also known as date-based releases - the date of release is fixed, but the features in that release are flexible
• For this to work, you must respect forward and backwards compatibility of API (service) interfaces
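Respecting forward and backward compatibility of a service interface can be illustrated with a “tolerant reader”: the consumer accepts both the older and the newer payload shape and ignores fields it does not know. This is a minimal sketch; the `parse_user` function and the field names `name`, `first_name` and `last_name` are hypothetical and do not come from any of the companies discussed.

```python
def parse_user(payload: dict) -> dict:
    """Read a user record from either an old or a new producer.

    Backward compatible: older producers send a single "name" field.
    Forward compatible: unknown extra fields are ignored, not rejected.
    """
    if "first_name" in payload or "last_name" in payload:
        first = payload.get("first_name", "")
        last = payload.get("last_name", "")
        name = f"{first} {last}".strip()
    else:
        name = payload.get("name", "")
    return {"id": payload["id"], "name": name}


# An old-style payload and a new-style payload with an extra, unknown field.
old_style = {"id": 1, "name": "Ada Lovelace"}
new_style = {"id": 2, "first_name": "Grace", "last_name": "Hopper", "team": "compilers"}

print(parse_user(old_style))
print(parse_user(new_style))
```

With readers written this way, producer and consumer can ship on their own schedules, which is what lets the release train leave without waiting for every team.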

Cool Kids and Automation
• In general, the Cool Kids automate as much as possible
– Etsy has invested a lot in automated unit / functional testing, dev tooling and monitoring, use of dashboards
– Netflix has a heavy degree of automation across the board
• Automate even the infrastructure, but keep it simple
– LinkedIn, Flickr and Netflix generally build up their infrastructure from just a single OS image
– From here, configure individual servers using automated scripts driven by tool of choice (e.g. IBM UrbanCode)
– Also commonly seen was use of “Phoenix” servers (vs. “Snowflakes”), which can be re-built at any time then “burned to the ground” if needed
• … but only automate what can be measured

Think you don’t need to keep an eye on automation?
http://windowsitpro.com/windows-7/aggressive-configmgr-based-windows-7-deployment-takes-down-emory-university
“During TechEd 2014, the Emory University IT department prepared and deployed
Windows 7 upgrades to the campuses computers. If you've worked with ConfigMgr
at all, you know that there are checks-and-balances that can be employed to ensure
that only specifically targeted systems will receive an OS upgrade. In Emory
University's case, the check-and-balance method failed and instead of delivering
the upgrade to applicable computers, delivered Windows 7 to ALL computers
including laptops, desktops, and even servers.
I'll stop for a second to let you take that in.
Yes, even servers.
By the time it was realized what exactly had happened, the Windows 7 sequence
had repartitioned, reformatted, and installed Windows 7. Emory IT powered off the
ConfigMgr server, hoping to stop the deployment before it was too late, but – it was
too late. Even the ConfigMgr server had been repartitioned and reformatted…”
– Windows IT Pro, May 19, 2014

Finally: Instrument and Measure
• LinkedIn: “Measurement is better than prediction”
• Provide a common framework to make it easy for developers to choose what to log simply by tagging or registering it
– “Push” from services works better than “pull” or polling
– In many cases, developers need do no more than push key/value pairs to a logging system
– LinkedIn collects 500K+ metrics per minute at an average of 400 metrics per service
• Instrument user behaviors to improve the user experience
– Esurance: “we mined the data to figure out what people were doing most often, make those tasks the most prominent and make them addressable in as few clicks as possible”
• Metrics dashboards also display deployment activity
– So if there’s a problem, you can easily tie the start time of the issue to the preceding pushes
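Two ideas from this slide, pushing key/value pairs from a service and tying an issue's start time to the preceding pushes, can be sketched together. This is an in-memory illustration under stated assumptions, not LinkedIn's tooling: the `MetricsSink` class, its method names and the sample metric names are all hypothetical.

```python
import time
from collections import defaultdict


class MetricsSink:
    """In-memory stand-in for a shared logging/metrics backend (hypothetical)."""

    def __init__(self):
        self.points = defaultdict(list)  # metric name -> [(timestamp, value)]
        self.deploys = []                # [(timestamp, description)]

    def push(self, service, metrics, ts=None):
        """Services push key/value pairs; nothing polls them."""
        ts = time.time() if ts is None else ts
        for key, value in metrics.items():
            self.points[f"{service}.{key}"].append((ts, value))

    def record_deploy(self, description, ts=None):
        """Record deployment activity alongside the metrics."""
        ts = time.time() if ts is None else ts
        self.deploys.append((ts, description))

    def deploys_before(self, issue_ts, window_seconds):
        """Deploys in the window leading up to an issue, most recent first."""
        hits = [
            (ts, description)
            for ts, description in self.deploys
            if issue_ts - window_seconds <= ts <= issue_ts
        ]
        return sorted(hits, reverse=True)


sink = MetricsSink()
sink.push("search", {"latency_ms": 120, "errors": 0}, ts=950)
sink.record_deploy("search: enable new ranking flag", ts=900)
print(sink.deploys_before(issue_ts=1000, window_seconds=600))
```

Keeping deploy events in the same store as the metrics is what makes the lookup a one-line query when a graph suddenly changes.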

Monitoring at LinkedIn
• LinkedIn developed and then open sourced tools for monitoring and graphing data being pushed to its logs…
inGraph, inFormed

So…what are the Cool Kids DevOps takeaways?
Culture
• Cultural change takes time – take reasonable steps
– Team-building, cross-training, improved communication
– Maybe include your Ops team in requirements / feature reviews and planning (e.g. via IBM RRC, RTC)
Organization
• Don’t turn your organization upside down
– Experiment on a few smaller, low-risk projects
– Maybe create DevOps "center of excellence"
– Tear down walls between teams
Practices
• “Continuous Integration” is a good starting point
– Push all builds to the last stage before release
– Eat your own dog food (get employees involved to test)
– Try progressive rollout or dark release of features

So…what are the Cool Kids DevOps takeaways?
Automation
• Start by automating a few areas that you can easily see and track the results from
– E.g. Test / build pipeline, possibly using UrbanCode Deploy
Measurement
• First, assess your current process and consider the changes you want to make – then consider how to measure them
– Instrument and measure anything you intend to automate
• But above all, be honest
– Assess your own DevOps maturity and aspirations – where are you and where do you want to be?

IBM can help: DevOps Adoption Framework delivers measurable outcomes
Enable lean adoption of DevOps capabilities
[Figure: DevOps Adoption Framework. Labels recoverable from the diagram:]
• Adoption Model: self-assessments, adoption paths, adoption services
• Solutions: practices, tooling, services
• Community: stories, enablement, feedback
• Captions: Where and How to Get Lean; Expertise and Technologies; Knowledge sharing
• Maturity labels: Process-based (process-heavy, manual, silo-ed); Product-based (agile, automated, collaborative); Optimizing (more predictable, more transparent, more continuous); Inefficient, Leaner, Leaner and Smarter
• DevOps lifecycle: Steer, Develop/Test, Deploy, Operate, with Continuous Feedback
• Capabilities: Continuous Business Planning, Collaborative Development, Continuous Testing, Continuous Release and Deployment, Continuous Monitoring, Continuous Customer Feedback & Optimization

Where to start: DevOps Adoption Roadmap
Assess desired outcome and supporting practices to drive strategy and rollout
Step 1: Business Goal Determination (What am I trying to achieve?)
• Think through business-level drivers for improvement
• Define measurable goals for your organizational investment
• Look across silos and include key Dev and Ops stakeholders
Step 2: Current Practice Assessment (Where am I currently?)
• What do you measure and currently achieve?
• What don’t you measure, but should in order to improve?
• What practices are difficult, incubating, well-scaled?
• How far do your team members agree with these findings?
Step 3: Objective & Prioritized Capabilities (What are my priorities?)
• Start with where you are today and where your improvement goals are
• Consider changes to People, Practices and Technology
• Prioritize change using goals, complexities and dependencies
Step 4: Strategy/Roadmap (What new practices should help me grow?)
• Understand your appetite for cross-functional change
• Target improvements with the biggest bang for the buck
• Roadmap and agree on an actionable plan
• Use measurable milestones that include early wins

Connect with me on Twitter at @BillHoltshouser or LinkedIn at
www.linkedin.com/pub/bill-holtshouser/4/815/66a/

Acknowledgements and Disclaimers
© Copyright IBM Corporation 2012. All rights reserved.
– U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
IBM, the IBM logo, ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United
States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information
was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is
available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all
countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are
provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice
to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is
provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of,
or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the
effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the
applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may
have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these
materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific
sales, revenue growth or other results.