This presentation is an overview of four key areas in managing cloud computing and virtual infrastructure, and of the open source tools available for each area.
Provisioning – Bringing server images online using build systems best suited for clouds and virtual machines, e.g. OpenNebula, Cobbler, OpenQRM
Configuration – Abstraction and application of configuration policies across virtual infrastructure, e.g. cfengine, Chef, Puppet
Command and Control – Scaling and management of administration activities across nodes and application tiers, e.g. ControlTier, Capistrano
Monitoring – Keeping situational awareness of dynamic environments has never been more challenging, e.g. Nagios, Hyperic HQ, Zenoss Core
Tools discussed will be chosen based on merit and their ability to interact with other tools to form a systems management toolchain.
Presented by:
Mark Hinkle, VP of Community at Zenoss
www.socializedsoftware.com
John M. Willis
VP of Services, Opscode
www.johnmwillis.com
Alex Honor
Project Leader, ControlTier
Founder DTO Solutions
www.controltier.org
http://www.oscon.com/oscon2010/public/schedule/detail/13949
7. DevOps Toolchain Project
Alex Honor
Project Leader, ControlTier Open Source Project
Co-Founder, DTO Solutions
9. DevOps Toolchain Project Goal
Share and discuss...
DevOps problems
Toolchains
Best practices and lessons learned
needs a name!
10. History of DevOps Toolchain Project
Noticed clients building ad-hoc toolchains: 2008 - 2009
Fully automated provisioning paper: Oct 2009
OpsCamp Austin: Jan 2010
O’Reilly Velocity Online: Mar 2010
Google group “devops-toolchain”: Mar 2010
OpsCamp SF: May 2010
O’Reilly Velocity Conference: June 2010
Paper cover shown on slide:
Web Ops 2.0: Achieving Fully Automated Provisioning
Version 1.1, August 15, 2009
Contributors:
Damon Edwards, DTO Solutions
Andrew Shafer, Reductive Labs
Anthony Shortland, DTO Solutions
Alex Honor, ControlTier Project
Lee Thompson, Former VP & Chief Technologist of E*TRADE Financial
Creative Commons Licensed (Attribution - Share Alike)
13. Who are the ‘devops-toolchain’ group members?
People:
System administrators
Application developers
Open source software tool developers
Software product managers
Generalists and process methodologists
Organizations:
E-Commerce
Search
Social media
Gaming
Industrial process
Financial
Commercial / OSS ISVs
http://groups.google.com/group/devops-toolchain
14. Lots of interesting contributions and discussions
Ernest Mueller’s Scott Mcarty’s
Vlad’s case study
http://groups.google.com/group/devops-toolchain
15. Sample of Discussion Topics
• Unix Like Tool Chains
• Open questions on unified pipe architecture
• Taxonomy (still TBD!)
• Distribution methods: package vs file, rsync/murder vs yum/rpm vs DFS
• Configuration management: RPMs vs puppet/cfengine/chef tool?
• Rollback methodologies for package and config mgt tools
• Sizing a DevOps team (what is a DevOps team?)
• Controlling and timing package release and config mgt tools
• Log management (aggregating, crunching, charting)
• Change detection
• Scripting language choices
http://groups.google.com/group/devops-toolchain
19. We’re inspired by other conceptual models
• Industrial Control Automation: Runbook Automation; Control; Eventing, Alarm Mgmt; Charting, History, SPC; Measurement Instrumentation; System
• Unix Tool Pipeline
• Brent Chapman’s Incident Command System
Background text on slide (page 2 of “Web Ops 2.0: Achieving Fully Automated Provisioning”):
... quality of the web operations that support these businesses has lagged behind. Outages are all too common. High variability and defect rates are bemoaned but have become an accepted reality. Key engineers spend all day (and sometimes all night) mired in deployment issues and bottlenecks. And topping it all off, the tooling that does exist is usually a set of custom one-offs that are brittle and expensive to maintain.
Today’s business of operating software over the Web as a revenue producing service is a dramatic departure from the days when software was primarily produced for delivery on physical mediums and IT Operations was considered a back-of-the-house support function. Shouldn't we be completely rethinking our tooling and operational capabilities to match these new innovations?
In short, we need to get out of Web Operations 1.0 (mired in legacy tools, outdated approaches, and low expectations) and into Web Operations 2.0, where tools and procedures are built from the ground up for highly efficient, reliable, and agile operations.
There are multiple factors that go into achieving excellence in Web Operations, but the linchpin that holds it all together is a fully automated provisioning system.
In this paper we will be:
1. Defining what we mean by "fully automated provisioning"
2. Explaining why virtualization and cloud computing efforts fail without fully automated provisioning capabilities
3. Proposing a reference open source tool chain for fully automated provisioning
4. Describing a live implementation where a leading online retailer is actively rolling out a fully automated provisioning system using all open source tools
23. Integrated vs “integrate-able” tools
Integrated: Turn key solution that provides end to end functionality for the problem domain
vs.
Integrate-able: Chosen set of complementary independent parts that can be joined to solve a problem domain
37. Legacy IT
Different perspective, lack of coordination
Cartoon originally copyrighted by the authors; G. Renee Guzlas, artist
38. Legacy Monitoring Perspective
Types of Monitoring:
• Availability Monitoring – Binary, Moment in Time
• Performance Monitoring – Two Dimensions, Time and State
• Change Management – Comparisons of states in Time
• Event Management – Normalizing Randomness
• Synthetic Transactions – Simulated Experiences
• Business Service Management (BSM) – $$$ Consequences of IT Performance
Data Collection:
• SNMP
• SSH
• WMI
• Syslog
• Proprietary Agents
39. The Myth of the Nines
Availability %          Downtime per Year   Downtime per Month   Downtime per Week
99.9% (three nines)     8.76 hours          43.2 minutes         10.1 minutes
99.95%                  4.38 hours          21.6 minutes         5.04 minutes
99.99% (four nines)     52.6 minutes        4.32 minutes         1.01 minutes
99.999% (five nines)    5.26 minutes        25.9 seconds         6.05 seconds
99.9999% (six nines)    31.5 seconds        2.59 seconds         0.605 seconds
• Average polling interval for monitoring? 5 minutes?
• Even superhuman operations people can’t be alerted and take action in under 5 minutes.
• One outage per year could drop service level to three nines.
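The downtime budgets in the table follow from simple arithmetic. The sketch below is a minimal illustration, assuming a 365-day year, a 30-day month, and a 7-day week (which is what the table's figures appear to use); the function names and the 300-second default polling interval are illustrative choices, not part of the original slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed period lengths in days; these reproduce the figures in the table.
PERIOD_DAYS = {"year": 365, "month": 30, "week": 7}


def allowed_downtime_seconds(availability_percent, period):
    """Return the downtime budget, in seconds, for one period."""
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * PERIOD_DAYS[period] * SECONDS_PER_DAY


def budget_covers_one_poll(availability_percent, period, polling_interval_seconds=300):
    """True if the downtime budget is at least one polling interval long."""
    return allowed_downtime_seconds(availability_percent, period) >= polling_interval_seconds


if __name__ == "__main__":
    for percent in (99.9, 99.95, 99.99, 99.999, 99.9999):
        budgets = {p: round(allowed_downtime_seconds(percent, p), 2) for p in PERIOD_DAYS}
        print(percent, budgets, budget_covers_one_poll(percent, "month"))
```

With a 5-minute polling interval, the monthly budget at five nines (about 26 seconds) is shorter than a single poll, so an outage could exhaust the budget before monitoring even notices it.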
40. Legacy Systems Management:
Fragmented Awareness
Global dashboard is a difficult mash-up of disparate systems or doesn’t exist. No communication, no automation.
[Diagram: separate product stacks for Provisioning, Configuration Management, and Performance & Availability Management, each with its own servers (analytics, process), database, and agent]
• Multiple data models across disciplines with no common object model
• Each management discipline has its own separate product (UI, process, database, and domain specific language)
• Multiple agents required for each discipline and platform
41. Systems Management DevOps Style:
Integrated Model, Interactive, Automated
[Diagram: two stacks, each Application / Op. System / Virtual Machine, on top of a shared Physical/Virtual/Cloud Infrastructure layer]
43. Example – Broadcast Company
Large premium television content provider serves a national cable network with content served from Linux servers.
• Servers are automatically built using configuration management software
• As servers are brought into service, configuration management inserts hosts into the CMDB used by the monitoring database
• One-way interaction between configuration management and monitoring system
• Reports are generated to determine which systems are compliant
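The one-way flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the CMDB is modeled as an in-memory dict, and `register_host`, `monitored_hosts`, and `compliance_report` are hypothetical names, not the API of any tool used by the broadcast company.

```python
def register_host(cmdb, hostname, role, compliant):
    """Configuration management side: write a host record into the CMDB."""
    cmdb[hostname] = {"role": role, "compliant": compliant}


def monitored_hosts(cmdb):
    """Monitoring side: read-only view of the hosts it should poll."""
    return sorted(cmdb)


def compliance_report(cmdb):
    """Group hosts by whether configuration management marked them compliant."""
    report = {"compliant": [], "non_compliant": []}
    for hostname in sorted(cmdb):
        bucket = "compliant" if cmdb[hostname]["compliant"] else "non_compliant"
        report[bucket].append(hostname)
    return report


if __name__ == "__main__":
    cmdb = {}
    register_host(cmdb, "stream02", "streaming", compliant=False)
    register_host(cmdb, "stream01", "streaming", compliant=True)
    print(monitored_hosts(cmdb))
    print(compliance_report(cmdb))
```

The interaction is one-way because only the configuration management side writes; monitoring and reporting never push changes back.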
45. Example - Geeknet
Hundreds of servers, serving web, databases, and other infrastructure for some of the world’s most highly trafficked websites – over 40 million visitors per month.
• Servers are automatically built using configuration management software
• Discovery tool finds infrastructure and populates a CMDB, then passes information to scripts that translate it into BIND configurations for DNS
• Monitoring tool adds hosts to polling tool to start monitoring servers for availability
• As infrastructure changes, systems are updated automatically
• Servers can be spun up and managed automatically in minutes, not hours, with little or no human interaction
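The "CMDB to BIND" step above could look roughly like the following. This is a minimal sketch, not Geeknet's actual script: the CMDB export format (a list of dicts with `name` and `ip` keys), the function name `render_a_records`, and the default TTL are all assumptions made for illustration. It emits plain A records in standard zone-file syntax.

```python
import ipaddress


def render_a_records(hosts, ttl=3600):
    """Render A records for a zone file from a hypothetical CMDB export.

    `hosts` is assumed to be a list of {"name": ..., "ip": ...} dicts.
    """
    lines = []
    for host in sorted(hosts, key=lambda h: h["name"]):
        # Validate the address; raises ValueError if it is not valid IPv4.
        address = ipaddress.IPv4Address(host["ip"])
        lines.append(f"{host['name']}\t{ttl}\tIN\tA\t{address}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    discovered = [
        {"name": "web02", "ip": "10.0.0.12"},
        {"name": "web01", "ip": "10.0.0.11"},
    ]
    print(render_a_records(discovered), end="")
```

Sorting the records keeps the generated file stable between runs, so a diff of the output shows only real infrastructure changes.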
67. Operations
• Say “Yes”.
• You never liked rack and stack that much anyway.
• You have never been more critical.
• Just get out of the way.
http://covers.oreilly.com/images/9780596007836/lrg.jpg
Lean into it appears courtesy of Cliff Moon, of Dynomite fame: http://twitter.com/moonpolysoft
78. Resources
A resource is a declarative description of the state you want a part of your node to be in.
• Is of a given type.
• Has a name.
• Has attributes.
• Takes actions to bring the resource to a declared state.
http://www.flickr.com/photos/xiaming/382205902/sizes/l/
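The four properties of a resource can be modeled in a short sketch. This is a conceptual illustration in Python, not Chef's actual Ruby API: the `Resource` class, its `converge` method, and the dict used to stand in for the node are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    type: str                                        # is of a given type
    name: str                                        # has a name
    attributes: dict = field(default_factory=dict)   # has attributes

    def converge(self, node_state):
        """Take action only if the node differs from the declared state.

        `node_state` is a dict standing in for the real node.
        Returns True if a change was made, False if already converged.
        """
        key = (self.type, self.name)
        if node_state.get(key) == self.attributes:
            return False
        node_state[key] = dict(self.attributes)
        return True


if __name__ == "__main__":
    node = {}
    nginx = Resource("package", "nginx", {"version": "0.7.65"})
    print(nginx.converge(node))  # first run changes the node
    print(nginx.converge(node))  # second run finds nothing to do
```

Because the resource describes a desired state rather than a sequence of commands, applying it repeatedly is safe: once the node matches the declaration, no further action is taken.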
81. Recipes
A recipe applies resources in the order they are specified.
• Can include other recipes.
• Is just Ruby.
http://www.flickr.com/photos/roadsidepictures/2478953342/sizes/o/
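The ordering and inclusion rules can be sketched as follows. Real Chef recipes are Ruby; this Python model is only a conceptual illustration, and the `Recipe` class with its `resource`, `include_recipe`, and `expand` methods is hypothetical. One simplification to note: this sketch expands an included recipe every time it is referenced.

```python
class Recipe:
    def __init__(self, name):
        self.name = name
        self.steps = []  # resources and included recipes, in declaration order

    def resource(self, type_, name, **attributes):
        """Declare a resource; it is recorded at the current position."""
        self.steps.append((type_, name, attributes))
        return self

    def include_recipe(self, other):
        """Include another recipe at the current position."""
        self.steps.append(other)
        return self

    def expand(self):
        """Flatten to the ordered list of resources that would be applied."""
        ordered = []
        for step in self.steps:
            if isinstance(step, Recipe):
                ordered.extend(step.expand())
            else:
                ordered.append(step)
        return ordered


if __name__ == "__main__":
    base = Recipe("base").resource("package", "ntp")
    web = (
        Recipe("web")
        .include_recipe(base)
        .resource("package", "nginx")
        .resource("service", "nginx", action="start")
    )
    for type_, name, attributes in web.expand():
        print(type_, name, attributes)
```

The package is declared before the service that depends on it, so in-order application is enough to get a working result without a separate dependency graph.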