OSCON 2010: Open Source Tool Chains for Cloud Computing

This presentation is an overview of four key areas in managing cloud computing and virtual infrastructure, and of the open source tools available in each area for managing these environments successfully.

Provisioning – Bringing server images online using build systems best suited for clouds and virtual machines (e.g. OpenNebula, Cobbler, OpenQRM)

Configuration – Abstraction and application of configuration policies across virtual infrastructure (e.g. cfengine, Chef, Puppet)

Command and Control – Scaling and management of administration activities across nodes and application tiers (e.g. ControlTier, Capistrano)

Monitoring – Maintaining situational awareness of dynamic environments has never been more challenging (e.g. Nagios, HypericHQ, Zenoss Core)

Tools discussed will be chosen based on merit and their ability to interact with other tools to form a systems management toolchain.

Presented by:
Mark Hinkle, VP of Community at Zenoss
www.socializedsoftware.com

John M. Willis
VP of Services, Opscode
www.johnmwillis.com

Alex Honor
Project Leader, ControlTier
Founder, DTO Solutions
www.controltier.org

http://www.oscon.com/oscon2010/public/schedule/detail/13949

Transcript

  • 1. Open Source Tool Chain for Cloud Computing Copyright © 2010 Opscode, Inc - All Rights Reserved 1
  • 2. John Willis, Opscode VP of Services (@botchagalupe) • Mark Hinkle, Zenoss VP of Community (@mrhinkle) • Alex Honor, DTO Founder (@alexhonor)
  • 3. What is Devops?
  • 4. Devops: • Culture • Automation • Measurement • Sharing (People, Process, Tools)
  • 5. Devops Sharing Automation
  • 6. Devops Sharing Automation • Tool Chain Project • Monitoring and Control • Provisioning, Configuration and Systems Integration
  • 7. DevOps Toolchain Project Alex Honor Project Leader, ControlTier Open Source Project Co-Founder, DTO Solutions
  • 8. DevOps Toolchain Project Goal: Share and discuss... • DevOps problems • Toolchains • Best practices and lessons learned
  • 9. DevOps Toolchain Project Goal: Share and discuss... • DevOps problems • Toolchains • Best practices and lessons learned (needs a name!)
  • 10. History of DevOps Toolchain Project • Noticed clients building ad-hoc toolchains: 2008 - 2009 • Fully automated provisioning paper, “Web Ops 2.0: Achieving Fully Automated Provisioning” (Version 1.1, August 15, 2009): Oct 2009. Contributors: Damon Edwards, DTO Solutions; Andrew Shafer, Reductive Labs; Anthony Shortland, DTO Solutions; Alex Honor, ControlTier Project; Lee Thompson, Former VP & Chief Technologist of E*TRADE Financial. Creative Commons Licensed (Attribution - Share Alike) • OpsCamp Austin: Jan 2010 • O’Reilly Velocity Online: Mar 2010 • Google group “devops-toolchain”: Mar 2010 • OpsCamp SF: May 2010 • O’Reilly Velocity Conference: June 2010
  • 11. Who are the ‘devops-toolchain’ group members? http://groups.google.com/group/devops-toolchain
  • 12. Who are the ‘devops-toolchain’ group members? People: System administrators Application developers Open source software tool developers Software product managers Generalists and process methodologists http://groups.google.com/group/devops-toolchain
  • 13. Who are the ‘devops-toolchain’ group members? People: System administrators, Application developers, Open source software tool developers, Software product managers, Generalists and process methodologists. Organizations: E-Commerce, Search, Social media, Gaming, Industrial process, Financial, Commercial / OSS ISVs. http://groups.google.com/group/devops-toolchain
  • 14. Lots of interesting contributions and discussions: Ernest Mueller’s, Scott Mcarty’s, and Vlad’s case studies http://groups.google.com/group/devops-toolchain
  • 15. Sample of Discussion Topics • Unix-like Tool Chains • Open questions on unified pipe architecture • Taxonomy (still TBD!) • Distribution methods: package vs. file, rsync/murder vs. yum/rpm vs. DFS • Configuration management: RPMs vs. a puppet/cfengine/chef tool? • Rollback methodologies for package and config mgt tools • Sizing a Devops team (what is a devops team?) • Controlling and timing package release and config mgt tools • Log management (aggregating, crunching, charting) • Change detection • Scripting language choices http://groups.google.com/group/devops-toolchain
  • 16. We’re inspired by other conceptual models
  • 17. We’re inspired by other conceptual models
    Excerpt from “Web Ops 2.0: Achieving Fully Automated Provisioning” (page 2): “...quality of the web operations that support these businesses has lagged behind. Outages are all too common. High variability and defect rates are bemoaned but have become an accepted reality. Key engineers spend all day (and sometimes all night) mired in deployment issues and bottlenecks. And topping it all off, what tooling does exist is usually custom one-offs that are brittle and expensive to maintain. Today’s business of operating software over the Web as a revenue producing service is a dramatic departure from the days when software was primarily produced for delivery on physical mediums and IT Operations was considered a back-of-the-house support function. Shouldn't we be completely rethinking our tooling and operational capabilities to match these new innovations? In short, we need to get out of Web Operations 1.0 -- mired in legacy tools, outdated approaches, and low expectations -- and into Web Operations 2.0 where tools and procedures are built from the ground up for highly efficient, reliable, and agile operations. There are multiple factors that go into achieving excellence in Web Operations, but the linchpin that holds it all together is a fully automated provisioning system. In this paper we will be: 1. Defining what we mean by "fully automated provisioning" 2. Explaining why virtualization and cloud computing efforts fail without fully automated provisioning capabilities 3. Proposing a reference open source tool chain for fully automated provisioning 4. Describing a live implementation where a leading online retailer is actively rolling out a fully automated provisioning system using all open source tools”
    Pull quote: “Today’s business of operating software over the Web as a revenue producing service is a dramatic departure from the days when software was primarily produced for delivery on physical mediums...”
    Industrial Control Automation model: Runbook Automation; Control; Eventing, Alarm Mgmt; Charting, History, SPC; Measurement; Instrumentation System
  • 18. We’re inspired by other conceptual models: the Web Ops 2.0 paper, Industrial Control Automation, and the Unix Tool Pipeline
  • 19. We’re inspired by other conceptual models: the Web Ops 2.0 paper, Industrial Control Automation, the Unix Tool Pipeline, and Brent Chapman’s Incident Command System
  • 20. Integrated vs “integrate-able” tools
  • 21. Integrated vs “integrate-able” tools. Integrated: turnkey solution that provides end-to-end functionality for the problem domain
  • 22. Integrated vs “integrate-able” tools. Integrated: turnkey solution that provides end-to-end functionality for the problem domain vs.
  • 23. Integrated vs “integrate-able” tools. Integrated: turnkey solution that provides end-to-end functionality for the problem domain vs. Integrate-able: chosen set of complementary independent parts that can be joined to solve a problem domain
  • 24. Commercial vs. Open Source Tools
  • 25. Commercial vs. Open Source Tools Commercial (Integrated)
  • 26. Commercial vs. Open Source Tools: Commercial (Integrated) vs.
  • 27. Commercial vs. Open Source Tools: Commercial (Integrated) vs. Open Source (Integrate-able)
  • 28. Example: “Programmable Infrastructure” http://www.webadminblog.com/
  • 29. Example: Game operator’s framework. Monitoring: nagios, jcollectd, rrd, jmx; Deployment: ControlTier, Liquibase; Build: Hudson, SVN, Maven
  • 30. Example: KaChing’s continuous deployment. Monitoring: nagios, jcollectd, rrd, jmx; Deployment: Custom app, rpm/yum; Build: Hudson, SVN, ant http://eng.kaching.com/
  • 31. Generalized architecture [Diagram: functional areas Control, Provisioning, Release, Model, and Monitoring; components shown include orchestration, dispatcher, scheduler, workflows, deploy, config management, OS boot/install, CI server, build, SCM, code, issue tracker, artifact repository, resources, configuration, asset inventory, identity, topology, host naming, CMDB, events, trending, reporting, and sources]
  • 32. Release management toolchain
    Repository: yum/yast, artifactory, archiva, DFS/openEFS, HTTP
    Artifact: eggs, tgz/zip, gems, rpm/deb/pkg/msi, perlmod, jar/war/ear
    Build: make, sbt, maven, rake, ant/ivy, phing
    SCM: svn, cvs, git, hg
    Tracker: bugzilla, mantis, trac
  • 33. Monitoring and control toolchain
    Runbook Automation: ControlTier, Jobscheduler, OpenScheduler
    Op Console, Control; Alarm Mgmt: nagios, zenoss, opennms
    Charting, History, SPC: rrdtool, cacti
    Measurement; Instrumentation
  • 34. Provisioning toolchain (by provisioning activity)
    Command and Control (Application Service Orchestration): Capistrano, ControlTier, Fabric, Func, mCollective
    System Configuration: Bcfg2, cfengine, Chef, Puppet, Smart Frog
    Bootstrapping, Cloud (Cloud or VM Image Launch): Xen, lxc, openVZ, Eucalyptus, KVM
    Bootstrapping, OS Install: Kickstart, Jumpstart, Cobbler, OpenQRM, xCAT
  • 35. Join us! • Define missing tool chains • Fix tool lists • Keep working on taxonomy • Document and share experience about your tool chain http://groups.google.com/group/devops-toolchain
  • 36. Monitoring and Control Mark Hinkle VP of Community, Zenoss Inc.
  • 37. Legacy IT Different perspective, lack of coordination Cartoon originally copyrighted by the authors; G. Renee Guzlas, artist
  • 38. Legacy Monitoring Perspective
    Types of Monitoring:
    • Availability Monitoring – Binary, Moment in Time
    • Performance Monitoring – Two Dimensions, Time and State
    • Change Management – Comparisons of States in Time
    • Event Management – Normalizing Randomness
    • Synthetic Transactions – Simulated Experiences
    • Business Service Management (BSM) – $$$ Consequences of IT Performance
    Data Collection: • SNMP • SSH • WMI • Syslog • Proprietary Agents
  • 39. The Myth of the Nines
    Availability | Downtime per year | Downtime per month | Downtime per week
    99.9% (three nines) | 8.76 hours | 43.2 minutes | 10.1 minutes
    99.95% | 4.38 hours | 21.6 minutes | 5.04 minutes
    99.99% (four nines) | 52.6 minutes | 4.32 minutes | 1.01 minutes
    99.999% (five nines) | 5.26 minutes | 25.9 seconds | 6.05 seconds
    99.9999% (six nines) | 31.5 seconds | 2.59 seconds | 0.605 seconds
    • Average polling interval for monitoring? 5 minutes?
    • Even superhuman operations people can’t be alerted and take action in under 5 minutes.
    • One outage per year could drop service level to three nines.
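The downtime figures on slide 39 follow from simple arithmetic: allowed downtime is the period length multiplied by (1 - availability). The Ruby sketch below recomputes them, assuming a 365-day year, a 30-day month, and a 7-day week (the calendar the slide's figures imply); the method names are illustrative.

```ruby
# Seconds in each period, assuming a 365-day year, 30-day month, 7-day week.
PERIOD_SECONDS = {
  year:  365 * 24 * 3600,
  month: 30 * 24 * 3600,
  week:  7 * 24 * 3600
}.freeze

# Permitted downtime in seconds for an availability percentage (e.g. 99.9).
def allowed_downtime(availability_pct, period)
  PERIOD_SECONDS.fetch(period) * (100.0 - availability_pct) / 100.0
end

# Renders a duration in hours, minutes, or seconds depending on its size.
def human(seconds)
  if seconds >= 3600
    format("%.2f hours", seconds / 3600.0)
  elsif seconds >= 60
    format("%.2f minutes", seconds / 60.0)
  else
    format("%.2f seconds", seconds)
  end
end

[99.9, 99.95, 99.99, 99.999, 99.9999].each do |pct|
  cells = PERIOD_SECONDS.keys.map { |period| human(allowed_downtime(pct, period)) }
  puts "#{pct}%: #{cells.join(' | ')}"
end
```

With the 5-minute polling interval the slide mentions, detection delay alone can consume about 95% of the yearly five-nines budget (300 seconds out of roughly 315), before anyone has responded.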
  • 40. Legacy Systems Management: Fragmented Awareness
    Global dashboard is a difficult mash-up of disparate systems, or doesn’t exist. No communication, no automation.
    [Diagram: separate Provisioning, Configuration Management, and Performance & Availability Management stacks, each with its own analytics server, process server, database, and agent]
    • Each management discipline has its own separate product (UI, process, database, and domain-specific language) and platform
    • Multiple data models across disciplines with no common object model
    • Multiple agents required for each discipline
  • 41. Systems Management DevOps Style: Integrated Model, Interactive, Automated [Diagram: applications on operating systems on virtual machines, running on physical/virtual/cloud infrastructure]
  • 43. Example – Broadcast Company: A large premium television content provider serves a national cable network with content from Linux servers. • Servers are automatically built using configuration management software • As servers are brought into service, configuration management inserts the hosts into the CMDB used by the monitoring system • One-way interaction between configuration management and the monitoring system • Reports are generated to determine which systems are compliant
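The broadcast company's flow, in which configuration management writes hosts into the CMDB, other tools only read from it, and reports check compliance, can be sketched as follows. This is a minimal sketch under assumptions: the CMDB is modeled as an in-memory hash rather than any particular product, and the host names, fact keys, and policy values are hypothetical.

```ruby
# Hypothetical CMDB contents, written only by configuration management:
# host name => facts reported when the host was brought into service.
CMDB = {
  "tv-web01" => { "ntp" => "installed", "sshd_root_login" => "no" },
  "tv-web02" => { "ntp" => "missing",   "sshd_root_login" => "no" }
}.freeze

# Hypothetical policy every host is expected to meet.
POLICY = { "ntp" => "installed", "sshd_root_login" => "no" }.freeze

# Read-only consumer: returns host => :compliant, or the policy keys that drifted.
def compliance_report(cmdb, policy)
  cmdb.map do |host, facts|
    drift = policy.reject { |key, wanted| facts[key] == wanted }.keys
    [host, drift.empty? ? :compliant : drift]
  end.to_h
end

p compliance_report(CMDB, POLICY)
```

Keeping the report a pure reader of the CMDB mirrors the one-way interaction on the slide: nothing downstream writes back into configuration management.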
  • 44. Example - Gameday
  • 45. Example - Geeknet: Hundreds of servers serve web, database, and other infrastructure for some of the world’s most highly trafficked websites – over 40 million visitors per month. • Servers are automatically built using configuration management software • A discovery tool finds infrastructure and populates a CMDB, then passes the information to scripts that translate it into BIND configurations for DNS • The monitoring tool adds hosts to its polling list to start monitoring servers for availability • As infrastructure changes, systems are updated automatically • Servers can be spun up and managed automatically in minutes, not hours, with little or no human interaction
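The Geeknet step of turning CMDB data into BIND configuration might look roughly like this. It is a sketch under assumptions: the CMDB export is modeled as a hash of host name to IP address, the host names, addresses, and TTL are hypothetical, and managing the zone's SOA serial, which a real script would also need, is left out.

```ruby
# Hypothetical CMDB export: host name => IP address.
CMDB_HOSTS = {
  "web01" => "10.0.1.11",
  "web02" => "10.0.1.12",
  "db01"  => "10.0.2.21"
}.freeze

# Renders BIND zone-file A records (owner, TTL, class, type, address).
# Sorting by host name keeps the output stable between runs, so the file
# only changes when the CMDB does.
def bind_a_records(hosts, ttl: 300)
  hosts.sort.map { |name, ip| "#{name}\t#{ttl}\tIN\tA\t#{ip}" }.join("\n") + "\n"
end

puts bind_a_records(CMDB_HOSTS)
```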
  • 46. Unlegacy Future: Devops Development Operations
  • 47. Provisioning, Configuration Management and Systems Integration John Willis VP of Services - Opscode, Inc.
  • 48. What Does Configuration Management In The Cloud Mean?
  • 49. Did They Lie?
  • 50. Did They Lie? I did not have “operational” relations with that provider
  • 51. Caveat Emptor • Provisioning • Configuration Management • Systems Integration
  • 52. Provisioning Nodes: opslb01, opsws01, opsws02, opsdm01, opsds01, opsds02
  • 53. Configuration Management Roles: loadbalancer, webserver, dbmaster, dbslave
  • 54. Systems Integration [Diagram: a Load Balancer (recipe: haproxy) in front of two Web Servers (recipe: apache2), backed by a DB Master and two DB Slaves (recipe: mysql), each with its own Disk]
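The nodes, roles, and recipes on slides 52 through 54 compose as two lookups: node to role, then role to recipes. In the plain-Ruby sketch below (not Chef's API), the node-to-role mapping is an assumption inferred from the host naming convention (lb, ws, dm, ds), and the role-to-recipe table follows the systems integration slide.

```ruby
# Assumed node => role mapping, inferred from the slide's host names.
NODE_ROLES = {
  "opslb01" => "loadbalancer",
  "opsws01" => "webserver",
  "opsws02" => "webserver",
  "opsdm01" => "dbmaster",
  "opsds01" => "dbslave",
  "opsds02" => "dbslave"
}.freeze

# Role => recipes, following the systems integration slide.
ROLE_RECIPES = {
  "loadbalancer" => ["haproxy"],
  "webserver"    => ["apache2"],
  "dbmaster"     => ["mysql"],
  "dbslave"      => ["mysql"]
}.freeze

# Expands each node into the recipes it should converge.
def recipes_by_node(node_roles, role_recipes)
  node_roles.map { |node, role| [node, role_recipes.fetch(role)] }.to_h
end

# Lists the nodes holding a role, e.g. the web servers a load balancer fronts.
def nodes_with_role(node_roles, role)
  node_roles.select { |_node, r| r == role }.keys
end

p recipes_by_node(NODE_ROLES, ROLE_RECIPES)
p nodes_with_role(NODE_ROLES, "webserver")
```

The inverse lookup is what makes systems integration possible: the load balancer's configuration depends on which nodes currently hold the webserver role.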
  • 55. What Do Developers Want?
  • 56. For Developers...
  • 57. For Developers... • Self Service Operations
  • 58. For Developers... • Self Service Operations • The infrastructure is the application (and vice versa)
  • 59. For Developers... • Self Service Operations • The infrastructure is the application (and vice versa) • Minimize Bottlenecks
  • 60. For Developers... • Self Service Operations • The infrastructure is the application (and vice versa) • Minimize Bottlenecks • The “Right” Tools
  • 61. What Does Operations Want?
  • 63. Operations http://covers.oreilly.com/images/9780596007836/lrg.jpg Lean into it appears courtesy of Cliff Moon, of Dynomite fame: http://twitter.com/moonpolysoft
  • 64. Operations • Say “Yes”.
  • 65. Operations • Say “Yes”. • You never liked rack and stack that much anyway.
  • 66. Operations • Say “Yes”. • You never liked rack and stack that much anyway. • You have never been more critical.
  • 67. Operations • Say “Yes”. • You never liked rack and stack that much anyway. • You have never been more critical. • Just get out of the way.
  • 68. Industry Shifts: “Be bold, and mighty forces will come to your aid.” (Basil King)
  • 69. Infrastructure as Code (development team focus): IDE/Workbench, Agile methodology, Source Control
  • 70. Devops and Automation: Agile Operations, Operations as Code, Configuration Management, Infrastructure as Code, Agile Infrastructure
  • 71. Operations as Code
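  • 72. Role Based Services

```ruby
# Chef role definition (Ruby DSL) for web servers
name "webserver"
description "Systems that serve HTTP traffic"
# Applied in order: the base role, then Apache, then Apache's mod_ssl
run_list(
  "role[base]",
  "recipe[apache2]",
  "recipe[apache2::mod_ssl]"
)
# Default attribute values set by this role
default_attributes(
  "apache" => { "listen_ports" => [ "80", "443" ] }
)
# Override attributes take precedence over default attributes
override_attributes(
  "apache" => { "max_children" => "50" }
)
```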
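The role's two attribute blocks imply a precedence rule: where both set the same key, the override value wins, and nested hashes are merged rather than replaced wholesale. The sketch below models only that rule in plain Ruby; it is not Chef's merge code, the default `max_children` of "150" stands in for a hypothetical cookbook default, and real Chef has additional precedence levels beyond these two.

```ruby
# Recursively merges +overlay+ into +base+: nested hashes are merged key by
# key, and any other value from the overlay replaces the base value.
def deep_merge(base, overlay)
  base.merge(overlay) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Role defaults plus a hypothetical cookbook default for max_children.
DEFAULTS = {
  "apache" => { "listen_ports" => ["80", "443"], "max_children" => "150" }
}.freeze

# The role's override attributes.
OVERRIDES = { "apache" => { "max_children" => "50" } }.freeze

p deep_merge(DEFAULTS, OVERRIDES)
```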
  • 73. Resources: A resource is a declarative description of the state you want part of your node to be in. http://www.flickr.com/photos/xiaming/382205902/sizes/l/
  • 75. Resources: A resource is a declarative description of the state you want part of your node to be in. • Is of a given type.
  • 76. Resources: A resource is a declarative description of the state you want part of your node to be in. • Is of a given type. • Has a name.
  • 77. Resources: A resource is a declarative description of the state you want part of your node to be in. • Is of a given type. • Has a name. • Has attributes.
  • 78. Resources: A resource is a declarative description of the state you want part of your node to be in. • Is of a given type. • Has a name. • Has attributes. • Takes actions to bring the resource to a declared state.
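The four properties of a resource listed on these slides can be illustrated with a minimal, hypothetical file resource in plain Ruby. It is not Chef's implementation: the `FileResource` class and its `create` action are assumptions made for illustration. The point is that the action compares the current state with the declared state and only acts on a difference, so repeated runs are safe.

```ruby
require "tmpdir"

# Hypothetical file resource: a type, a name, attributes, and an action
# that converges the node toward the declared state.
class FileResource
  attr_reader :name, :content

  def initialize(name, content:)
    @name = name        # the resource name doubles as the file path
    @content = content  # desired file content (an attribute)
  end

  def type
    :file
  end

  # Action :create. Writes only when the current state differs from the
  # declared state; returns true if anything changed.
  def create
    return false if File.exist?(name) && File.read(name) == content

    File.write(name, content)
    true
  end
end

Dir.mktmpdir do |dir|
  motd = FileResource.new(File.join(dir, "motd"), content: "Managed by config management\n")
  puts motd.create  # first run writes the file
  puts motd.create  # second run finds it already in the declared state
end
```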
  • 79. Recipes: A recipe applies resources in the order they are specified. http://www.flickr.com/photos/roadsidepictures/2478953342/sizes/o/
  • 80. Recipes: A recipe applies resources in the order they are specified. • Can include other recipes.
  • 81. Recipes: A recipe applies resources in the order they are specified. • Can include other recipes. • Is just Ruby.
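Because a recipe is just Ruby, ordered application and recipe inclusion can be sketched with a tiny DSL evaluated via `instance_eval`. Everything here is hypothetical and only mirrors the behavior described on the slides: the `RecipeRunner` class, the `package` and `service` methods (which merely record resources), and the `ntp` package in the base recipe.

```ruby
# Hypothetical recipe runner: records resources in declaration order and
# splices included recipes in where include_recipe is called.
class RecipeRunner
  attr_reader :resources

  def initialize(recipes)
    @recipes = recipes  # recipe name => block of Ruby code
    @resources = []     # resources in the order they would be applied
    @included = {}      # recipes already evaluated, so each runs only once
  end

  # Evaluates the named recipe's block with this runner as self.
  def include_recipe(name)
    return if @included[name]

    @included[name] = true
    instance_eval(&@recipes.fetch(name))
  end

  # Resource methods only record what would be managed.
  def package(name)
    @resources << "package[#{name}]"
  end

  def service(name)
    @resources << "service[#{name}]"
  end
end

RECIPES = {
  "base" => proc { package "ntp" },
  "apache2" => proc {
    include_recipe "base"
    package "apache2"
    service "apache2"
  }
}.freeze

runner = RecipeRunner.new(RECIPES)
runner.include_recipe("apache2")
p runner.resources
```

Including `base` inside `apache2` places its resources first, which is how a run list like the webserver role's composes shared and specific configuration.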
  • 82. Load Balancer Example
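As an illustration of the integration a load balancer recipe has to perform in this setup, the sketch below renders an haproxy backend section from a list of web server nodes. In a Chef recipe such a list would typically come from a node search such as `search(:node, "role:webserver")`; here the nodes, IP addresses, port, and backend name are hypothetical and hard-coded.

```ruby
# Hypothetical web server node: name and private IP address.
WebNode = Struct.new(:name, :ip)

WEB_NODES = [
  WebNode.new("opsws02", "10.0.1.12"),
  WebNode.new("opsws01", "10.0.1.11")
].freeze

# Renders an haproxy backend section with one health-checked server line
# per web node, sorted by name so the output is stable between runs.
def haproxy_backend(nodes, port: 80)
  lines = ["backend webservers", "    balance roundrobin"]
  nodes.sort_by(&:name).each do |node|
    lines << "    server #{node.name} #{node.ip}:#{port} check"
  end
  lines.join("\n") + "\n"
end

puts haproxy_backend(WEB_NODES)
```

Regenerating this section whenever the set of webserver nodes changes is what lets new web servers join the pool without manual load balancer edits.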
  • 84. Systems Integration [Diagram: a Load Balancer in front of two Web Servers, backed by a DB Master and two DB Slaves, each with its own Disk]
  • 85. Tale of Two Startups [Charts comparing “Traditional” Operations with Operations - The “Secret Sauce”: hours per week (Hardware, OS Install, Config, Upkeep) and number of servers (New, Existing) over weeks 1 to 12] (http://radar.oreilly.com/archives/2007/10/operations-advantage.html)
  • 86. Tale of Two Startups [Same charts, annotated: “This is the secret of Cloud Computing. Every other virtue stems from here.”] (http://radar.oreilly.com/archives/2007/10/operations-advantage.html)