Puppet Deployment at OnApp


        Wai Keen Woon
        CTO, CDN Division
        waikeen.woon@onapp.com
WARNING	




<ObligatoryPlug>
About OnApp
       A leading provider of software for hosts



       The leading cloud management software for hosts
       The instant global CDN for hosts

                   OnApp launched July 1st 2010
                     Deep industry knowledge
                          Backed by LDC
                 100+ employees in US, EU, APAC
Vital Statistics


                   1 in 3
                   public clouds


                 800+
              cloud deployments


                   300+
                   global clients
Customer Stories
Instant CDN that gives you…



       75+ PoPs
       low cost, high margin
       get paid for idle capacity
OK.
       	




</ObligatoryPlug>
Systems Overview

l    Core & Development
      l    ~20 physical servers
      l    ~200 VMs
      l    Homogeneous environment – 64-bit Debian everywhere
      l    Mainly use OpenVZ and KVM for virtualization
l    CDN Delivery Edge Servers
      l    100+ servers in 60+ cities
      l    Running on the OnApp platform – either Xen or KVM
l    Puppet integral to our setup – since day 1
Why Puppet?

l    More reliable configuration of servers. Less need to
      “run ssh in a for loop” and miss out something.
l    Self-documenting – our manifests are almost able to
      bootstrap an empty server.
      l    Our manifests can't bootstrap an empty environment yet.
      l    Limitation – manifests describe what/where/how something
            is setup, but doesn't describe *why*.
l    Nice syntax – easy on the eyes. Comprehensive builtin
      resource types. Able to fallback to dumb ways of doing
      things if required (use file, exec et al).
Core Infra Environments

l    Systems manifest describes everything.
l    Three environments:




                         β
What Would OnApp Set Up...

l    Essential utilities (tcpdump, less, vim, etc).
l    Users & their SSH keys, sudoers.
      l    Developers' shell => /bin/false if production
l    Base firewall rules.
l    Nagios agent.
l    Set uniform locality settings: UTC timezone,
      en_US.UTF-8 locale.
l    SMTP that smarthosts to our central relay.
l    Syslogd for remote logs to central logging server.
l    Finally, the services.
Core Infra Manifest Excerpt
      # Env config definitions (blue on the slide)
      $portal_domain  = "portal.alpha.onappcdn.com"
      $portal_db_host = "portal.alpha.onappcdn.com"
      $portal_db_user = "aflexi_webportal"

      $auth_nameservers = { "ns1" => "175.143.72.214",
                            "ns2" => "175.143.72.214",
                            "ns3" => "175.143.72.214",
                            "ns4" => "175.143.72.214",
                          }

      $monitoring_host_server =
        [ "monitoring.alpha.onappcdn.com",
          "dns.alpha.onappcdn.com" ]

      # Node definitions (red on the slide)
      node "monitoring.alpha.onappcdn.com" {
        include base
        include s_db_monitoring
        include s_monitoring_server
        include collectd::rrdcached
        include s_munin
        include s_monitoring_alerts
        include s_monitoring_graph
      }

      # Class definitions (green on the slide)
      class collectd::rrdcached {
        package { "rrdcached":
          ensure => latest,
        }
        service { "rrdcached":
          ensure => running,
        }
      }
Package Repo Integration

l    Jenkins builds debs of our code and stores it into an apt
      repository for the environment it is built for.
l    Puppet keeps packages up-to-date (ensure => latest)
      and restarts services on package upgrades.
      Puppet-agent[25431]:
      (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed
      successfully

      puppet-agent[25431]:
      (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/
      ensure)
      ensure changed '7065.20120530.113915-1' to '7066.20120604.090916-1'

      puppet-agent[25431]:
      (/Stage[main]/S_mq/Service[worker-rabbitmq])
      Triggered 'refresh' from 1 events

      puppet-agent[25431]: Finished catalog run in 16.08 seconds
Nagios Integration

l    Plugs into nagios – uses “exported resources”
Nagios Integration

Server manifest
 *exports the service that is checked

      @@nagios_service { "check_load_$fqdn":
        check_command       => "check_nrpe_1arg!check_load",
        use                 => "generic-service",
        host_name           => $fqdn,
        service_description => "check_load",
        tag                 => $domain,
      }

Nagios service manifest
 *collects the resources to check

      Nagios_service <<| tag == "onappcdn.cm" |>> {
        target  => "/etc/n3/conf.d/services.cfg",
        require => Package["nagios3"],
        notify  => Exec["reload-nagios"],
      }
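
The export/collect pairing above can be pictured with a toy model in Python. This is only a sketch of the idea, not how puppet implements exported resources: the `StoredConfigs` class, the `export_check_load` helper and the `web1.example.org` host are all made up for illustration.

```python
from collections import namedtuple

# A resource as puppet sees it: a type, a title, parameters and tags.
Resource = namedtuple("Resource", ["type", "title", "params", "tags"])


class StoredConfigs:
    """Hypothetical stand-in for the puppetmaster's storedconfigs database."""

    def __init__(self):
        self._exported = []

    def export(self, resource):
        # Similar in spirit to declaring @@nagios_service { ... } on a server.
        self._exported.append(resource)

    def collect(self, rtype, tag):
        # Similar in spirit to Nagios_service <<| tag == "..." |>>.
        return [r for r in self._exported
                if r.type == rtype and tag in r.tags]


def export_check_load(store, fqdn, domain):
    """Export a load check for one server, tagged with its domain."""
    store.export(Resource(
        type="nagios_service",
        title="check_load_" + fqdn,
        params={
            "check_command": "check_nrpe_1arg!check_load",
            "use": "generic-service",
            "host_name": fqdn,
            "service_description": "check_load",
        },
        tags=frozenset([domain]),
    ))


store = StoredConfigs()
export_check_load(store, "hrm.onappcdn.com", "onappcdn.com")
export_check_load(store, "web1.example.org", "example.org")  # made-up host

# The nagios server only collects resources carrying the tag it asks for.
collected = store.collect("nagios_service", "onappcdn.com")
titles = [r.title for r in collected]
print(titles)
```

The point of the model: each server only describes its own check, and the nagios server picks up whatever has been exported under the tag it filters on.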
Nagios Integration

l    What's logged on the nagios server when puppet runs?
      puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_host[hrm.onappcdn.com]/ensure) created

      puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_service[check_load_hrm.onappcdn.com]/ensure) created

      nagios3: Nagios 3.2.1 starting... (PID=5601)

      puppet-agent[15293]: (/Stage[main]/Nagios::Base/Exec[reload-nagios]) Triggered 'refresh' from 8 events
Monitoring Puppet Itself

l    Lots of tools/dashboards out there to achieve this.
l    For us: “grep -i err */syslog”. Dumb, but works until we
      need to Really Address it.
l    Common issues:
      l  Puppet gets “stuck”. And only one puppet instance

          can run at any one time.
      l  Manifest errors – syntax, merge issues.


      l  Badly-written manifests (vague dependencies,

          conditions/commands not robust enough).
      l  An important dependent resource failing (e.g. apt-get

          install fails due to dpkg-configure error).
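
A slightly more selective version of the grep could look like the Python sketch below. It is a minimal illustration under assumptions: it keeps only lines that carry the `puppet-agent[pid]:` prefix seen in the logs on these slides, the function names are made up, and the "Error" line in the sample is invented rather than real puppet output.

```python
import glob
import re

# Prefix as it appears in the syslog excerpts on the slides.
PUPPET_LINE = re.compile(r"puppet-agent\[\d+\]:")
# Same loose match as "grep -i err".
ERR = re.compile(r"err", re.IGNORECASE)


def find_puppet_errors(lines):
    """Return the puppet-agent lines that contain "err" in any letter case."""
    return [line.rstrip("\n") for line in lines
            if PUPPET_LINE.search(line) and ERR.search(line)]


def scan_logs(pattern="*/syslog"):
    """Yield (path, line) for each matching line in the files under pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            for line in find_puppet_errors(handle):
                yield path, line


sample = [
    "puppet-agent[25431]: Finished catalog run in 16.08 seconds",
    # Invented line, for illustration only:
    "puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]) Error: hypothetical failure",
    "nagios3: Nagios 3.2.1 starting... (PID=5601)",
]
errors = find_puppet_errors(sample)
print(errors)
```

On a central log server, `scan_logs()` would be run from the directory holding one subdirectory per host, the same place the grep is run from.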
File/Dir Organization

l    We use git to revision control our                                   l    Common branch
                                                                                  Manifests/
      puppet manifests.                                                                alpha.pp
                                                                                       beta.pp
l    Style we adopted mainly comes                                               Modules/
                                                                                       Base/
      from Hunter Haugen*                                                              Users/
l    A branch for each environment,                                       l    Alpha env branch
                                                                                   Modules/
      plus a “common” branch.                                                           Python/
                                                                                   Services/
l    Each branch checked out as a                                                      Nameserver/
      separate directory in /etc/puppet/                                   l    Beta env branch
      environments/$env                                                            Modules/
                                                                                        Python/
l    And puppetmaster's includedir                                                Services/
                                                                                        Nameserver/
      configured to that directory.
       * - http://hunnur.com/blog/2010/10/dynamic-git-branch-puppet-environments/
File/Dir Organization

l    Common goes into its own branch – for convenience;
      less merging needed for manifests that we are Really
      Sure won't differ between environments.
l    System manifest into common/manifests/$env.pp
      l    Initially tried putting manifest into alpha/beta/omega
            branches as site.pp – merge hell.
l    Introduced extra variable - $effective_env
      l    Abstracts the puppet environment name, from the
            environment that the manifest runs in.
File/Dir Organization

l    Hotfixes branch off omega and merged to alpha/beta/
      omega.
l    Development branches off alpha
      l    This branch can be trialed as a separate environment (use
            --environment to specify custom env on puppet client).
      l    Merge to alpha → beta → omega.
      l    Or merge as feature branch to any other environment.
l    “git diff branchA branchB” - differences are shown
      clearly between environments.
Edge Servers

l    Our edge servers are hosted on OnApp cloud (only).
l    When creating an edge server, the cloud control panel
      l    Instantiates a VM from a lightly-customized Debian image.
      l    Configures the package repositories.
      l    Issues a puppet run to set up.
l    Advantage of setting it up through puppet instead of a
      “gold image” - our system can be installed on bare
      metal if needed, can be reproducibly installed on
      $future_debian_release
Edge Servers – External Node Classifier

l    No text manifest – all code, using “external node
      classifier”.
l    Assign variables and classes specific to the edge
      server through node classifier. E.g. its password, the
      services it runs.
l    In python,

          import yaml  # PyYAML

          output = {}
          output["classes"] = ["class1", "class2"]
          output["parameters"] = {"param1": "value1"}
          print(yaml.dump(output))
Edge Servers – External Node Classifier

l    This YAML-encoded structure...
      $ puppet-nodeclassifier 85206671.onappcdn.com

      classes: [base, nginx ]
      parameters: { edge_secret_key: 86zFsrM7Ma, monitoring_domain:
      monitoring.alpha.onappcdn.com }


l    … is equivalent to this textual manifest:
      node "85206671.onappcdn.com" {
        $edge_secret_key = "86zFsrM7Ma"
        $monitoring_domain = "monitoring.alpha.onappcdn.com"
        include base
        include nginx
      }
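
Putting the two slides together, a classifier script might be shaped like the sketch below. It is an illustration, not our actual classifier: the in-memory `INVENTORY` and the `classify`/`main` names are hypothetical, and a real classifier would look the node up in the control panel's data. To stay within the standard library it prints JSON; JSON is a subset of YAML 1.2, so it should be accepted, but check that against your puppet version or use `yaml.dump` as on the slide.

```python
import json
import sys

# Hypothetical inventory; the values are the ones shown on the slide.
INVENTORY = {
    "85206671.onappcdn.com": {
        "classes": ["base", "nginx"],
        "parameters": {
            "edge_secret_key": "86zFsrM7Ma",
            "monitoring_domain": "monitoring.alpha.onappcdn.com",
        },
    },
}


def classify(certname, inventory=INVENTORY):
    """Return the classes/parameters dict for a node, or None if unknown."""
    return inventory.get(certname)


def main(argv):
    """Print the classification for argv[1]; return a process exit code."""
    if len(argv) != 2:
        sys.stderr.write("usage: %s <certname>\n" % argv[0])
        return 2
    node = classify(argv[1])
    if node is None:
        # A non-zero exit tells the puppetmaster the lookup failed.
        return 1
    print(json.dumps(node))
    return 0


# Equivalent to running: puppet-nodeclassifier 85206671.onappcdn.com
exit_code = main(["puppet-nodeclassifier", "85206671.onappcdn.com"])
```

As a standalone script the last line would instead be `sys.exit(main(sys.argv))`, since the puppetmaster passes the node name as the only argument.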
Edge Servers Storedconfigs

l    Puppet stores facts about the edge servers into
      MySQL.
l    We make minimal use of this – for example sizing
      nginx's in-memory cache depending on the amount of
      memory it has.
l    Could probably use more e.g. set # threads based on
      cpu core count.
l    The data's always there if we ever want to query it...
Q&A

l    Questions? Comments?


l    P/S – final plug – we're hiring sysadmins!
PuppetCamp SEA 1 - Puppet Deployment at OnApp
