PuppetCamp SEA 1 - Puppet Deployment at OnApp


Wai Keen Woon, CTO of the CDN Division at OnApp Malaysia, gives an overview of the Puppet architecture at OnApp. OnApp's CDN division is a large provider of CDN services, which makes it an interesting case study.

Published in: Technology

  1. Puppet Deployment at OnApp
     Wai Keen Woon, CTO, CDN Division
     waikeen.woon@onapp.com
  2. WARNING <ObligatoryPlug>
  3. About OnApp
     • A leading provider of software for hosts
     • The leading cloud management software for hosts
     • The instant global CDN for hosts
     • OnApp launched July 1st 2010
     • Deep industry knowledge
     • Backed by LDC
     • 100+ employees in US, EU, APAC
  4. Vital Statistics
     • 1 in 3 public clouds
     • 800+ cloud deployments
     • 300+ global clients
  5. Customer Stories
  6. Instant CDN that gives you…
     • 75+ PoPs
     • low cost, high margin
     • get paid for idle capacity
  7. OK. </ObligatoryPlug>
  8. Systems Overview
     • Core & Development
       • ~20 physical servers
       • ~200 VMs
       • Homogeneous environment – 64-bit Debian everywhere
       • Mainly use OpenVZ and KVM for virtualization
     • CDN Delivery Edge Servers
       • 100+ servers in 60+ cities
       • Running on the OnApp platform – either Xen or KVM
     • Puppet integral to our setup – since day 1
  9. Why Puppet?
     • More reliable configuration of servers. Less need to “run ssh in a for loop” and miss something.
     • Self-documenting – our manifests are almost able to bootstrap an empty server.
       • Our manifests can't bootstrap an empty environment yet.
       • Limitation – manifests describe what/where/how something is set up, but don't describe *why*.
     • Nice syntax – easy on the eyes. Comprehensive built-in resource types. Able to fall back to dumb ways of doing things if required (use file, exec et al).
  10. Core Infra Environments
      • Systems manifest describes everything.
      • Three environments: alpha, beta, omega.
  11. What Would OnApp Set Up...
      • Essential utilities (tcpdump, less, vim, etc).
      • Users & their SSH keys, sudoers.
        • Developers' shell => /bin/false if production
      • Base firewall rules.
      • Nagios agent.
      • Set uniform locale settings: UTC timezone, en_US.UTF-8 locale.
      • SMTP that smarthosts to our central relay.
      • Syslogd for remote logs to central logging server.
      • Finally, the services.
  12. Core Infra Manifest Excerpt
      (Color-coded on the original slide: blue – env config definitions, red – node definitions, green – class definitions.)

      Env config definitions:

          $portal_domain = "portal.alpha.onappcdn.com"
          $portal_db_host = "portal.alpha.onappcdn.com"
          $portal_db_user = "aflexi_webportal"
          $auth_nameservers = {
            "ns1" => "",
            "ns2" => "",
            "ns3" => "",
            "ns4" => "",
          }
          $monitoring_host_server = [
            "monitoring.alpha.onappcdn.com",
            "dns.alpha.onappcdn.com"
          ]

      Node definitions:

          node "monitoring.alpha.onappcdn.com" {
            include base
            include s_db_monitoring
            include s_monitoring_server
            include collectd::rrdcached
            include s_munin
            include s_monitoring_alerts
            include s_monitoring_graph
          }

      Class definitions:

          class collectd::rrdcached {
            package { "rrdcached":
              ensure => latest,
            }
            service { "rrdcached":
              ensure => running,
            }
          }
  13. Package Repo Integration
      • Jenkins builds debs of our code and stores them in an apt repository for the environment they are built for.
      • Puppet keeps packages up-to-date (ensure => latest) and restarts services on package upgrades.

          puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed successfully
          puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/ensure) ensure changed 7065.20120530.113915-1 to 7066.20120604.090916-1
          puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) Triggered refresh from 1 events
          puppet-agent[25431]: Finished catalog run in 16.08 seconds
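The upgrade lines in that log are regular enough to pull out mechanically. The following is a minimal sketch, not something shown in the talk: the function name `package_upgrades` and the regular expression are illustrative, and the pattern assumes the log format matches the excerpt on the slide (quotes around versions are treated as optional).

```python
import re

# Matches the "ensure changed X to Y" message that appears for Package resources
# in the slide's log excerpt. Quotes around the versions are optional here.
PACKAGE_CHANGE = re.compile(
    r"Package\[(?P<package>[^\]]+)\]/\s*ensure\)\s+ensure changed\s+"
    r"'?(?P<old>[^\s']+)'?\s+to\s+'?(?P<new>[^\s']+)'?"
)


def package_upgrades(log_lines):
    """Return (package, old_version, new_version) for each upgrade in the log."""
    upgrades = []
    for line in log_lines:
        match = PACKAGE_CHANGE.search(line)
        if match:
            upgrades.append(
                (match.group("package"), match.group("old"), match.group("new"))
            )
    return upgrades


# Lines taken from the slide's log excerpt.
sample = [
    "puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) "
    "executed successfully",
    "puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/"
    "Package[python-aflexi-mqcore]/ensure) ensure changed "
    "7065.20120530.113915-1 to 7066.20120604.090916-1",
    "puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) "
    "Triggered refresh from 1 events",
]

for package, old, new in package_upgrades(sample):
    print(f"{package}: {old} -> {new}")
```

A list like this could feed a deploy report showing which build reached which environment.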
  14. Nagios Integration
      • Plugs into Nagios – uses “exported resources”
  15. Nagios Integration

      Server manifest (exports the service that is checked):

          @@nagios_service { "check_load_$fqdn":
            check_command       => "check_nrpe_1arg!check_load",
            use                 => "generic-service",
            host_name           => $fqdn,
            service_description => "check_load",
            tag                 => $domain,
          }

      Nagios service manifest (collects the resources to check):

          Nagios_service <<| tag == "onappcdn.com" |>> {
            target  => "/etc/n3/conf.d/services.cfg",
            require => Package["nagios3"],
            notify  => Exec["reload-nagios"],
          }
  16. Nagios Integration
      • What's logged on the Nagios server when puppet runs?

          puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_host[hrm.onappcdn.com]/ensure) created
          puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_service[check_load_hrm.onappcdn.com]/ensure) created
          nagios3: Nagios 3.2.1 starting... (PID=5601)
          puppet-agent[15293]: (/Stage[main]/Nagios::Base/Exec[reload-nagios]) Triggered refresh from 8 events
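The export/collect pattern on the Nagios slides can be pictured as a shared store: every monitored server publishes a description of its own check, and the Nagios server pulls in everything carrying a matching tag. The toy model below is only an illustration of that idea in Python, not how Puppet implements it. `Resource`, `ExportStore` and `export_check_load` are made-up names, and the second host is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    kind: str
    title: str
    tag: str
    params: dict = field(default_factory=dict)


class ExportStore:
    """Toy stand-in for the database that holds exported resources."""

    def __init__(self):
        self._resources = {}

    def export(self, resource):
        # Keyed by (kind, title), so a later run from the same node replaces its entry.
        self._resources[(resource.kind, resource.title)] = resource

    def collect(self, kind, tag, **overrides):
        # Select by kind and tag, then layer the collector's own parameters on top.
        return [
            Resource(r.kind, r.title, r.tag, {**r.params, **overrides})
            for r in self._resources.values()
            if r.kind == kind and r.tag == tag
        ]


def export_check_load(store, fqdn, domain):
    """What each monitored server does: export its own check_load service."""
    store.export(Resource(
        kind="nagios_service",
        title=f"check_load_{fqdn}",
        tag=domain,
        params={
            "check_command": "check_nrpe_1arg!check_load",
            "use": "generic-service",
            "host_name": fqdn,
            "service_description": "check_load",
        },
    ))


store = ExportStore()
export_check_load(store, "hrm.onappcdn.com", "onappcdn.com")
export_check_load(store, "host.example.org", "example.org")  # hypothetical host

# What the Nagios server does: collect everything tagged with its domain and
# add the parameters that only make sense on the collecting side.
services = store.collect("nagios_service", "onappcdn.com",
                         target="/etc/n3/conf.d/services.cfg")
for service in services:
    print(service.title, "->", service.params["target"])
```

The point of the split is that the monitored server owns the facts about itself (hostname, check command), while the collector owns where the result is written.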
  17. Monitoring Puppet Itself
      • Lots of tools/dashboards out there to achieve this.
      • For us: “grep -i err */syslog”. Dumb, but works until we need to Really Address it.
      • Common issues:
        • Puppet gets “stuck”. And only one puppet instance can run at any one time.
        • Manifest errors – syntax, merge issues.
        • Badly-written manifests (vague dependencies, conditions/commands not robust enough).
        • An important dependent resource failing (e.g. apt-get install fails due to dpkg-configure error).
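The grep one-liner above could be narrowed slightly so that it only reports lines coming from puppet-agent. This is a rough sketch of that variation, not what OnApp runs: `puppet_errors` and `scan_syslogs` are illustrative names, the `*/syslog` layout is taken from the slide, and the error line in the sample is a placeholder, not real Puppet output.

```python
import glob
import re

ERR = re.compile(r"err", re.IGNORECASE)  # same loose match as `grep -i err`


def puppet_errors(lines):
    """Keep only puppet-agent lines that mention 'err' in any letter case."""
    return [
        line.rstrip("\n")
        for line in lines
        if "puppet-agent" in line and ERR.search(line)
    ]


def scan_syslogs(pattern="*/syslog"):
    """Scan one syslog file per host directory, as in `grep -i err */syslog`."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            found = puppet_errors(handle)
        if found:
            hits[path] = found
    return hits


sample = [
    "puppet-agent[25431]: Finished catalog run in 16.08 seconds\n",
    "puppet-agent[1]: ERROR placeholder line (not real Puppet output)\n",
    "nagios3: Nagios 3.2.1 starting... (PID=5601)\n",
]
print(puppet_errors(sample))
```

Like the grep it imitates, this matches "err" anywhere in the line, so it will also flag resource names that happen to contain those letters.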
  18. File/Dir Organization
      • We use git to revision control our puppet manifests.
      • Style we adopted mainly comes from Hunter Haugen*
      • A branch for each environment, plus a “common” branch.
      • Each branch checked out as a separate directory in /etc/puppet/environments/$env
      • And puppetmaster's includedir configured to that directory.

      Branch layout (as shown on the slide):
      • Common branch
          Manifests/
            alpha.pp
            beta.pp
          Modules/
            Base/
            Users/
      • Alpha env branch
          Modules/
            Python/
            Services/
            Nameserver/
      • Beta env branch
          Modules/
            Python/
            Services/
            Nameserver/

      * - http://hunnur.com/blog/2010/10/dynamic-git-branch-puppet-environments/
  19. File/Dir Organization
      • Common goes into its own branch – for convenience; less merging needed for manifests that we are Really Sure won't differ between environments.
      • System manifest into common/manifests/$env.pp
        • Initially tried putting manifest into alpha/beta/omega branches as site.pp – merge hell.
      • Introduced extra variable - $effective_env
        • Abstracts the puppet environment name from the environment that the manifest runs in.
  20. File/Dir Organization
      • Hotfixes branch off omega and merged to alpha/beta/omega.
      • Development branches off alpha
        • This branch can be trialed as a separate environment (use --environment to specify custom env on puppet client).
        • Merge to alpha → beta → omega.
        • Or merge as feature branch to any other environment.
      • “git diff branchA branchB” - differences are shown clearly between environments.
  21. Edge Servers
      • Our edge servers are hosted on OnApp cloud (only).
      • When creating an edge server, the cloud control panel
        • Instantiates a VM from a lightly-customized Debian image.
        • Configures the package repositories.
        • Issues a puppet run to set up.
      • Advantage of setting it up through puppet instead of a “gold image”: our system can be installed on bare metal if needed, and can be reproducibly installed on $future_debian_release.
  22. Edge Servers
      • Our edge servers are hosted on OnApp cloud (only).
      • When creating an edge server, the control panel instantiates a VM from a lightly-customized Debian image, and issues a puppet run to set it up.
  23. Edge Servers – External Node Classifier
      • No text manifest – all code, using “external node classifier”.
      • Assign variables and classes specific to the edge server through node classifier. E.g. its password, the services it runs.
      • In python:

          import yaml  # PyYAML

          output = {}
          output["classes"] = ["class1", "class2"]
          output["parameters"] = {"param1": "value1"}
          print(yaml.dump(output))
  24. Edge Servers – External Node Classifier
      • This YAML-encoded structure...

          $ puppet-nodeclassifier 85206671.onappcdn.com
          classes: [base, nginx ]
          parameters: { edge_secret_key: 86zFsrM7Ma, monitoring_domain: monitoring.alpha.onappcdn.com }

      • … is equivalent to this textual manifest:

          node "85206671.onappcdn.com" {
            $edge_secret_key = "86zFsrM7Ma"
            $monitoring_domain = "monitoring.alpha.onappcdn.com"
            include base
            include nginx
          }
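Putting the two classifier slides together, a complete classifier script might look roughly like the sketch below. It is an assumption-laden illustration, not OnApp's code: the talk does not say where the classifier gets its data, so `EDGE_SERVERS` is a hypothetical in-memory inventory, and `classify` and `main` are made-up names. The output is written with the standard library's `json` module only to keep the example dependency-free. It relies on JSON being readable as YAML flow syntax, whereas the slide uses `yaml.dump`.

```python
import json
import sys

# Hypothetical inventory. In practice this would presumably come from the
# control panel's database or API; the talk does not say.
EDGE_SERVERS = {
    "85206671.onappcdn.com": {
        "classes": ["base", "nginx"],
        "parameters": {
            "edge_secret_key": "86zFsrM7Ma",
            "monitoring_domain": "monitoring.alpha.onappcdn.com",
        },
    },
}


def classify(hostname):
    """Return the classes/parameters structure for one node, or None if unknown."""
    entry = EDGE_SERVERS.get(hostname)
    if entry is None:
        return None
    return {
        "classes": list(entry["classes"]),
        "parameters": dict(entry["parameters"]),
    }


def main(argv):
    """Entry point: the node name is expected as the only argument."""
    if len(argv) != 2:
        print("usage: classifier <hostname>", file=sys.stderr)
        return 2
    output = classify(argv[1])
    if output is None:
        # Assumption: a non-zero exit status signals that classification failed.
        return 1
    print(json.dumps(output))
    return 0


# Demonstration with the node from the slide.
print(json.dumps(classify("85206671.onappcdn.com")))
```

To use it as a script, the last line would be replaced with `sys.exit(main(sys.argv))`.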
  25. Edge Servers – Storedconfigs
      • Puppet stores facts about the edge servers into MySQL.
      • We make minimal use of this – for example sizing nginx's in-memory cache depending on the amount of memory the server has.
      • Could probably use more, e.g. set # threads based on cpu core count.
      • The data's always there if we ever want to query it...
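As a rough illustration of fact-driven sizing, the sketch below derives a cache size and a worker count from two fact values. Everything specific in it is an assumption: the talk gives no formula, so the 25% fraction and the 64 MB floor are invented for the example, and the facts are assumed to arrive as strings such as "2.00 GB" for memory and "4" for the core count.

```python
# Multipliers to convert a "<number> <unit>" memory string into megabytes.
UNITS_TO_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024 * 1024}


def memory_mb(memorysize):
    """Parse a memory fact such as '2.00 GB' into megabytes."""
    value, unit = memorysize.split()
    return float(value) * UNITS_TO_MB[unit.upper()]


def cache_size_mb(memorysize, fraction=0.25, minimum_mb=64):
    """Give the cache a fixed fraction of RAM, but never less than a floor.

    Both the fraction and the floor are illustrative values, not OnApp's.
    """
    return max(minimum_mb, int(memory_mb(memorysize) * fraction))


def worker_count(processorcount):
    """One worker per CPU core, with at least one worker."""
    return max(1, int(processorcount))


for memory, cores in [("2.00 GB", "4"), ("128 MB", "1")]:
    print(memory, "->", cache_size_mb(memory), "MB cache,",
          worker_count(cores), "worker(s)")
```

In a Puppet setup the same arithmetic would more likely live in a template that reads the facts directly. Python is used here only to show the calculation.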
  26. Q&A
      • Questions? Comments?
      • P/S – final plug – we're hiring sysadmins!