Puppet Deployment at OnApp


From PuppetCamp Southeast Asia 2012 in Kuala Lumpur, Malaysia.

Published in: Technology

  1. Puppet Deployment at OnApp
       Wai Keen Woon, CTO, CDN Division
       waikeen.woon@onapp.com
  2. WARNING <ObligatoryPlug>
  3. About OnApp
       • A leading provider of software for hosts
       • The leading cloud management software for hosts
       • The instant global CDN for hosts
       • OnApp launched July 1st 2010
       • Deep industry knowledge
       • Backed by LDC
       • 100+ employees in US, EU, APAC
  4. Vital Statistics
       • 1 in 3 public clouds
       • 800+ cloud deployments
       • 300+ global clients
  5. Customer Stories
  6. Instant CDN that gives you…
       • 75+ PoPs
       • low cost, high margin
       • get paid for idle capacity
  7. OK. </ObligatoryPlug>
  8. Systems Overview
       • Core & Development
           • ~20 physical servers
           • ~200 VMs
           • Homogeneous environment – 64-bit Debian everywhere
           • Mainly use OpenVZ and KVM for virtualization
       • CDN Delivery Edge Servers
           • 100+ servers in 60+ cities
           • Running on the OnApp platform – either Xen or KVM
       • Puppet integral to our setup – since day 1
  9. Why Puppet?
       • More reliable configuration of servers. Less need to “run ssh in a for loop” and miss something.
       • Self-documenting – our manifests are almost able to bootstrap an empty server.
           • Our manifests can't bootstrap an empty environment yet.
           • Limitation – manifests describe what/where/how something is set up, but don't describe *why*.
       • Nice syntax – easy on the eyes. Comprehensive built-in resource types. Able to fall back to dumb ways of doing things if required (use file, exec et al).
  10. Core Infra Environments
       • Systems manifest describes everything.
       • Three environments: alpha, beta and omega.
  11. What Would OnApp Set Up...
       • Essential utilities (tcpdump, less, vim, etc).
       • Users & their SSH keys, sudoers.
           • Developers' shell => /bin/false if production
       • Base firewall rules.
       • Nagios agent.
       • Set uniform locality settings: UTC timezone, en_US.UTF-8 locale.
       • SMTP that smarthosts to our central relay.
       • Syslogd for remote logs to central logging server.
       • Finally, the services.
  12. Core Infra Manifest Excerpt
       Env config definitions (blue on the slide):

           $portal_domain = "portal.alpha.onappcdn.com"
           $portal_db_host = "portal.alpha.onappcdn.com"
           $portal_db_user = "aflexi_webportal"
           $auth_nameservers = {
             "ns1" => "",
             "ns2" => "",
             "ns3" => "",
             "ns4" => "",
           }
           $monitoring_host_server = [
             "monitoring.alpha.onappcdn.com",
             "dns.alpha.onappcdn.com"
           ]

       Node definitions (red on the slide):

           node "monitoring.alpha.onappcdn.com" {
             include base
             include s_db_monitoring
             include s_monitoring_server
             include collectd::rrdcached
             include s_munin
             include s_monitoring_alerts
             include s_monitoring_graph
           }

       Class definitions (green on the slide):

           class collectd::rrdcached {
             package { "rrdcached":
               ensure => latest,
             }
             service { "rrdcached":
               ensure => running,
             }
           }
  13. Package Repo Integration
       • Jenkins builds debs of our code and stores them in an apt repository for the environment they are built for.
       • Puppet keeps packages up-to-date (ensure => latest) and restarts services on package upgrades.

           puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed successfully
           puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/ensure) ensure changed 7065.20120530.113915-1 to 7066.20120604.090916-1
           puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) Triggered refresh from 1 events
           puppet-agent[25431]: Finished catalog run in 16.08 seconds
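Log lines like these can be scanned to see which packages a puppet run upgraded. What follows is only a minimal sketch, assuming the message format shown on the slide; the name `package_upgrades` is hypothetical and not part of Puppet.

```python
import re

# Assumed message shape, taken from the slide:
#   (/Stage[main]/.../Package[NAME]/ensure) ensure changed OLD to NEW
UPGRADE_RE = re.compile(
    r"Package\[(?P<name>[^\]]+)\]/\s*ensure\)\s+ensure changed\s+"
    r"(?P<old>\S+)\s+to\s+(?P<new>\S+)"
)


def package_upgrades(lines):
    """Return (package, old_version, new_version) for each logged upgrade."""
    upgrades = []
    for line in lines:
        match = UPGRADE_RE.search(line)
        if match:
            upgrades.append(
                (match.group("name"), match.group("old"), match.group("new"))
            )
    return upgrades
```

Feeding it the syslog of a host gives a quick list of what Jenkins-built packages actually landed there.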
  14. Nagios Integration
       • Plugs into nagios – uses “exported resources”
  15. Nagios Integration
       Server manifest (exports the service that is checked):

           @@nagios_service { "check_load_$fqdn":
             check_command       => "check_nrpe_1arg!check_load",
             use                 => "generic-service",
             host_name           => $fqdn,
             service_description => "check_load",
             tag                 => $domain,
           }

       Nagios service manifest (collects the resources to check):

           Nagios_service <<| tag == "onappcdn.cm" |>> {
             target  => "/etc/n3/conf.d/services.cfg",
             require => Package["nagios3"],
             notify  => Exec["reload-nagios"],
           }
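The export/collect pattern on this slide can be modelled in a few lines of Python. This is a toy illustration only, not how Puppet stores exported resources internally; `ExportStore` and its methods are hypothetical names. Each server exports a tagged resource, and the Nagios host collects everything with a matching tag while adding its own attributes.

```python
class ExportStore:
    """Toy stand-in for the puppetmaster's store of exported resources."""

    def __init__(self):
        self._exported = []

    def export(self, type_, title, tag, **params):
        # Roughly what '@@type { title: ... tag => ... }' records.
        self._exported.append(
            {"type": type_, "title": title, "tag": tag, "params": dict(params)}
        )

    def collect(self, type_, tag, **overrides):
        # Roughly what 'Type <<| tag == "..." |>> { ... }' realizes:
        # matching resources, with the collector's attributes merged in.
        realized = {}
        for resource in self._exported:
            if resource["type"] == type_ and resource["tag"] == tag:
                merged = dict(resource["params"])
                merged.update(overrides)
                realized[resource["title"]] = merged
        return realized
```

The point of the design is that the monitored server only declares what should be checked; where the check definition is written (the `target` file) is decided by the collector.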
  16. Nagios Integration
       • What's logged on the nagios server when puppet runs?

           puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_host[hrm.onappcdn.com]/ensure) created
           puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_service[check_load_hrm.onappcdn.com]/ensure) created
           nagios3: Nagios 3.2.1 starting... (PID=5601)
           puppet-agent[15293]: (/Stage[main]/Nagios::Base/Exec[reload-nagios]) Triggered refresh from 8 events
  17. Monitoring Puppet Itself
       • Lots of tools/dashboards out there to achieve this.
       • For us: “grep -i err */syslog”. Dumb, but works until we need to Really Address it.
       • Common issues:
           • Puppet gets “stuck”. And only one puppet instance can run at any one time.
           • Manifest errors – syntax, merge issues.
           • Badly-written manifests (vague dependencies, conditions/commands not robust enough).
           • An important dependent resource failing (e.g. apt-get install fails due to dpkg-configure error).
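The “grep -i err */syslog” check can be sketched in Python as below. One deliberate difference from the grep: this version only keeps lines that also mention puppet, which is an assumption about what is worth looking at, not something the slide states. The function names and the one-directory-per-host layout under `root` are likewise assumptions.

```python
from pathlib import Path


def puppet_errors(lines):
    """Return lines that mention puppet and contain 'err' (case-insensitive)."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if "puppet" in lowered and "err" in lowered:
            hits.append(line.rstrip("\n"))
    return hits


def scan_syslogs(root):
    """Apply puppet_errors to every <root>/<host>/syslog file."""
    results = {}
    for path in sorted(Path(root).glob("*/syslog")):
        # errors="replace" keeps the scan going on stray non-UTF-8 bytes.
        with path.open(errors="replace") as handle:
            found = puppet_errors(handle)
        if found:
            results[str(path)] = found
    return results
```

Like the grep, this is a substring match, so it will also flag harmless lines that happen to contain “err”.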
  18. File/Dir Organization
       • We use git to revision control our puppet manifests.
       • Style we adopted mainly comes from Hunter Haugen*
       • A branch for each environment, plus a “common” branch.
       • Each branch checked out as a separate directory in /etc/puppet/environments/$env
       • And puppetmaster's includedir configured to that directory.
       * - http://hunnur.com/blog/2010/10/dynamic-git-branch-puppet-environments/

       Branch layout:
       • Common branch

           Manifests/
             alpha.pp
             beta.pp
           Modules/
             Base/
             Users/

       • Alpha env branch

           Modules/
             Python/
             Services/
             Nameserver/

       • Beta env branch

           Modules/
             Python/
             Services/
             Nameserver/
  19. File/Dir Organization
       • Common goes into its own branch – for convenience; less merging needed for manifests that we are Really Sure won't differ between environments.
       • System manifest goes into common/manifests/$env.pp
           • Initially tried putting the manifest into the alpha/beta/omega branches as site.pp – merge hell.
       • Introduced an extra variable - $effective_env
           • Decouples the puppet environment name from the environment that the manifest actually runs in.
  20. File/Dir Organization
       • Hotfixes branch off omega and are merged to alpha/beta/omega.
       • Development branches off alpha
           • This branch can be trialed as a separate environment (use --environment to specify a custom env on the puppet client).
           • Merge to alpha → beta → omega.
           • Or merge as a feature branch to any other environment.
       • “git diff branchA branchB” - differences are shown clearly between environments.
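The merge flow can be written down as a small planner that lists the git commands for each kind of branch. It is a sketch of one reading of the slide: a hotfix is merged into each environment directly, while a development branch is merged into alpha and then promoted from one environment to the next. `merge_steps` is a hypothetical helper and the commands are only returned, never executed.

```python
PROMOTION_ORDER = ("alpha", "beta", "omega")


def merge_steps(branch, kind):
    """Return the git commands (as argument lists) to land a branch."""
    steps = []
    if kind == "hotfix":
        # Hotfix: merge the same branch into every environment.
        for env in PROMOTION_ORDER:
            steps.append(["git", "checkout", env])
            steps.append(["git", "merge", branch])
    elif kind == "development":
        # Development: merge into alpha, then promote alpha -> beta -> omega.
        source = branch
        for env in PROMOTION_ORDER:
            steps.append(["git", "checkout", env])
            steps.append(["git", "merge", source])
            source = env
    else:
        raise ValueError("kind must be 'hotfix' or 'development'")
    return steps
```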
  21. Edge Servers
       • Our edge servers are hosted on OnApp cloud (only).
       • When creating an edge server, the cloud control panel
           • Instantiates a VM from a lightly-customized Debian image.
           • Configures the package repositories.
           • Issues a puppet run to set up.
       • Advantage of setting it up through puppet instead of a “gold image” - our system can be installed on bare metal if needed, and can be reproducibly installed on $future_debian_release
  22. Edge Servers
       • Our edge servers are hosted on OnApp cloud (only).
       • When creating an edge server, the control panel instantiates a VM from a lightly-customized Debian image, and issues a puppet run to set it up.
  23. Edge Servers – External Node Classifier
       • No text manifest – all code, using “external node classifier”.
       • Assign variables and classes specific to the edge server through the node classifier, e.g. its password, the services it runs.
       • In Python:

           import yaml

           output = {}
           output["classes"] = ["class1", "class2"]
           output["parameters"] = {"param1": "value1"}
           print(yaml.dump(output))
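A slightly fuller classifier might look like the sketch below. Puppet calls the classifier with the node name as its first argument and reads the YAML on stdout; a non-zero exit signals failure. Everything else here is assumed for illustration: the `NODES` table, the host name `edge1.example.com` and the function names are hypothetical, and JSON is printed instead of `yaml.dump` on the assumption that the YAML parser in use accepts JSON-style flow syntax (swap in PyYAML as on the slide if in doubt).

```python
import json
import sys

# Hypothetical per-node data; a real classifier would look this up
# in whatever database the control panel maintains.
NODES = {
    "edge1.example.com": {
        "classes": ["base", "nginx"],
        "parameters": {"monitoring_domain": "monitoring.alpha.onappcdn.com"},
    },
}


def classify(hostname):
    """Return the classes/parameters structure for a node, or None."""
    node = NODES.get(hostname)
    if node is None:
        return None
    return {
        "classes": list(node["classes"]),
        "parameters": dict(node["parameters"]),
    }


def main(argv):
    """Entry point; returns the process exit status."""
    if len(argv) != 2:
        print("usage: classifier <node-name>", file=sys.stderr)
        return 2
    output = classify(argv[1])
    if output is None:
        # Unknown node: report failure rather than an empty classification.
        return 1
    print(json.dumps(output))
    return 0
```

As a script it would end with `sys.exit(main(sys.argv))`.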
  24. Edge Servers – External Node Classifier
       • This YAML-encoded structure...

           $ puppet-nodeclassifier 85206671.onappcdn.com
           classes: [base, nginx]
           parameters: { edge_secret_key: 86zFsrM7Ma, monitoring_domain: monitoring.alpha.onappcdn.com }

       • … is equivalent to this textual manifest:

           node "85206671.onappcdn.com" {
             $edge_secret_key = "86zFsrM7Ma"
             $monitoring_domain = "monitoring.alpha.onappcdn.com"
             include base
             include nginx
           }
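To make the equivalence concrete, the classifier's structure can be rendered back into a node block. This is only an illustration of the mapping (parameters become variables, classes become includes); `to_manifest` is a hypothetical helper, and it assumes plain string values that need no escaping.

```python
def to_manifest(hostname, enc):
    """Render an ENC-style dict as the equivalent textual node definition."""
    lines = ['node "%s" {' % hostname]
    # Each parameter becomes a variable assignment in the node scope.
    for name, value in enc.get("parameters", {}).items():
        lines.append('  $%s = "%s"' % (name, value))
    # Each class becomes an include statement.
    for class_name in enc.get("classes", []):
        lines.append("  include %s" % class_name)
    lines.append("}")
    return "\n".join(lines)
```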
  25. Edge Servers – Storedconfigs
       • Puppet stores facts about the edge servers into MySQL.
       • We make minimal use of this – for example sizing nginx's in-memory cache depending on the amount of memory the server has.
       • Could probably use more, e.g. set # threads based on cpu core count.
       • The data's always there if we ever want to query it...
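Sizing from facts could be sketched as follows. The fact names `memorysize` and `processorcount` are the classic Facter ones; the 25% cache fraction, the unit table and the function names are assumptions made for the example, not the values used at OnApp.

```python
# Assumed unit strings, as printed by older Facter releases (e.g. "3.86 GB").
UNIT_TO_MB = {"kB": 1.0 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024}


def parse_memorysize(value):
    """Convert a Facter-style size such as '3.86 GB' to megabytes."""
    number, unit = value.split()
    return float(number) * UNIT_TO_MB[unit]


def nginx_settings(facts, cache_fraction=0.25):
    """Derive cache size and worker count from a node's facts."""
    memory_mb = parse_memorysize(facts["memorysize"])
    return {
        # Give nginx a fixed share of RAM for its in-memory cache.
        "cache_mb": int(memory_mb * cache_fraction),
        # One worker per CPU core.
        "worker_processes": int(facts["processorcount"]),
    }
```

For a node reporting `memorysize` of "2.00 GB" and `processorcount` of "4", this yields a 512 MB cache and 4 workers.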
  26. Q&A
       • Questions? Comments?
       • P.S. – final plug – we're hiring sysadmins!