Distributed monitoring at Hyves- Puppet


Transcript of "Distributed monitoring at Hyves- Puppet"

  1. Welcome
     • Jeffrey Lensen, System Engineer
  2. Hyves Infrastructure
     • 3000+ Gentoo servers
     • 190 function groups/types
     • 3 datacenters
     • Database for server management
  3. Using Puppet
     • Since: January 2007
     • Puppetmasters: 3 load-balanced, 1 for CA and development
     • Version: 2.6.1
     • MySQL backend for (thin_)storeconfigs
     • Nginx + 8 Mongrel instances per server
     • 100+ modules
     • nodes.rb uses management database
     • Puppet run every morning on every server
  4. Nagios
     • 8 Nagios hosts in distributed setup
     • 2500 hosts
     • 1 Nagios master for web and alerting
     • Scripts to generate configuration
     • Management database for information
     • Templates for service checks
     • Died during large fallouts
  5. Icinga
     • Switched to Icinga November 2010
     • Distributed Icinga setup doesn't require centralized host
     • Very fast standalone Icinga-web interface
     • Uses database backend
     • REST API
     • Switching was easy due to similar configuration
  6. Current monitoring setup
     • Monitoring hosts: 12 (4 per DC)
     • Services: over 83,000
     • Hosts: nearly 3,500
     • Average check interval: every 5 min
     • NOC monitoring host: 1
     • Overview checks using API
     • Command-line interface
  7. Problems with monitoring
     • Adding new checks meant manually editing a lot of templates
     • Things that should be monitored aren't
     • Won't realize it until it's too late
     • No monitoring makes it harder to find the problem
  8. Using Puppet to configure Icinga
     • Puppet knows it all, so why not use that information?
     • Exported resources from Naginator to define monitoring checks
     • Include the monitoring definitions in profiles
     • Running Puppet defines all necessary monitoring checks for that host
  9. Example
     modules/monitoring/manifests/init.pp:

       class monitoring {
         service { "nrpe":
           ensure => running,
           enable => true
         }
         @@nagios_host { "$hostname":
           address => $ip
         }
         @@nagios_service { "NRPE $hostname":
           service_description => "NRPE",
           check_command       => "check_nrpe_scripts",
         }
       }

     • Appending $hostname in the nagios_service definition prevents duplicate definitions on monitoring hosts
  10. Example Nginx
     • Automatically create HTTP checks when including Nginx

     modules/nginx/manifests/init.pp:

       class nginx {
         service { "nginx":
           ensure => running,
           enable => true
         }
         @@nagios_service { "HTTP $hostname":
           service_description => "HTTP",
           check_command       => "check_http",
           event_handler       => "service_restart!nginx",
           contact_groups      => "admins_email, admins_sms"
         }
       }
  11. Predefining and distributing
     manifests/defines.pp:

       $__notifications_enabled = $systemstatus ? {
         operational => "1",
         fail        => "0"
       }

       # Resource defaults for every exported host definition
       Nagios_host {
         ensure                => present,
         host_name             => "$hostname.$domain",
         hostgroups            => $role,
         use                   => "generic-host", #our standard template
         alias                 => $hostname,
         notifications_enabled => $__notifications_enabled,
         target                => "/etc/icinga/puppetgenerated/hosts/$hostname.cfg",
         notes                 => $monitoringhost
       }

       # Resource defaults for every exported service definition
       Nagios_service {
         ensure                => present,
         host_name             => "$hostname.$domain",
         use                   => "generic-service", #our standard template
         notifications_enabled => $__notifications_enabled,
         target                => "/etc/icinga/puppetgenerated/services/$hostname.cfg",
         notes                 => $monitoringhost
       }
  12. Retrieving exported resources
     modules/icinga/manifests/init.pp:

       class icingacollect {
         Nagios_host <<| notes == "$hostname" |>> {
           require => File["/etc/icinga/puppetgenerated/hosts"]
         }
         Nagios_service <<| notes == "$hostname" |>> {
           require => File["/etc/icinga/puppetgenerated/services"]
         }
       }
  13. Why not Tags?
     • Using "notes" to assign monitoring host
     • Tagging caused problems when setting require in Nagios_host and Nagios_service
     • Tagging meant redefining, it's not inherited
     • Solution: stages (?)
  14. Fail-safes
     modules/icinga/manifests/init.pp:

       class icinga {
         include icingacollect

         # Check the freshly generated configuration before touching the live one
         exec { "verify new cfg":
           command => "/usr/bin/icinga -v /etc/icinga/verify-puppetgenerated.cfg",
           require => Class["icingacollect"]
         }
         # Only runs if verification succeeded
         exec { "mv cfgs":
           command => "rm -rf /etc/icinga/puppet/* ; mv /etc/icinga/puppetgenerated/* /etc/icinga/puppet/",
           require => Exec["verify new cfg"]
         }
         # "[]" is as shown on the slide; Icinga external commands normally
         # carry a Unix timestamp inside the brackets
         exec { "restart icinga":
           command => "/usr/bin/printf \"[] RESTART_PROGRAM\n\" > /var/icinga/rw/icinga.cmd",
           require => [ Exec["mv cfgs"], Service["icinga"] ]
         }
       }
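
The verify-then-swap order on the Fail-safes slide can be illustrated outside Puppet. What follows is a minimal Ruby sketch, not the Hyves implementation: the `promote` helper and the directory names are hypothetical, and the verification step is passed in as a block that stands in for the `icinga -v` run.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: replace the live config files with the staged ones,
# but only if the verification block approves the staging directory.
def promote(staging, live)
  return false unless yield(staging)
  FileUtils.rm_rf(Dir.glob(File.join(live, "*")))
  FileUtils.mv(Dir.glob(File.join(staging, "*")), live)
  true
end

Dir.mktmpdir do |root|
  staging = File.join(root, "puppetgenerated")
  live    = File.join(root, "puppet")
  FileUtils.mkdir_p([staging, live])
  File.write(File.join(live, "old.cfg"), "old")
  File.write(File.join(staging, "new.cfg"), "define host{}")

  # Stand-in verifier; on a real host this would be the `icinga -v` check.
  promoted = promote(staging, live) { |dir| !Dir.empty?(dir) }
  puts "promoted: #{promoted}, live files: #{Dir.children(live).sort.inspect}"
end
```

If the verifier rejects the staged files, the live directory is left untouched, which is the point of chaining the three exec resources with `require`.
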
  15. Deploying monitoring
     • Deploy script starts Puppet run on all monitoring hosts
     • Threaded, with a small sleep between starts to prevent a thundering herd on the Puppet masters
     • Waits for all Puppet runs to finish and reports whether they were successful or not
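
The deploy script itself is not shown in the deck. A minimal Ruby sketch of the pattern it describes (staggered thread starts, then wait and report) could look like this. The `deploy` method, the host names, and the block that stands in for triggering a Puppet run are all assumptions made for illustration.

```ruby
# Hypothetical sketch: start one thread per monitoring host, sleeping briefly
# after each start so the Puppet masters are not hit by all hosts at once.
def deploy(hosts, stagger: 0.5, &run)
  threads = hosts.map do |host|
    thread = Thread.new(host) do |h|
      begin
        run.call(h) ? true : false
      rescue StandardError
        false # a crashed run counts as a failure
      end
    end
    sleep(stagger)
    [host, thread]
  end
  # Thread#value waits for the thread to finish and returns its result.
  threads.map { |host, thread| [host, thread.value] }.to_h
end

# Stand-in for the real trigger: pretend the run on "mon2" fails.
results = deploy(%w[mon1 mon2 mon3], stagger: 0.01) { |host| host != "mon2" }
results.each { |host, ok| puts "#{host}: #{ok ? 'ok' : 'FAILED'}" }
```

In a real deployment the block would run whatever command starts Puppet on the remote host and return whether it succeeded.
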
  16. Downsides
     • Puppet run on Icinga hosts takes about 20 minutes (using separate config files for each host helps)
     • Modifying a service check requires a Puppet run on all hosts with that service check (solution: use --noop)
     • Cleaning up old resources
  17. Cleaning up

       fqdn="$host_to_be_removed.$domain"
       puppet apply --certname $fqdn --node_name facter --thin_storeconfigs $dbsettings \
         --execute 'resources { ["nagios_service","nagios_host"]: purge => true }'
  18. What if something isn't running Puppet?
     • Configcheck check
     • Compares management database with Icinga API
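
The deck does not show the Configcheck check. Its core idea, comparing what the management database expects with what Icinga actually monitors, comes down to a set difference. The Ruby sketch below assumes both sides have already been fetched as plain lists of host names. The `config_check` method and the sample names are hypothetical.

```ruby
require "set"

# Hypothetical sketch: both arguments are plain lists of host names. Fetching
# them (management database query, Icinga API call) is left out on purpose.
def config_check(expected_hosts, monitored_hosts)
  expected  = expected_hosts.to_set
  monitored = monitored_hosts.to_set
  {
    missing: (expected - monitored).sort, # in the database, not monitored
    stale:   (monitored - expected).sort  # monitored, not in the database
  }
end

report = config_check(%w[web1 web2 db1], %w[web1 db1 old9])
puts "missing from Icinga: #{report[:missing].join(', ')}"
puts "stale in Icinga: #{report[:stale].join(', ')}"
```

A host in the `missing` list is a candidate for "something isn't running Puppet", since a successful Puppet run would have exported its checks.
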
  19. Other cool stuff
     • Generating daemon checks

     modules/role/lib/facter/customfacters.rb:

       Facter.add("hyves_daemons") do
         setcode do
           daemons = ["None"] # default when the host has no daemons.conf
           if File.exist?("/<path_to_config>/daemons.conf")
             daemons = []
             daemonconf = %x{grep name /<path_to_config>/daemons.conf}
             daemonconf.each_line do |daemon|
               # Drop everything up to and including "* name:"
               daemons.push(daemon.sub(/.*\* name:/, "").strip)
             end
           end
           daemons.uniq
         end
       end
  20. Other cool stuff
     • Generating daemon checks

     modules/daemons/manifests/init.pp:

       class daemons {
         define add_daemon_check {
           @@nagios_service { "$name Daemon $hostname":
             use                 => "Daemon-check",
             service_description => "$name Daemon",
             check_command       => "check_daemon!$name"
           }
         }
         add_daemon_check { $hyves_daemons: }
       }
  21. Other cool stuff
     • Generating overview daemon checks

       require 'net/http'

       module Puppet::Parser::Functions
         newfunction(:get_daemons, :type => :rvalue, :doc => "
           This function returns an array of all current hyves_autodaemons,
           based on the Icinga API
         ") do |args|
           domain = "<domain_of_icinga_web>"
           url = "/icinga-web/web/api/service/filter[AND(SERVICE_NAME%7Clike%7C*Daemon)]/columns[SERVICE_NAME]/order[SERVICE_NAME;ASC]/authkey=<api_key>/json"
           response = Net::HTTP.get_response(domain, url)
           data = response.body
           results = PSON.parse(data)
           daemons = Array.new
           results.each { |result|
             # Strip the " Daemon" suffix to get the bare daemon name
             daemon = result["SERVICE_NAME"].sub(/ Daemon/, "")
             daemons << daemon
           }
           daemons.uniq
         end
       end
  22. Other cool stuff
     • Generating overview daemon checks

     modules/icinga/manifests/noc.pp:

       $__daemons = get_daemons()
       templatefile { "/etc/icinga/puppetgenerated/other/daemons.cfg":
         template => template("icinga/daemons.cfg.erb")
       }

     hyvesdaemons.cfg.erb:

       define host{
         use        generic-host
         host_name  daemons
         alias      daemons
         address    www.hyves.nl
       }
       <% __daemons.each do |daemon| -%>
       define service{
         use                  DaemonOverview-check
         host_name            daemons
         service_description  <%= daemon %>
       }
       <% end -%>
  23. The End
     • Questions? Remarks? Ideas?
     • Jeffrey Lensen | System Engineer | jeffrey@hyves.nl