Distributed monitoring at Hyves - Puppet
Transcript

  • 1. Welcome
    Jeffrey Lensen, System Engineer
  • 2. Hyves Infrastructure
    3000+ Gentoo servers
    190 functiongroups/types
    3 datacenters
    Database for server management
  • 3. Using Puppet
    Since: January 2007
    Puppetmasters: 3 loadbalanced, 1 for CA and development
    Version: 2.6.1
    MySQL backend for (thin_)storeconfigs
    Nginx + 8 Mongrel instances per server
    100+ modules
    nodes.rb uses the management database
    Puppet run every morning on every server
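The nodes.rb mentioned above is an external node classifier: Puppet calls it with a node's certname and expects YAML describing that node's classes and parameters. A minimal sketch of the idea, with the real management-database query stubbed by an in-memory hash (the host names, keys, and class names here are hypothetical, not the actual Hyves schema):

```ruby
require 'yaml'

# Stand-in for a lookup in the server-management database (hypothetical data).
SERVERS = {
  'web01.hyves.nl' => { 'funcgroup' => 'nginx', 'datacenter' => 'dc1' },
}

# Build the ENC answer for one certname, or nil if the node is unknown.
def classify(certname)
  record = SERVERS[certname] or return nil
  {
    'classes'    => ['monitoring', record['funcgroup']],
    'parameters' => { 'datacenter' => record['datacenter'] },
  }
end

# Puppet reads the YAML from stdout:
puts classify('web01.hyves.nl').to_yaml
```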
  • 4. Nagios
    8 Nagios hosts in a distributed setup
    2500 hosts
    1 Nagios master for web and alerting
    Scripts to generate the configuration
    Management database as the source of information
    Templates for service checks
    Died during large fallouts
  • 5. Icinga
    Switched to Icinga in November 2010
    Distributed Icinga setup doesn't require a centralized host
    Very fast standalone Icinga-web interface
    Uses a database backend
    REST API
    Switching was easy due to the similar configuration
  • 6. Current monitoring setup
    Monitoring hosts: 12 (4 per DC)
    Services: over 83,000
    Hosts: nearly 3,500
    Average check interval: every 5 min
    NOC monitoring host: 1
    Overview checks using the API
    Command-line interface
  • 7. Problems with monitoring
    Adding new checks meant manually editing a lot of templates
    Things that should be monitored aren't
    You won't realize it until it's too late
    No monitoring makes it harder to find the problem
  • 8. Using Puppet to configure Icinga
    Puppet knows it all, so why not use that information?
    Exported resources from Naginator define the monitoring checks
    Include the monitoring definitions in profiles
    Running Puppet defines all necessary monitoring checks for that host
  • 9. Example
    modules/monitoring/manifests/init.pp:

      class monitoring {
        service { "nrpe":
          ensure => running,
          enable => true,
        }

        @@nagios_host { "$hostname":
          address => $ip,
        }

        # Appending $hostname in the nagios_service title prevents
        # duplicate definitions on monitoring hosts
        @@nagios_service { "NRPE $hostname":
          service_description => "NRPE",
          check_command       => "check_nrpe_scripts",
        }
      }
  • 10. Example: Nginx
    Automatically create HTTP checks when including Nginx

    modules/nginx/manifests/init.pp:

      class nginx {
        service { "nginx":
          ensure => running,
          enable => true,
        }

        @@nagios_service { "HTTP $hostname":
          service_description => "HTTP",
          check_command       => "check_http",
          event_handler       => "service_restart!nginx",
          contact_groups      => "admins_email, admins_sms",
        }
      }
  • 11. Predefining and distributing
    manifests/defines.pp:

      $__notifications_enabled = $systemstatus ? {
        operational => "1",
        fail        => "0",
      }

      Nagios_host {
        ensure                => present,
        host_name             => "$hostname.$domain",
        hostgroups            => $role,
        use                   => "generic-host", # our standard template
        alias                 => $hostname,
        notifications_enabled => $__notifications_enabled,
        target                => "/etc/icinga/puppetgenerated/hosts/$hostname.cfg",
        notes                 => $monitoringhost,
      }

      Nagios_service {
        ensure                => present,
        host_name             => "$hostname.$domain",
        use                   => "generic-service", # our standard template
        notifications_enabled => $__notifications_enabled,
        target                => "/etc/icinga/puppetgenerated/services/$hostname.cfg",
        notes                 => $monitoringhost,
      }
  • 12. Retrieving exported resources
    modules/icinga/manifests/init.pp:

      class icingacollect {
        Nagios_host <<| notes == "$hostname" |>> {
          require => File["/etc/icinga/puppetgenerated/hosts"],
        }

        Nagios_service <<| notes == "$hostname" |>> {
          require => File["/etc/icinga/puppetgenerated/services"],
        }
      }
  • 13. Why not tags?
    Using "notes" to assign the monitoring host
    Tagging caused problems when setting require in Nagios_host and Nagios_service
    Tagging meant redefining; it's not inherited
    Solution: stages (?)
  • 14. Fail-safes
    modules/icinga/manifests/init.pp:

      class icinga {
        include icingacollect

        exec { "verify new cfg":
          command => "/usr/bin/icinga -v /etc/icinga/verify-puppetgenerated.cfg",
          require => Class["icingacollect"],
        }

        exec { "mv cfgs":
          command => "rm -rf /etc/icinga/puppet/* ; mv /etc/icinga/puppetgenerated/* /etc/icinga/puppet/",
          require => Exec["verify new cfg"],
        }

        exec { "restart icinga":
          command => "/usr/bin/printf '[] RESTART_PROGRAM\n' > /var/icinga/rw/icinga.cmd",
          require => [ Exec["mv cfgs"], Service["icinga"] ],
        }
      }
  • 15. Deploying monitoring
    A deploy script starts a Puppet run on all monitoring hosts
    Threaded, with a small sleep between starts to prevent a thundering herd on the Puppet masters
    Waits for all Puppet runs to finish and reports whether they were successful
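The deploy script described above could be sketched along these lines: one thread per monitoring host, with a small sleep between thread starts so the Puppet masters are not hit all at once, then a join on every thread to collect results. This is a hypothetical sketch, not the actual Hyves script; the stubbed lambda stands in for a real remote Puppet run (e.g. an ssh call):

```ruby
STAGGER = 0.1 # seconds between thread starts (would be larger in practice)

# Run `runner` for each host in its own thread, staggering the starts,
# and return an array of [host, success] pairs once all runs finish.
def deploy(hosts, runner)
  threads = hosts.each_with_index.map do |host, i|
    sleep STAGGER if i > 0 # thundering-herd protection
    Thread.new(host) { |h| [h, runner.call(h)] }
  end
  threads.map(&:value) # Thread#value joins and returns the block's result
end

# Example with a stubbed runner in place of e.g. system("ssh #{h} puppet agent --test"):
results = deploy(%w[mon1 mon2 mon3], ->(h) { h != 'mon2' })
results.each { |host, ok| puts "#{host}: #{ok ? 'OK' : 'FAILED'}" }
```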
  • 16. Downsides
    A Puppet run on the Icinga hosts takes about 20 minutes
    (using separate config files for each host helps)
    Modifying a service check requires a Puppet run on all hosts with that service check
    (solution: use --noop)
    Cleaning up old resources
  • 17. Cleaning up

      $fqdn = $host_to_be_removed.$domain
      puppet apply --certname $fqdn --node_name facter --thin_storeconfigs $dbsettings \
        --execute 'resources { ["nagios_service", "nagios_host"]: purge => true }'
  • 18. What if something isn't running Puppet?
    Configcheck check
    Compares the management database with the Icinga API
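The comparison above boils down to a set difference: any host the management database knows about but the Icinga API does not report is a host that never ran Puppet and is therefore unmonitored. A minimal hypothetical sketch, with both data sources stubbed as plain arrays and exit codes following the Nagios/Icinga plugin convention (0 = OK, 2 = CRITICAL):

```ruby
# Compare the host list from the management database with the host list
# from the Icinga API and return a [message, exit_code] pair.
def configcheck(db_hosts, icinga_hosts)
  missing = db_hosts - icinga_hosts
  if missing.empty?
    ["OK: all #{db_hosts.size} hosts monitored", 0]
  else
    ["CRITICAL: not monitored: #{missing.join(', ')}", 2]
  end
end

msg, code = configcheck(%w[web01 web02 db01], %w[web01 db01])
puts msg # -> CRITICAL: not monitored: web02
```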
  • 19. Other cool stuff: generating daemon checks
    modules/role/lib/facter/customfacters.rb:

      Facter.add("hyves_daemons") do
        daemonarray = ["None"]
        if File::exists?( "/<path_to_config>/daemons.conf" )
          daemonarray = []
          daemonconf = %x{grep name /<path_to_config>/daemons.conf}
          daemonconf.each_line do |daemon|
            daemon.sub!(/.* name:/, "")
            daemonarray.push(daemon.chomp)
          end
        end
        setcode do
          daemonarray.uniq
        end
      end
  • 20. Other cool stuff: generating daemon checks
    modules/daemons/manifests/init.pp:

      class daemons {
        define add_daemon_check {
          @@nagios_service { "$name Daemon $hostname":
            use                 => "Daemon-check",
            service_description => "$name Daemon",
            check_command       => "check_daemon!$name",
          }
        }

        add_daemon_check { $hyves_daemons: }
      }
  • 21. Other cool stuff: generating overview daemon checks

      require 'net/http'

      module Puppet::Parser::Functions
        newfunction(:get_daemons, :type => :rvalue, :doc => "
          This function returns an array of all current hyves_autodaemons,
          based on the Icinga API
        ") do |args|
          domain = "<domain_of_icinga_web>"
          url = "/icinga-web/web/api/service/filter[AND(SERVICE_NAME%7Clike%7C*Daemon)]/columns[SERVICE_NAME]/order[SERVICE_NAME;ASC]/authkey=<api_key>/json"
          response = Net::HTTP.get_response(domain, url)
          results = PSON.parse(response.body)
          daemons = Array.new
          results.each { |result|
            daemon = result["SERVICE_NAME"]
            daemon.sub!(/ Daemon/, "")
            daemons << daemon
          }
          daemons.uniq
        end
      end
  • 22. Other cool stuff: generating overview daemon checks
    modules/icinga/manifests/noc.pp:

      $__daemons = get_daemons()
      templatefile { "/etc/icinga/puppetgenerated/other/daemons.cfg":
        template => template("icinga/daemons.cfg.erb"),
      }

    hyvesdaemons.cfg.erb:

      define host {
        use        generic-host
        host_name  daemons
        alias      daemons
        address    www.hyves.nl
      }

      <% __daemons.each do |daemon| -%>
      define service {
        use                 DaemonOverview-check
        host_name           daemons
        service_description <%= daemon %>
      }
      <% end -%>
  • 23. The End
    Questions? Remarks? Ideas?
    Jeffrey Lensen | System Engineer | jeffrey@hyves.nl