Nagios Conference 2012 - Mike Weber - Failover

Mike Weber's presentation on using Nagios for high availability.
The presentation was given during the Nagios World Conference North America, held Sept. 25-28, 2012, in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

  1. High Availability for Nagios
     Mike Weber, mweber@spidertools.com
  2. Alternatives
       Daily image creation for restore (VMware, etc.)
         - loses parts of the history
         - creates gaps in monitoring during image creation
       rsync to synchronize servers
         - requires IP address and hostname changes
         - requires modification of nagios.cfg
         - assumes the Master will never be misconfigured
         - rsync can use a lot of resources
       Clustered Nagios server
  3. Alternatives: Redundant Monitoring
  4. Alternatives: Redundant Monitoring
  5. Alternatives: Failover
  6. Alternatives: Failover
  7. Perfect Solution: Does Not Exist
  8. High Availability: Outline of Goals
       - Create a Master/Slave relationship
       - Master sends history to the Slave
       - Slave does not run host or service checks or send notifications
       - Slave monitors the Master via a script
       - Slave enables host and service checks and notifications when the Master is down
       - Slave disables all checks when the Master is up
       - Simplicity
  9. Failover and Performance Enhancement
  10. Test Server: Puppet Master
  11. Step #1: Clone Master to Slave
       - Back up the Master's databases:
           - MySQL databases
           - Postgres database
       - Back up files:
           - /usr/local/nagios
           - /usr/local/nagiosxi
       - Install all dependencies for plugins
       - Enable access from the Slave on all monitored devices
  12. Step #2: Disable Slave
       - Edit nagios.cfg:
             execute_host_checks=0
             execute_service_checks=0
             enable_notifications=0
       - Save and restart Nagios
  13. Step #3: Enable NSCA
       - Master sends history via NSCA:
           - edit nagios.cfg (then save and restart Nagios):
                 obsess_over_hosts=1
                 obsess_over_services=1
       - Slave maintains history via NSCA:
           - install the NSCA daemon on the Slave
           - allow connections from the Master
  14. Master: Allow Outbound Transfers
  15. Master: Outbound Config File
       Found in /usr/local/nagios/etc as send_nsca-192.168.5.211.cfg:
             # CONFIGURED BY NAGIOS XI
             password=LMb674FcsswP
             encryption_method=3
  16. Slave: NSCA Config
             # default: on
             # description: NSCA (Nagios Service Check Acceptor)
             service nsca
             {
                 flags           = REUSE
                 socket_type     = stream
                 wait            = no
                 user            = nagios
                 group           = nagios
                 server          = /usr/local/nagios/bin/nsca
                 server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
                 log_on_failure  += USERID
                 disable         = no
                 only_from       = 127.0.0.1 192.168.5.211
             }
  17. Slave: Allow Inbound Transfers
  18. Step #4: Slave Monitors Master via SSH
       - Create SSH keys on the Slave
           - push the public key to the Master
       - Create an authorized_keys file on the Master
       - Implement an SSH script to check the Master
           - passwordless login
           - run from a cron job (check every minute)
           - the script detects the status of the Master
           - the script turns checks and notifications on or off
  19. Create Key Pair
             su - nagios
             mkdir .ssh
             cd .ssh
             ssh-keygen -b 1024 -f id_dsa -t dsa -N ''
             Generating public/private dsa key pair.
             Your identification has been saved in id_dsa.
             Your public key has been saved in id_dsa.pub.
             The key fingerprint is:
             61:23:17:2d:83:d8:d9:f9:87:2d:e1:6d:e6:3d:cb:5c nagios@slxi
  20. Push Public Key to the nagios User on the Master
             scp id_dsa.pub nagios@192.168.5.211:/home/nagios/.ssh/slave
       This means that the nagios user must have a /home/nagios/.ssh directory.
       The public key is renamed to "slave" to avoid overwriting any existing keys.
       On the Master (as the nagios user, in /home/nagios/.ssh):
             cat slave >> authorized_keys
             chmod 644 authorized_keys
  21. Slave: Cron Job
             # /etc/cron.d/nagiosxi: crontab fragment for nagiosxi
             # check_master.sh uses bash "function" syntax, so run it with bash
             * * * * * nagios /bin/bash /usr/local/nagios/libexec/eventhandlers/check_master.sh
  22. Slave: check_master.sh
             #!/bin/bash
             # Runs every minute from cron on the Slave.
             # If Nagios is not running on the Master (or the Master is unreachable),
             # the Slave enables checks and notifications; otherwise it disables them.
             masterip=192.168.5.210
             cfg=/usr/local/nagios/etc/nagios.cfg

             # Put the Slave into standby: no active checks, no notifications.
             function disable () {
                 sed -i s/execute_host_checks=1/execute_host_checks=0/ "$cfg"
                 sed -i s/execute_service_checks=1/execute_service_checks=0/ "$cfg"
                 sed -i s/enable_notifications=1/enable_notifications=0/ "$cfg"
                 /sbin/service nagios reload
             }

             # Take over from the Master: run checks and send notifications.
             function enable () {
                 sed -i s/execute_host_checks=0/execute_host_checks=1/ "$cfg"
                 sed -i s/execute_service_checks=0/execute_service_checks=1/ "$cfg"
                 sed -i s/enable_notifications=0/enable_notifications=1/ "$cfg"
                 /sbin/service nagios reload
             }

             # 1 if the Master's init script reports Nagios running, 0 otherwise
             # (including when the SSH connection fails).
             nagpid=$(ssh nagios@$masterip /etc/init.d/nagios status | grep running | wc -l)

             if [ "$nagpid" -eq 0 ]; then
                 echo "Starting Checks"
                 enable
             fi
             if [ "$nagpid" -eq 1 ]; then
                 echo "Stopping Checks"
                 disable
             fi
             exit 0
  23. Assumptions: Based on Simplicity
       - Mature implementation
           - set up once the network implementation is primarily complete
       - Master is down only for a short amount of time
           - the Slave does not send history back to the Master on its return
       - Master and Slave are updated independently
           - no rsync
           - guarantees the integrity of one system
  24. Master
  25. Slave
  26. Master: Service States
  27. Slave: Service States
  28. Problems
  29. NSCA: Version 2.9.1
       - The plugin output buffer is larger
           - the NSCA server receives results OK
           - the NSCA sender adds wrong information
       - Replace with version 2.7.2 on the Master
           - send_nsca
           - located in /usr/local/nagios/libexec
  30. Questions?
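
Checking the standby state (Step #2, slide 12). A minimal sketch, assuming the three directives appear as uncommented key=value lines in the Slave's nagios.cfg. The helper names cfg_value and is_standby are hypothetical, not part of Nagios. It prints each directive and succeeds only when all three are 0, the state the slide asks for before the Slave is put into service.

```shell
#!/bin/bash
# Sketch (hypothetical helper names): confirm that a Slave's nagios.cfg is in
# the standby state described in Step #2.
# Assumption: the directives are uncommented "key=value" lines at line start.

STANDBY_KEYS="execute_host_checks execute_service_checks enable_notifications"

# cfg_value FILE KEY: print the value of KEY (the last one if KEY is repeated);
# prints nothing if KEY is absent.
cfg_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# is_standby FILE: list each standby key and succeed only if all are set to 0.
is_standby() {
    local file=$1 key val rc=0
    for key in $STANDBY_KEYS; do
        val=$(cfg_value "$file" "$key")
        echo "${key}=${val:-<missing>}"
        [ "$val" = 0 ] || rc=1
    done
    return "$rc"
}
```

Example: `is_standby /usr/local/nagios/etc/nagios.cfg || echo "Slave is not in standby"`.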
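
Forwarding results to the Slave (Step #3, slides 13 to 15). In a plain Nagios Core setup, obsess_over_services and obsess_over_hosts take effect through the ocsp_command and ochp_command directives, which run a command after each check; on the Master in this talk, the send_nsca configuration was generated by Nagios XI (slide 15). The sketch below shows roughly what such a command does: it writes one tab-separated record (host, service, return code, output for services; host, return code, output for hosts) to send_nsca's standard input. The helper names are hypothetical; the send_nsca and config paths come from slides 29 and 15.

```shell
#!/bin/bash
# Sketch (hypothetical helper names): forward a check result from the Master to
# the Slave with send_nsca, as an ocsp/ochp-style command would.
# Paths are taken from slides 15 and 29; adjust them to your install.

SEND_NSCA=${SEND_NSCA:-/usr/local/nagios/libexec/send_nsca}
SEND_NSCA_CFG=${SEND_NSCA_CFG:-/usr/local/nagios/etc/send_nsca-192.168.5.211.cfg}

# format_service_result HOST SERVICE RC OUTPUT: one tab-delimited service record.
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# format_host_result HOST RC OUTPUT: one tab-delimited host record.
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# submit_service_result SLAVE HOST SERVICE RC OUTPUT: send one service result.
submit_service_result() {
    local slave=$1
    shift
    format_service_result "$@" | "$SEND_NSCA" -H "$slave" -c "$SEND_NSCA_CFG"
}

# submit_host_result SLAVE HOST RC OUTPUT: send one host result.
submit_host_result() {
    local slave=$1
    shift
    format_host_result "$@" | "$SEND_NSCA" -H "$slave" -c "$SEND_NSCA_CFG"
}
```

A hypothetical ocsp_command wrapper could call submit_service_result with the Slave's address and the $HOSTNAME$, $SERVICEDESC$, $SERVICESTATEID$ and $SERVICEOUTPUT$ macros. Slide 29's version caveat still applies: the send_nsca binary on the Master has to produce records the Slave's nsca daemon accepts.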
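
Key setup for Step #4 (slides 18 to 20). Slide 19 generates a 1024-bit DSA key; OpenSSH 7.0 and later disable DSA keys by default, so this sketch swaps in an Ed25519 key (supported since OpenSSH 6.5). make_slave_key and authorize_on_master are hypothetical helpers; the second one does slide 20's scp and cat in a single ssh call, and it still needs a password (or an existing key) the first time it runs.

```shell
#!/bin/bash
# Sketch (hypothetical helper names) of the Step #4 key setup.
# Swapped in: an Ed25519 key instead of the slide's 1024-bit DSA key,
# because current OpenSSH releases disable DSA keys by default.

# make_slave_key DIR: create DIR/id_ed25519 and DIR/id_ed25519.pub with an
# empty passphrase so the cron job can log in without a prompt.
make_slave_key() {
    local dir=$1
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    ssh-keygen -q -t ed25519 -N '' -C "nagios@slave" -f "$dir/id_ed25519"
}

# authorize_on_master PUBKEY MASTER_IP: append PUBKEY to the nagios user's
# authorized_keys on the Master (umask 077 keeps a new file private).
authorize_on_master() {
    ssh "nagios@$2" 'umask 077; mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < "$1"
}
```

As the nagios user on the Slave: `make_slave_key ~/.ssh`, then `authorize_on_master ~/.ssh/id_ed25519.pub 192.168.5.210` (the masterip value from slide 22), then confirm passwordless login with `ssh nagios@192.168.5.210 true`.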
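
A variant of check_master.sh (slide 22). The script on the slide calls either enable or disable on every run, so Nagios on the Slave is reloaded once a minute even when nothing has changed, and its sed patterns only match the exact opposite value. This sketch keeps the same decision rule (Master's Nagios not reported running, or Master unreachable, means the Slave takes over) but sets the directives with anchored patterns and reloads only when the desired state differs from the current one. The helper names and the 10-second ssh timeout are assumptions; BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
#!/bin/bash
# Sketch of a check_master.sh variant (hypothetical helper names) that reloads
# Nagios only when the Slave actually has to change state.
# Assumptions: master IP and paths from slide 22; standard OpenSSH -o options.

MASTER_IP=${MASTER_IP:-192.168.5.210}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# master_running: succeed if the Master's init script reports Nagios running.
# BatchMode stops ssh from prompting; ConnectTimeout keeps cron runs short.
master_running() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "nagios@${MASTER_IP}" \
        /etc/init.d/nagios status 2>/dev/null | grep -q running
}

# desired_flag MASTER_UP: print 0 (standby) if the Master is up, else 1 (take over).
desired_flag() {
    if [ "$1" -eq 1 ]; then echo 0; else echo 1; fi
}

# current_flag FILE: print the Slave's current execute_service_checks value.
current_flag() {
    sed -n 's/^execute_service_checks=//p' "$1" | tail -n 1
}

# set_flags FILE VALUE: set all three directives to VALUE, whatever they were.
set_flags() {
    sed -i -e "s/^execute_host_checks=.*/execute_host_checks=$2/" \
           -e "s/^execute_service_checks=.*/execute_service_checks=$2/" \
           -e "s/^enable_notifications=.*/enable_notifications=$2/" "$1"
}

# check_master_once: compare desired and current state; reload only on change.
check_master_once() {
    local up=0 want
    master_running && up=1
    want=$(desired_flag "$up")
    if [ "$(current_flag "$NAGIOS_CFG")" != "$want" ]; then
        set_flags "$NAGIOS_CFG" "$want"
        /sbin/service nagios reload
    fi
}
```

To run it from the cron job on slide 21, end the file with a single call to check_master_once. Comparing before reloading means the Slave's Nagios restarts its configuration only at the moment of failover or failback, not every minute.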
