Nagios Conference 2012 - Mike Weber - Failover

Mike Weber's presentation on using Nagios and High Availability.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

Published in: Technology

Transcript

  • 1. High Availability For Nagios Mike Weber mweber@spidertools.com
  • 2. Alternatives
      Daily Image Creation for Restore (VMware, etc.)
        - loses parts of history
        - creates gaps in monitoring during image creation
      rsync to Synchronize Servers
        - requires IP address and hostname changes
        - requires modification of nagios.cfg
        - assumes Master will never be misconfigured
        - rsync can use a lot of resources
      Clustered Nagios Server
  • 3. Alternatives: Redundant Monitoring
  • 4. Alternatives: Redundant Monitoring
  • 5. Alternatives: Failover
  • 6. Alternatives: Failover
  • 7. Perfect Solution: Does Not Exist
  • 8. High Availability: Outline of Goals
      Create Master/Slave Relationship
      Master Sends History to the Slave
      Slave Does Not Run Host or Service Checks or Send Notifications
      Slave Monitors Master via Script
      Slave Enables Host and Service Checks and Notifications when Master is Down
      Slave Disables All Checks when Master is Up
      Simplicity
  • 9. Failover and Performance Enhancement
  • 10. Test Server: Puppet Master
  • 11. Step #1: Clone Master to Slave
      Back Up Master Databases and Files
        - MySQL databases
        - Postgres database
      Back Up Files
        - /usr/local/nagios
        - /usr/local/nagiosxi
      Install all dependencies for plugins
      Enable Access from Slave on all devices
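The file backup in Step #1 could be scripted roughly as follows. This is a minimal sketch, not the presenter's procedure: the function name `backup_dirs` and the archive path are hypothetical, and the database dumps appear only as commented usage because database names and credentials are not given in the slides.

```shell
#!/bin/bash
# Hypothetical helper: archive one or more directories into a gzip'd tarball.
backup_dirs() {
    local archive=$1
    shift
    # -c create, -z gzip, -f archive file; remaining arguments are the paths.
    tar -czf "$archive" "$@"
}

# Example usage on the master (paths from the slide, archive name assumed):
#   backup_dirs /tmp/nagios-files.tgz /usr/local/nagios /usr/local/nagiosxi
# Database dumps would be taken separately, for example:
#   mysqldump --all-databases > /tmp/mysql-all.sql
```

The archive can then be copied to the slave and unpacked there before the plugin dependencies are installed.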
  • 12. Step #2: Disable Slave
      Edit nagios.cfg
          execute_host_checks=0
          execute_service_checks=0
          enable_notifications=0
      Save and Restart Nagios
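The edit in Step #2 can also be applied from a script instead of by hand. The sketch below assumes GNU sed (for `-i`), as on the Linux systems the deck targets; the helper names `set_directive` and `disable_slave` are hypothetical and are not part of Nagios.

```shell
#!/bin/bash
# Hypothetical helper: set key=value in a Nagios main config file,
# replacing an existing line or appending one if the key is absent.
set_directive() {
    local cfg=$1 key=$2 value=$3
    if grep -q "^${key}=" "$cfg"; then
        sed -i "s/^${key}=.*/${key}=${value}/" "$cfg"
    else
        printf '%s=%s\n' "$key" "$value" >> "$cfg"
    fi
}

# Turn off active checks and notifications, as on the slide.
disable_slave() {
    local cfg=$1
    set_directive "$cfg" execute_host_checks 0
    set_directive "$cfg" execute_service_checks 0
    set_directive "$cfg" enable_notifications 0
}

# Example usage on the slave, followed by a restart:
#   disable_slave /usr/local/nagios/etc/nagios.cfg
#   /sbin/service nagios restart
```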
  • 13. Step #3: Enable NSCA
      Master Sends History via NSCA
        - edit nagios.cfg (save and restart Nagios)
            obsess_over_hosts=1
            obsess_over_services=1
      Slave Maintains History via NSCA
        - install NSCA daemon on slave
        - allow connections from Master
  • 14. Master: Allow Outbound Transfers
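With obsess_over_services=1, Nagios runs its configured ocsp_command after each service check, and that command is what hands the result to send_nsca. The sketch below shows only the tab-separated line that send_nsca reads on standard input for a service result. The function name `format_service_result` and the `SLAVE_HOST` variable are hypothetical; in Nagios XI the outbound transfer settings may generate an equivalent command for you.

```shell
#!/bin/bash
# Hypothetical helper: build one passive service check result in the form
# send_nsca expects: host<TAB>service<TAB>return code<TAB>plugin output.
format_service_result() {
    local host=$1 service=$2 code=$3 output=$4
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$code" "$output"
}

# Example usage on the master (SLAVE_HOST and CFG are assumptions):
#   format_service_result web01 HTTP 0 "HTTP OK" |
#       /usr/local/nagios/libexec/send_nsca -H "$SLAVE_HOST" -c "$CFG"
```

Host check results use the same layout without the service field.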
  • 15. Master: Outbound Config File
      Found in /usr/local/nagios/etc
      send_nsca-192.168.5.211.cfg
          # CONFIGURED BY NAGIOS XI
          password=LMb674FcsswP
          encryption_method=3
  • 16. Slave: NSCA Config
      # default: on
      # description: NSCA (Nagios Service Check Acceptor)
      service nsca
      {
          flags           = REUSE
          socket_type     = stream
          wait            = no
          user            = nagios
          group           = nagios
          server          = /usr/local/nagios/bin/nsca
          server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
          log_on_failure  += USERID
          disable         = no
          only_from       = 127.0.0.1 192.168.5.211
      }
  • 17. Slave: Allow Inbound Transfers
  • 18. Step #4: Slave Monitors Master via SSH
      Create SSH Keys on Slave
        - push public key to master
      Create authorized_keys file on Master
      Implement SSH script to check Master
        - passwordless login
        - set on a cron job (check every minute)
        - script detects status of Master
        - script turns on/off checks and notifications
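The "script detects status of Master" step can be sketched as a small function. This is an illustration under assumptions, not the presenter's script: the name `master_nagios_running`, the overridable `SSH_BIN` variable, and the 5-second timeout are all hypothetical. It treats an unreachable master the same as a stopped Nagios, and it explicitly rejects output such as "is not running", which a plain match on "running" would also accept.

```shell
#!/bin/bash
# SSH_BIN is overridable so the function can be exercised without a real master.
SSH_BIN=${SSH_BIN:-ssh}

# Returns 0 if Nagios reports as running on the master, 1 otherwise.
master_nagios_running() {
    local master=$1 out
    # BatchMode=yes makes ssh fail rather than prompt for a password;
    # ConnectTimeout bounds the wait when the master host is down.
    out=$("$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 \
        "nagios@${master}" /etc/init.d/nagios status 2>/dev/null) || return 1
    case $out in
        *"not running"*|*stopped*) return 1 ;;
        *running*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on the slave:
#   if master_nagios_running 192.168.5.210; then echo "master up"; fi
```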
  • 19. Create Key Pair
      su - nagios
      mkdir .ssh
      cd .ssh
      ssh-keygen -b 1024 -f id_dsa -t dsa -N ""
      Generating public/private dsa key pair.
      Your identification has been saved in id_dsa.
      Your public key has been saved in id_dsa.pub.
      The key fingerprint is:
      61:23:17:2d:83:d8:d9:f9:87:2d:e1:6d:e6:3d:cb:5c nagios@slxi
      The key's randomart image is:
      (DSA 1024 randomart image omitted)
  • 20. Push Public Key to nagios user on Master
      scp id_dsa.pub nagios@192.168.5.211:/home/nagios/.ssh/slave
      This means that the nagios user must have a /home/nagios/.ssh directory. The public key is renamed to “slave” to avoid overwriting any existing keys.
      On the master (as the nagios user):
      cat slave >> authorized_keys
      chmod 644 authorized_keys
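Running `cat slave >> authorized_keys` a second time adds the same key twice. A minimal sketch of an idempotent variant follows; the function name `install_pubkey` is hypothetical, and the 700/600 modes are a stricter choice than the 644 shown on the slide, not something the presentation specifies.

```shell
#!/bin/bash
# Hypothetical helper: append a public key to authorized_keys only if that
# exact line is not already present, then tighten permissions.
install_pubkey() {
    local pubkey=$1 sshdir=$2
    local auth="$sshdir/authorized_keys"
    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    touch "$auth"
    # -F fixed strings, -x whole-line match, -f read the pattern from the key file.
    grep -qxF -f "$pubkey" "$auth" || cat "$pubkey" >> "$auth"
    chmod 600 "$auth"
}

# Example usage on the master, as the nagios user:
#   install_pubkey /home/nagios/.ssh/slave /home/nagios/.ssh
```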
  • 21. Slave: Cron Job
      # /etc/cron.d/nagiosxi: crontab fragment for nagiosxi
      * * * * * nagios /bin/bash /usr/local/nagios/libexec/eventhandlers/check_master.sh
  • 22. Slave: check_master.sh
      #!/bin/bash
      # Runs on the slave every minute: enable checks and notifications when
      # Nagios is not running on the master, disable them when it is.
      masterip=192.168.5.210

      function disable () {
          sed -i s/execute_host_checks=1/execute_host_checks=0/ /usr/local/nagios/etc/nagios.cfg
          sed -i s/execute_service_checks=1/execute_service_checks=0/ /usr/local/nagios/etc/nagios.cfg
          sed -i s/enable_notifications=1/enable_notifications=0/ /usr/local/nagios/etc/nagios.cfg
          /sbin/service nagios reload
      }

      function enable () {
          sed -i s/execute_host_checks=0/execute_host_checks=1/ /usr/local/nagios/etc/nagios.cfg
          sed -i s/execute_service_checks=0/execute_service_checks=1/ /usr/local/nagios/etc/nagios.cfg
          sed -i s/enable_notifications=0/enable_notifications=1/ /usr/local/nagios/etc/nagios.cfg
          /sbin/service nagios reload
      }

      # Count the lines of the master's status output that contain "running".
      nagpid=$(ssh "nagios@$masterip" /etc/init.d/nagios status | grep running | wc -l)

      if [ "$nagpid" -eq 0 ]; then
          echo "Starting Checks"
          enable
      fi
      if [ "$nagpid" -eq 1 ]; then
          echo "Stopping Checks"
          disable
      fi
      exit 0
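As written, check_master.sh reloads Nagios on every cron run, even when nothing changed. One possible refinement is to rewrite nagios.cfg, and reload, only when the desired state differs from the current one. The sketch below is an assumption-laden variant, not the presenter's code: `get_flag`, `apply_state`, and `DIRECTIVES` are hypothetical names, GNU sed is assumed, and all three directives are assumed to already exist in the file.

```shell
#!/bin/bash
# The three directives the slave toggles.
DIRECTIVES="execute_host_checks execute_service_checks enable_notifications"

# Print the current 0/1 value of a directive (empty if it is not set).
get_flag() {
    local cfg=$1 key=$2
    sed -n "s/^${key}=\([01]\)[[:space:]]*$/\1/p" "$cfg" | tail -n 1
}

# Set every directive to the wanted value (0 or 1).
# Returns 0 if the file was modified, 1 if it already matched.
apply_state() {
    local cfg=$1 want=$2 key changed=1
    for key in $DIRECTIVES; do
        if [ "$(get_flag "$cfg" "$key")" != "$want" ]; then
            sed -i "s/^${key}=.*/${key}=${want}/" "$cfg"
            changed=0
        fi
    done
    return "$changed"
}

# Example usage: reload only when something actually changed.
#   if apply_state /usr/local/nagios/etc/nagios.cfg 1; then
#       /sbin/service nagios reload
#   fi
```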
  • 23. Assumptions: Based on Simplicity
      Mature Implementation
        - set up once implementation of the network is primarily complete
      Master Down Short Amount of Time
        - slave does not send history to Master on its return
      Master and Slave Independent of Updates
        - no rsync
        - guarantees integrity of one system
  • 24. Master
  • 25. Slave
  • 26. Master: Service States
  • 27. Slave: Service States
  • 28. Problems
  • 29. NSCA: Version 2.9.1
      Plugin Buffer is Larger
        * NSCA Server Receives OK
        * NSCA Sending Adds Wrong Information
      Replace with Version 2.7.2 on Master
        * send_nsca
        * Located in /usr/local/nagios/libexec
  • 30. Questions?