Nagios Conference 2013 - Mike Weber - Distributed Monitoring with Raspberry Pi
Mike Weber's presentation on Distributed Monitoring with Raspberry Pi.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

Presentation Transcript

  • Distributed Monitoring with Raspberry Pi
    Mike Weber, mweber@spidertools.com
  • The Problem: Remote Monitoring at Low Cost
    Limited Service Checks
    Limited Cost
    Low Power Usage
    Central Nagios Server
    Low Tech Skills
  • Possible Solutions
    Virtual Container
        Requires VMware etc.
        Requires Expertise to Configure Nagios
    Hardware
        Cost
        Resource Waste
        Tech Skills Required (RAID, Nagios Config)
    Passive Checks
        Scripts on Hosts (more resources than compiled plugins)
        Tech Skills
  • Possible Solutions: ITX
    Mini-ITX ($400-600)
        6.7 x 6.7 inch motherboard developed by VIA in 2001
        Intel Atom 1.8 GHz Processor
        2 GB of RAM
        SSD
        60 Watt Power Supply
    Nano-ITX ($500-700)
        4.7 x 4.7 inch motherboard developed by VIA in 2003
        VIA 1.2 GHz Processor
        1 GB of RAM
        SSD
        60 Watt Power Supply
    Pico-ITX ($600-700)
        3.9 x 2.8 inch motherboard developed by VIA in 2007
  • Raspberry Pi
  • Raspberry Pi
    Low Cost
        $75.00 (board, case, power supply)
    Low Power Usage
        Power Usage of a Cell Phone
    Low Tech Skills
        Clone Disks
    Distributed Model
        Flexible
        Low Cost on Nagios Server
  • Pi: 512 MB RAM, 700 MHz
  • Installation of wheezy-raspbian
    Download the image file, which is about 500 MB: http://www.raspberrypi.org/downloads
    Verify the Image
        sha1sum 2013-02-09-wheezy-raspbian.zip
        b4375dc9d140e6e48e0406f96dead3601fac6c81  2013-02-09-wheezy-raspbian.zip
    Unzip the Image
        unzip 2013-02-09-wheezy-raspbian.zip
        Archive:  2013-02-09-wheezy-raspbian.zip
          inflating: 2013-02-09-wheezy-raspbian.img
    Default Login
        Username: pi
        Password: raspberry
    Verify Disk Location
        su -
        fdisk -l
        Disk /dev/sdd: 4102 MB, 4102889984 bytes
        255 heads, 63 sectors/track, 498 cylinders
        Units = cylinders of 16065 * 512 = 8225280 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 512 bytes / 512 bytes
        Disk identifier: 0x295b8178
           Device Boot      Start         End      Blocks   Id  System
        /dev/sdd1               1         497     3992135+   b  W95 FAT32
    Create Disk (write the image unzipped above to the card)
        dd bs=4M if=~/2013-02-09-wheezy-raspbian.img of=/dev/sdd
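The verify step above can be made a hard gate before anything is written to the card. The following is a minimal sketch, assuming `sha1sum` and `awk` are available; the function name `verify_sha1` is hypothetical and not part of the presentation.

```shell
#!/bin/bash
# verify_sha1 FILE EXPECTED
# Succeeds only if the SHA-1 digest of FILE equals EXPECTED.
# (Hypothetical helper name, introduced for illustration.)
verify_sha1() {
    local file=$1 expected=$2 actual
    actual=$(sha1sum "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example use, commented out because dd overwrites the target device:
# verify_sha1 2013-02-09-wheezy-raspbian.zip b4375dc9d140e6e48e0406f96dead3601fac6c81 \
#     && unzip 2013-02-09-wheezy-raspbian.zip \
#     && dd bs=4M if=2013-02-09-wheezy-raspbian.img of=/dev/sdd
```

Chaining with `&&` means a corrupted download stops the sequence before `dd` runs. The device name `/dev/sdd` is the one from the slide and will differ on other machines.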
  • Network Configuration: Wireless
    Edimax Wireless 802.11b/g/n (supports WPS, WPA2, 802.1x)
        * works out of the box
    /etc/network/interfaces
        auto lo
        iface lo inet loopback
        iface eth0 inet dhcp
        allow-hotplug wlan0
        iface wlan0 inet dhcp
          wpa-ssid pi
          wpa-psk Pi89YQbg56)
  • Mod-Gearman
  • Why Mod-Gearman?
    Distributes Tasks to Multiple Workers
        Multiple Pi Workers
    Supports Multiple Programming Languages
        C, Java, Perl, PHP, Python, Shell
    Provides a Distributed Model
    Client Uses Very Small Resources
        In Contrast to DNX Workers
  • Why Not DNX?
    Not Currently Updated (2010-4-13)
    Uses UDP (less dependable)
    Client Uses More Resources
    [Bar chart: memory in MB, DNX Worker vs. Mod-Gearman Worker]
  • NEB: Nagios Event Broker
  • Mod-Gearman
  • Installation of Mod-Gearman on Pi
    Install Prerequisites
        sudo apt-get update
        sudo apt-get install gearman mod-gearman-worker libgearman6 nagios-plugins
        cd /etc/mod-gearman
    Edit the worker.conf
        sudo nano worker.conf
        server=192.168.5.212:4730
        key=Modlinux23
        hosts=no
        services=no
        eventhandler=no
        min-worker=6
        max-worker=8
        servicegroups=pi_srv
        logfile=/var/log/mod_gearman/mod_gearman_worker.log
        p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
    Save your changes and then start the Mod-Gearman worker:
        sudo /etc/init.d/mod-gearman-worker start
  • Gearman Resource Usage
        ps axo pid,ppid,pcpu,size,cmd|grep gearman
    Process Parent  CPU  Memory  CMD
     1747      1    0.0   1224   /usr/sbin/mod_gearman_worker
     3255   1747    2.5   1488   /usr/sbin/mod_gearman_worker (working)
     3256   1747    6.6   1488   /usr/sbin/mod_gearman_worker (working)
     3257   1747    7.0   1488   /usr/sbin/mod_gearman_worker (working)
     3258   1747    0.0   1356   /usr/sbin/mod_gearman_worker
     3259   1747    0.0   1356   /usr/sbin/mod_gearman_worker
     3260   1747    0.0   1356   /usr/sbin/mod_gearman_worker
    size = virtual size of the process (code+data+stack)
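To get a single figure from a listing like the one above, the size column can be totalled with `awk`. This is only a sketch: the function name `sum_worker_size` is hypothetical, and it assumes the five-column `pid,ppid,pcpu,size,cmd` layout used on the slide.

```shell
#!/bin/bash
# sum_worker_size
# Reads "pid ppid pcpu size cmd" rows on stdin and prints
# "<worker count> <sum of the size column>" for mod_gearman_worker rows.
# (Hypothetical helper name, introduced for illustration.)
sum_worker_size() {
    awk '$5 ~ /mod_gearman_worker/ { n++; total += $4 }
         END { printf "%d %d\n", n, total }'
}

# Live usage on the Pi:
# ps axo pid,ppid,pcpu,size,cmd | sum_worker_size
```

Matching on the fifth field rather than the whole line means no `grep -v grep` is needed to exclude the pipeline's own processes.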
  • Mod-Gearman Queues
  • Mod-Gearman
  • Worker Capacity
    75-100 Service Checks
    5 Minute Intervals
    Compiled Plugins
    6 Workers
    2 Workers Always Available
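As a rough back-of-the-envelope reading of these figures, the average rate per worker follows from dividing the number of checks by the interval and the worker count. The sketch below assumes the checks are spread evenly across the interval, which the presentation does not state; the function name `checks_per_worker_minute` is hypothetical.

```shell
#!/bin/bash
# checks_per_worker_minute CHECKS INTERVAL_MINUTES WORKERS
# Prints the average number of checks each worker runs per minute,
# assuming checks are spread evenly (an assumption, not from the slides).
checks_per_worker_minute() {
    awk -v c="$1" -v i="$2" -v w="$3" 'BEGIN { printf "%.1f\n", c / i / w }'
}

checks_per_worker_minute 100 5 6   # upper end of the range on the slide
checks_per_worker_minute 75 5 6    # lower end of the range on the slide
```

With 100 checks every 5 minutes across 6 workers this comes to about 3.3 checks per worker per minute, and 2.5 for 75 checks.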
  • Mod-Gearman Worker Configuration
    Worker Identifier: Unique identifier for worker, hostname
    min-worker: Minimum number of total workers
    max-worker: Maximum number of total workers
    idle-timeout: Time in seconds before idle worker exits
    max-jobs: Maximum number of jobs before worker exits
  • Install Process
    Install Nagios Event Broker
        broker_module=/usr/local/lib/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
    Install Server: gearmand
        /etc/init.d/gearmand start
    Install Worker: mod_gearman_worker
        /etc/init.d/mod_gearman_worker start
    Configuration File
        /etc/mod_gearman/mod_gearman_neb.conf
  • Distributed Monitoring
  • Distributed Monitoring
  • Distributed Monitoring: Hostgroups
    Server Configuration: /etc/mod_gearman/mod_gearman_neb.conf
        server=localhost:4730
        eventhandler=yes
        services=yes
        hosts=yes
        hostgroups=debian-servers
        encryption=yes
        key=linux23_Qg549K
    Pi Worker Configuration: /etc/mod-gearman/worker.conf
        server=192.168.5.99:4730
        eventhandler=no
        services=no
        hosts=no
        min-worker=6
        max-worker=8
        encryption=yes
        key=linux23_Qg549K
        p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
        hostgroups=debian-servers
  • Distributed Monitoring: Servicegroups
    Server Configuration: /etc/mod_gearman/mod_gearman_neb.conf
        server=localhost:4730
        eventhandler=yes
        services=yes
        hosts=yes
        servicegroups=pi_srv
        encryption=yes
        key=linux23_Qg549K
    Pi Worker Configuration: /etc/mod-gearman/worker.conf
        server=192.168.5.99:4730
        eventhandler=no
        services=no
        hosts=no
        min-worker=6
        max-worker=8
        encryption=yes
        key=linux23_Qg549K
        p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
        servicegroups=pi_srv
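In both examples the server and the worker carry the same `key`, `encryption`, and group name. A quick way to confirm that two copies of the files agree is sketched below; it assumes plain `name=value` lines as shown on the slides and copies of both files on one machine. The helper names `conf_get` and `same_setting` are hypothetical.

```shell
#!/bin/bash
# conf_get FILE KEY
# Prints the value of the first "KEY=value" line in FILE.
# (Hypothetical helper name, introduced for illustration.)
conf_get() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# same_setting KEY FILE_A FILE_B
# Succeeds when both files set KEY to the same non-empty value.
same_setting() {
    local a b
    a=$(conf_get "$2" "$1")
    b=$(conf_get "$3" "$1")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example use, with paths standing in for local copies of the two files:
# for k in key encryption servicegroups; do
#     same_setting "$k" mod_gearman_neb.conf worker.conf || echo "mismatch: $k"
# done
```

The `server` setting is expected to differ, since the Nagios host points at `localhost` while the Pi points at the Nagios host's address.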
  • Performance Tuning Pi
  • noatime
    mtime: contents of file changed
    ctime: inode changed (permissions, ownership)
    atime: accessed time, forces a write
    /etc/fstab
        proc            /proc           proc    defaults          0       0
        /dev/mmcblk0p1  /boot           vfat    defaults          0       2
        /dev/mmcblk0p2  /               ext4    defaults,noatime  0       1
    Apply without rebooting:
        mount -o remount /
    Verify Changes with:
        mount
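Rather than reading the `mount` output by eye, the option can be tested directly. The sketch below assumes input in the `device mountpoint fstype options ...` layout shared by `/etc/fstab` and `/proc/mounts`; the function name `has_noatime` is hypothetical.

```shell
#!/bin/bash
# has_noatime MOUNTPOINT
# Reads mount-table lines ("device mountpoint fstype options ...") on stdin
# and succeeds if MOUNTPOINT is listed with the noatime option.
# (Hypothetical helper name, introduced for illustration.)
has_noatime() {
    awk -v mp="$1" '$2 == mp && $4 ~ /(^|,)noatime(,|$)/ { found = 1 }
                    END { exit !found }'
}

# Live usage after the remount:
# has_noatime / < /proc/mounts && echo "root is mounted noatime"
```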
  • Maximize Resources
    Reduce Logging
        * Turn Off rsyslog
        * Minimize Logging
    Shutdown Other Services
        * mail server
  • Firewall Issues
  • Understanding Network Connections: Pi
    tcp        0      0 192.168.5.47:43965      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43948      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43964      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43962      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43960      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43977      192.168.5.212:4730      ESTABLISHED
    tcp        0      0 192.168.5.47:43956      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43947      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43975      192.168.5.212:4730      ESTABLISHED
    tcp        0      0 192.168.5.47:43969      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43978      192.168.5.212:4730      ESTABLISHED
    tcp        0      0 192.168.5.47:43967      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43973      192.168.5.212:4730      ESTABLISHED
    tcp        0      0 192.168.5.47:43959      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43951      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43961      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43957      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43963      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43976      192.168.5.212:4730      ESTABLISHED
    tcp        0      0 192.168.5.47:43945      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43972      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43970      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43950      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43958      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43952      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43955      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43954      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43946      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43966      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43968      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43953      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43979      192.168.5.212:4730      ESTABLISHED
    tcp        0      0 192.168.5.47:43971      192.168.5.212:4730      TIME_WAIT
    tcp        0      0 192.168.5.47:43974      192.168.5.212:4730      ESTABLISHED
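A long connection listing like this is easier to read as a count per state. The following sketch assumes `netstat -tan` style rows (protocol, two queue counters, local address, remote address, state) and filters on remote port 4730 as seen in the listing; the function name `count_states` is hypothetical.

```shell
#!/bin/bash
# count_states
# Reads netstat-style TCP rows on stdin and prints "STATE count" for
# connections whose remote port is 4730, sorted by state name.
# (Hypothetical helper name, introduced for illustration.)
count_states() {
    awk '$1 == "tcp" && $5 ~ /:4730$/ { c[$6]++ }
         END { for (s in c) print s, c[s] }' | sort
}

# Live usage on the Pi:
# netstat -tan | count_states
```

Only outbound connections from the Pi to port 4730 on the Nagios server appear in the listing, which is the direction a firewall between the two needs to permit.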
  • Understanding Network Connections: Nagios
    tcp        0      0 0.0.0.0:4730                0.0.0.0:*                   LISTEN
    tcp        0      0 192.168.5.212:4730          192.168.5.47:44254          ESTABLISHED
    tcp        0      0 192.168.5.212:4730          192.168.5.47:44258          ESTABLISHED
    tcp        0      0 192.168.5.212:4730          192.168.5.47:44257          ESTABLISHED
    tcp        0      0 192.168.5.212:4730          192.168.5.47:44255          ESTABLISHED
    tcp        0      0 192.168.5.212:4730          192.168.5.47:44259          ESTABLISHED
    tcp        0      0 192.168.5.212:4730          192.168.5.47:44253          ESTABLISHED
    tcp        0      0 192.168.5.212:4730          192.168.5.47:44256          ESTABLISHED
  • Creating Checks
  • Create Service Check
  • Create Servicegroup
  • Add Services to Servicegroup
  • Graphing with Pi Checks
  • Monitoring Pi
  • Monitor Pi: Workers and Jobs
    Create a Script on Nagios to Monitor Workers and Jobs
        #!/bin/bash
        check_gearman -H 192.168.5.99 -q worker_raspberrypi -t 10 -s check
  • Monitor Pi: Service Check
  • Monitor Gearman Workers
  • Monitor Gearman Workers/Jobs
  • Warning Signals
    Nagios Server: Check Latency
    Nagios Server: Orphaned Checks
        service check orphaned, is the mod-gearman worker on queue 'servicegroup_pi' running?
    Pi: Load Over 1
        1 = 100%
    Pi: Defunct Workers
        15824 14129 2.1 0 [mod_gearman_wor] <defunct>
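The two Pi-side signals can be checked from a script. This is a sketch only, assuming the 1-minute load average is the first field of `/proc/loadavg` and that defunct workers appear in `ps` output as in the line above; the function names `load_over` and `count_defunct` are hypothetical.

```shell
#!/bin/bash
# load_over LIMIT
# Reads /proc/loadavg-style text on stdin and succeeds when the 1-minute
# load average (first field) is greater than LIMIT.
# (Hypothetical helper name, introduced for illustration.)
load_over() {
    awk -v limit="$1" 'NR == 1 { over = ($1 > limit) } END { exit !over }'
}

# count_defunct
# Reads `ps axo pid,ppid,pcpu,size,cmd` output on stdin and prints the
# number of defunct mod_gearman worker rows.
count_defunct() {
    grep -c 'mod_gearman.*<defunct>'
}

# Live usage on the Pi:
# load_over 1 < /proc/loadavg && echo "load above 1"
# ps axo pid,ppid,pcpu,size,cmd | count_defunct
```

The limit of 1 follows the slide's "1 = 100%" rule of thumb for the single-core Pi.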
  • Pi: Overloaded
    Load Approaching Limit
        ps axo pid,ppid,pcpu,size,cmd|grep gearman|grep -v grep
          pid  ppid  pcpu  size cmd
        14129     1   0.0  1224 /usr/sbin/mod_gearman_worker
        15634 14129  12.0  1488 /usr/sbin/mod_gearman_worker
        15635 14129  12.0  1488 /usr/sbin/mod_gearman_worker
        15636 14129  12.0  1488 /usr/sbin/mod_gearman_worker
        15637 14129  13.0  1488 /usr/sbin/mod_gearman_worker
        15638 14129  12.0  1488 /usr/sbin/mod_gearman_worker
        15639 14129  12.0  1488 /usr/sbin/mod_gearman_worker
        15640 14129  12.0  1488 /usr/sbin/mod_gearman_worker
        15641 14129  11.0  1488 /usr/sbin/mod_gearman_worker
        15642 14129  11.0  1488 /usr/sbin/mod_gearman_worker
    Increased CPU Usage Indicating Impending DOOM
        ps axo pid,ppid,pcpu,size,cmd|grep gearman|grep -v grep
          pid  ppid  pcpu  size cmd
        14129     1   0.0  1224 /usr/sbin/mod_gearman_worker
        15658 14129   2.1  1488 /usr/sbin/mod_gearman_worker
        15659 14129   2.1  1488 /usr/sbin/mod_gearman_worker
        15660 14129   2.1  1488 /usr/sbin/mod_gearman_worker
        15661 14129   2.1  1488 /usr/sbin/mod_gearman_worker
        15662 14129   2.1  1488 /usr/sbin/mod_gearman_worker
        15663 14129   2.1  1488 /usr/sbin/mod_gearman_worker
        15664 14129  21.0  1488 /usr/sbin/mod_gearman_worker
        15665 14129  21.0  1488 /usr/sbin/mod_gearman_worker
        15666 14129  21.0  1488 /usr/sbin/mod_gearman_worker
  • Plugin Resource Usage: RAM
    [Bar chart: RAM usage by plugin type (Compiled, NSCA, NSClient++, SSH, Perl)]
  • Plugin Resource Use: Time
    Example: check_ping
    PID   PPID  CPU RAM Time     Command
    12106 12105 0.0 280 00:01 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
    12106 12105 0.0 280 00:02 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
    12106 12105 0.0 280 00:03 25 /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
  • Plugins Resource Hog: Network Bandwidth
    CPU   RAM   Time      Plugin
    13.0  7696  00:01  20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     6.5  7696  00:02  20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     4.3  7696  00:03  20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     3.2  7696  00:04  20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     2.6  7696  00:05  20 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     2.1  7696  00:06  15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     1.8  7696  00:07  15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     1.6  7696  00:08  15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     1.4  7696  00:09  15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
     1.3  7696  00:10  15 /usr/bin/perl -w? /usr/lib/nagios/plugins/check_iftraffic3.pl
  • Latency Evaluation
    Turn On Debug=1
        [2013-08-20 10:24:36][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - FTP
        [2013-08-20 10:24:36][11574][DEBUG] service: 'centos' - 'FTP', next_check is at 2013-08-20 10:24:36, latency so far: 0
        [2013-08-20 10:25:17][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - HTTP
        [2013-08-20 10:25:17][11574][DEBUG] service: 'centos' - 'HTTP', next_check is at 2013-08-20 10:25:17, latency so far: 0
        [2013-08-20 10:25:17][11574][DEBUG] service job completed: centos HTTP: 2
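With debug logging on, the latency values can be pulled out of the worker log rather than read line by line. The sketch below assumes each log entry sits on a single line and contains the text `latency so far: N` as in the excerpt above; the function name `max_latency` is hypothetical.

```shell
#!/bin/bash
# max_latency
# Reads Mod-Gearman worker debug log lines on stdin and prints the largest
# "latency so far" value seen (0 if no such line is present).
# (Hypothetical helper name, introduced for illustration.)
max_latency() {
    sed -n 's/.*latency so far: \([0-9.]*\).*/\1/p' |
        awk 'BEGIN { max = 0 } ($1 + 0) > max { max = $1 + 0 } END { print max }'
}

# Live usage, with the log path taken from the worker.conf shown earlier:
# max_latency < /var/log/mod_gearman/mod_gearman_worker.log
```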
  • Troubleshooting: Return code 127
    CRITICAL: Return code of 127 is out of bounds. Make sure the plugin you're trying to run actually exists. (worker: raspberrypi)
    Check the Path to the plugins directory.
        sudo mkdir -p /usr/local/nagios
        sudo ln -s /usr/lib/nagios/plugins /usr/local/nagios/libexec
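Exit status 127 is what a shell reports when the command it was asked to run cannot be found, which is why a wrong plugin path produces this message. A small sketch for reproducing the status by hand follows; the function name `plugin_exit_code` is hypothetical.

```shell
#!/bin/bash
# plugin_exit_code COMMAND [ARGS...]
# Runs COMMAND with its output discarded and prints its exit status.
# A path that does not exist yields 127.
# (Hypothetical helper name, introduced for illustration.)
plugin_exit_code() {
    "$@" >/dev/null 2>&1
    echo $?
}

# Example use on the Pi, before and after creating the symlink:
# plugin_exit_code /usr/local/nagios/libexec/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
```

If the status changes from 127 to one of the usual plugin return codes once the symlink is in place, the path was the problem.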
  • Questions?