Nagios Conference 2013 - Mike Weber - Distributed Monitoring with Raspberry Pi

Mike Weber's presentation on Distributed Monitoring with Raspberry Pi.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

1. Distributed Monitoring with Raspberry Pi
   Mike Weber, mweber@spidertools.com
2. The Problem: Remote Monitoring at Low Cost
   * Limited Service Checks
   * Limited Cost
   * Low Power Usage
   * Central Nagios Server
   * Low Tech Skills
3. Possible Solutions
   Virtual Container
   * Requires VMware etc.
   * Requires Expertise to Configure Nagios
   Hardware
   * Cost
   * Resource Waste
   * Tech Skills Required (RAID, Nagios Config)
   Passive Checks
   * Scripts on Hosts (more resources than compiled plugins)
   * Tech Skills
4. Possible Solutions: ITX
   Mini-ITX ($400-600)
   * 6.7 x 6.7 inch motherboard developed by VIA in 2001
   * Intel Atom 1.8 GHz processor, 2 GB of RAM, SSD, 60 W power supply
   Nano-ITX ($500-700)
   * 4.7 x 4.7 inch motherboard developed by VIA in 2003
   * VIA 1.2 GHz processor, 1 GB of RAM, SSD, 60 W power supply
   Pico-ITX ($600-700)
   * 3.9 x 2.8 inch motherboard developed by VIA in 2007
5. Raspberry Pi
6. Raspberry Pi
   Low Cost
   * $75.00 (board, case, power supply)
   Low Power Usage
   * Power Usage of a Cell Phone
   Low Tech Skills
   * Clone Disks
   Distributed Model
   * Flexible
   * Low Cost on Nagios Server
7. Pi: 512 MB RAM, 700 MHz CPU
8. Installation of wheezy-raspbian
   Download the image file (about 500 MB): http://www.raspberrypi.org/downloads
   Verify the image:
     sha1sum 2013-02-09-wheezy-raspbian.zip
     b4375dc9d140e6e48e0406f96dead3601fac6c81  2013-02-09-wheezy-raspbian.zip
   Unzip the image:
     unzip 2013-02-09-wheezy-raspbian.zip
     Archive:  2013-02-09-wheezy-raspbian.zip
       inflating: 2013-02-09-wheezy-raspbian.img
   Default login: username pi, password raspberry
   Verify the disk location (as root):
     su -
     fdisk -l
     Disk /dev/sdd: 4102 MB, 4102889984 bytes
     255 heads, 63 sectors/track, 498 cylinders
     Units = cylinders of 16065 * 512 = 8225280 bytes
     Sector size (logical/physical): 512 bytes / 512 bytes
     I/O size (minimum/optimal): 512 bytes / 512 bytes
     Disk identifier: 0x295b8178
        Device Boot      Start         End      Blocks   Id  System
     /dev/sdd1               1         497     3992135+   b  W95 FAT32
   Write the image to the SD card:
     dd bs=4M if=~/2013-02-09-wheezy-raspbian.img of=/dev/sdd
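The checksum comparison in the steps above can be scripted so that a corrupted download is caught before it ever reaches `dd`. This is a minimal sketch, not part of the original deck: `verify_sha1` is a hypothetical helper, and it assumes GNU coreutils `sha1sum` is available on the machine preparing the card.

```bash
#!/bin/bash
# verify_sha1 FILE EXPECTED_DIGEST
# Hypothetical helper: succeeds only when FILE's SHA-1 digest equals EXPECTED_DIGEST.
verify_sha1() {
    local file="$1" expected actual
    # Normalise the published digest to lower case, as sha1sum prints it.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # sha1sum prints "<digest>  <filename>"; keep only the digest field.
    # A missing or unreadable file yields an empty digest and therefore a mismatch.
    actual=$(sha1sum -- "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, `verify_sha1 2013-02-09-wheezy-raspbian.zip b4375dc9d140e6e48e0406f96dead3601fac6c81 && unzip 2013-02-09-wheezy-raspbian.zip` unpacks the image only when the digest matches.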
9. Network Configuration: Wireless
   Edimax Wireless 802.11b/g/n adapter (supports WPS, WPA2, 802.1x)
   * Works out of the box
   /etc/network/interfaces:
     auto lo
     iface lo inet loopback
     iface eth0 inet dhcp
     allow-hotplug wlan0
     iface wlan0 inet dhcp
       wpa-ssid pi
       wpa-psk Pi89YQbg56)
10. Mod-Gearman
11. Why Mod-Gearman?
   * Distributes Tasks to Multiple Workers
     * Multiple Pi Workers
   * Supports Multiple Programming Languages
     * C, Java, Perl, PHP, Python, Shell
   * Provides a Distributed Model
   * Client Uses Very Few Resources
     * In Contrast to DNX Workers
12. Why Not DNX?
   * Not Currently Updated (last updated 2010-04-13)
   * Uses UDP (less dependable)
   * Client Uses More Resources
   [Chart: memory in MB (0-250 scale), DNX Worker vs. Mod-Gearman Worker]
13. NEB: Nagios Event Broker
14. Mod-Gearman
15. Installation of Mod-Gearman on Pi
   Install the prerequisites:
     sudo apt-get update
     sudo apt-get install gearman mod-gearman-worker libgearman6 nagios-plugins
   Edit the worker configuration:
     cd /etc/mod-gearman
     sudo nano worker.conf
   worker.conf:
     server=192.168.5.212:4730
     key=Modlinux23
     hosts=no
     services=no
     eventhandler=no
     min-worker=6
     max-worker=8
     servicegroups=pi_srv
     logfile=/var/log/mod_gearman/mod_gearman_worker.log
     p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
   Save your changes, then start the Mod-Gearman worker:
     sudo /etc/init.d/mod-gearman-worker start
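Because worker.conf is plain `key=value` text, a few of the settings above can be sanity-checked before the worker is started. A minimal sketch: `conf_get`, `is_uint`, and `check_worker_conf` are hypothetical helpers, and the rules they enforce (a host:port server address, integer worker counts, min-worker not above max-worker) are assumptions drawn from this slide, not Mod-Gearman's own validation.

```bash
#!/bin/bash
# conf_get FILE KEY (hypothetical helper)
# Prints the value of the last "KEY=value" line in FILE, ignoring "#" comment lines.
conf_get() {
    awk -F= -v k="$2" '
        /^[[:space:]]*#/ { next }
        $1 == k { sub(/^[^=]*=/, ""); val = $0 }
        END { if (val != "") print val }
    ' "$1"
}

# is_uint VALUE: succeeds if VALUE is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_worker_conf FILE
# Hypothetical sanity check (not part of Mod-Gearman): server= must be host:port,
# and min-worker/max-worker must be integers with min-worker <= max-worker.
check_worker_conf() {
    local file="$1" server min max
    server=$(conf_get "$file" server)
    min=$(conf_get "$file" min-worker)
    max=$(conf_get "$file" max-worker)
    case "$server" in
        *:*) ;;
        *) echo "server= must be host:port"; return 1 ;;
    esac
    if ! is_uint "$min" || ! is_uint "$max"; then
        echo "min-worker and max-worker must be integers"; return 1
    fi
    if [ "$min" -gt "$max" ]; then
        echo "min-worker ($min) exceeds max-worker ($max)"; return 1
    fi
    echo "OK: server=$server workers=$min-$max"
}
```

Run it as `check_worker_conf /etc/mod-gearman/worker.conf` before `sudo /etc/init.d/mod-gearman-worker start`.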
16. Gearman Resource Usage
     ps axo pid,ppid,pcpu,size,cmd | grep gearman
     Process  Parent  CPU  Memory  CMD
      1747       1    0.0   1224   /usr/sbin/mod_gearman_worker
      3255    1747    2.5   1488   /usr/sbin/mod_gearman_worker (working)
      3256    1747    6.6   1488   /usr/sbin/mod_gearman_worker (working)
      3257    1747    7.0   1488   /usr/sbin/mod_gearman_worker (working)
      3258    1747    0.0   1356   /usr/sbin/mod_gearman_worker
      3259    1747    0.0   1356   /usr/sbin/mod_gearman_worker
      3260    1747    0.0   1356   /usr/sbin/mod_gearman_worker
   size (the Memory column) = approximate swap space, in KiB, the process would need if it dirtied all of its writable pages (see ps(1)); the full virtual size is reported by vsz instead
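To turn the listing above into a single number worth watching, the size column can be summed. A minimal sketch assuming the exact column order of the `ps axo pid,ppid,pcpu,size,cmd` command on the slide; `gearman_worker_kib` is a hypothetical name. On the slide's sample it adds up to 9756 KiB for the parent and six workers.

```bash
#!/bin/bash
# gearman_worker_kib (hypothetical helper)
# Reads "ps axo pid,ppid,pcpu,size,cmd" output on stdin and prints the summed
# size column (KiB) of every mod_gearman_worker process (parent and children).
# Matching on the command field ($5) rather than piping through grep means the
# grep/awk processes themselves are never counted.
gearman_worker_kib() {
    awk '$5 ~ /mod_gearman_wor/ { total += $4 } END { print total + 0 }'
}
```

Usage on the Pi: `ps axo pid,ppid,pcpu,size,cmd | gearman_worker_kib`.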
17. Mod-Gearman Queues
18. Mod-Gearman
19. Worker Capacity
   75-100 Service Checks, Given:
   * 5-Minute Check Intervals
   * Compiled Plugins
   * 6 Workers, with 2 Workers Always Available
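These figures can be sanity-checked with simple arithmetic: six workers over a 300-second interval provide 1800 worker-seconds, so 100 checks leave about 18 seconds per check and 75 checks about 24. The sketch below (hypothetical `check_budget_s`) assumes checks run back to back with no scheduling overhead and ignores contention for the Pi's single CPU core, so it is an optimistic bound, not a measured capacity.

```bash
#!/bin/bash
# check_budget_s WORKERS INTERVAL_SECONDS CHECKS (hypothetical helper)
# Rough upper bound on how long each check may take (integer seconds) if
# WORKERS run checks back to back and all CHECKS must finish within one interval.
check_budget_s() {
    local workers="$1" interval_s="$2" checks="$3"
    echo $(( workers * interval_s / checks ))
}
```

If the two "always available" workers are treated as spare, `check_budget_s 4 300 100` gives 12 seconds per check.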
20. Mod-Gearman Worker Configuration
   Worker Identifier  Unique identifier for the worker, e.g. its hostname
   min-worker         Minimum number of total workers
   max-worker         Maximum number of total workers
   idle-timeout       Time in seconds before an idle worker exits
   max-jobs           Maximum number of jobs before a worker exits
21. Install Process
   Install the Nagios Event Broker module (one line in nagios.cfg):
     broker_module=/usr/local/lib/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
   Install the server (gearmand):
     /etc/init.d/gearmand start
   Install the worker (mod_gearman_worker):
     /etc/init.d/mod_gearman_worker start
   Configuration file: /etc/mod_gearman/mod_gearman_neb.conf
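Before restarting Nagios it can help to confirm that the broker_module line actually made it into the main configuration, uncommented. A minimal sketch with the hypothetical helper `has_gearman_broker`; it accepts any `mod_gearman*.o` module name, and the nagios.cfg path in the usage example is an assumption based on a default source install.

```bash
#!/bin/bash
# has_gearman_broker NAGIOS_CFG (hypothetical helper)
# Succeeds if an uncommented broker_module= line in NAGIOS_CFG loads a
# mod_gearman*.o module.
has_gearman_broker() {
    grep -Eq '^[[:space:]]*broker_module=[^[:space:]]*mod_gearman[^[:space:]]*\.o([[:space:]]|$)' "$1"
}
```

For example: `has_gearman_broker /usr/local/nagios/etc/nagios.cfg || echo "mod_gearman NEB module not loaded"`.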
22. Distributed Monitoring
23. Distributed Monitoring
24. Distributed Monitoring: Hostgroups
   Server configuration: /etc/mod_gearman/mod_gearman_neb.conf
     server=localhost:4730
     eventhandler=yes
     services=yes
     hosts=yes
     hostgroups=debian-servers
     encryption=yes
     key=linux23_Qg549K
   Pi worker configuration: /etc/mod-gearman/worker.conf
     server=192.168.5.99:4730
     eventhandler=no
     services=no
     hosts=no
     min-worker=6
     max-worker=8
     encryption=yes
     key=linux23_Qg549K
     p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
     hostgroups=debian-servers
25. Distributed Monitoring: Servicegroups
   Server configuration: /etc/mod_gearman/mod_gearman_neb.conf
     server=localhost:4730
     eventhandler=yes
     services=yes
     hosts=yes
     servicegroups=pi_srv
     encryption=yes
     key=linux23_Qg549K
   Pi worker configuration: /etc/mod-gearman/worker.conf
     server=192.168.5.99:4730
     eventhandler=no
     services=no
     hosts=no
     min-worker=6
     max-worker=8
     encryption=yes
     key=linux23_Qg549K
     p1_file=/usr/share/mod-gearman/mod_gearman_p1.pl
     servicegroups=pi_srv
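Both slides rely on the server and the Pi agreeing on the encryption setting, the key, and the group names; a mismatch on either side is an easy way to end up with checks that never run. A minimal cross-check sketch with hypothetical helpers `conf_get` and `check_pair`, assuming both files use the plain key=value format shown above and comma-separated group lists without spaces.

```bash
#!/bin/bash
# conf_get FILE KEY (hypothetical helper)
# Prints the value of the last uncommented "KEY=value" line in FILE.
conf_get() {
    awk -F= -v k="$2" '
        /^[[:space:]]*#/ { next }
        $1 == k { sub(/^[^=]*=/, ""); val = $0 }
        END { if (val != "") print val }
    ' "$1"
}

# check_pair NEB_CONF WORKER_CONF (hypothetical helper)
# encryption and key must be identical on both sides, and every hostgroup or
# servicegroup the worker serves must also be listed on the server.
check_pair() {
    local neb="$1" worker="$2" opt g
    for opt in encryption key; do
        if [ "$(conf_get "$neb" "$opt")" != "$(conf_get "$worker" "$opt")" ]; then
            echo "mismatch: $opt"; return 1
        fi
    done
    for opt in hostgroups servicegroups; do
        # Split the worker's comma-separated list and look each name up on the server.
        for g in $(conf_get "$worker" "$opt" | tr ',' ' '); do
            case ",$(conf_get "$neb" "$opt")," in
                *",$g,"*) ;;
                *) echo "server does not export $opt $g"; return 1 ;;
            esac
        done
    done
    echo OK
}
```

For example: `check_pair /etc/mod_gearman/mod_gearman_neb.conf worker.conf`, with a copy of the Pi's worker.conf placed on the server.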
26. Performance Tuning Pi
27. noatime
   mtime: contents of the file changed
   ctime: inode changed (permissions, ownership)
   atime: last access time; updating it forces a write even when a file is only read
   /etc/fstab:
     proc            /proc           proc    defaults          0       0
     /dev/mmcblk0p1  /boot           vfat    defaults          0       2
     /dev/mmcblk0p2  /               ext4    defaults,noatime  0       1
   Remount the root file system:
     mount -o remount /
   Verify the changes with:
     mount
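Instead of reading the full `mount` listing by eye, the active options for one mount point can be tested directly against /proc/mounts, whose fourth field holds the comma-separated mount options. A minimal sketch; `has_noatime` is a hypothetical helper.

```bash
#!/bin/bash
# has_noatime MOUNTPOINT (hypothetical helper)
# Reads /proc/mounts-format lines on stdin and succeeds if the last entry for
# MOUNTPOINT lists noatime among its options. The last entry wins because
# /proc/mounts can list "/" more than once (e.g. rootfs and /dev/root).
has_noatime() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END { exit !(("," opts ",") ~ /,noatime,/) }
    '
}
```

After the remount: `has_noatime / < /proc/mounts && echo "noatime active on /"`.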
28. Maximize Resources
   Reduce Logging
   * Turn Off rsyslog
   * Minimize Logging
   Shut Down Other Services
   * Mail Server
29. Firewall Issues
30. Understanding Network Connections: Pi
   Connections from the Pi (192.168.5.47) to gearmand on the Nagios server (192.168.5.212, port 4730):
     tcp        0      0 192.168.5.47:43965      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43948      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43964      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43962      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43960      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43977      192.168.5.212:4730      ESTABLISHED
     tcp        0      0 192.168.5.47:43956      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43947      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43975      192.168.5.212:4730      ESTABLISHED
     tcp        0      0 192.168.5.47:43969      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43978      192.168.5.212:4730      ESTABLISHED
     tcp        0      0 192.168.5.47:43967      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43973      192.168.5.212:4730      ESTABLISHED
     tcp        0      0 192.168.5.47:43959      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43951      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43961      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43957      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43963      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43976      192.168.5.212:4730      ESTABLISHED
     tcp        0      0 192.168.5.47:43945      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43972      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43970      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43950      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43958      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43952      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43955      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43954      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43946      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43966      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43968      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43953      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43979      192.168.5.212:4730      ESTABLISHED
     tcp        0      0 192.168.5.47:43971      192.168.5.212:4730      TIME_WAIT
     tcp        0      0 192.168.5.47:43974      192.168.5.212:4730      ESTABLISHED
31. Understanding Network Connections: Nagios
     tcp        0      0 0.0.0.0:4730                0.0.0.0:*                   LISTEN
     tcp        0      0 192.168.5.212:4730          192.168.5.47:44254          ESTABLISHED
     tcp        0      0 192.168.5.212:4730          192.168.5.47:44258          ESTABLISHED
     tcp        0      0 192.168.5.212:4730          192.168.5.47:44257          ESTABLISHED
     tcp        0      0 192.168.5.212:4730          192.168.5.47:44255          ESTABLISHED
     tcp        0      0 192.168.5.212:4730          192.168.5.47:44259          ESTABLISHED
     tcp        0      0 192.168.5.212:4730          192.168.5.47:44253          ESTABLISHED
     tcp        0      0 192.168.5.212:4730          192.168.5.47:44256          ESTABLISHED
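These listings show the direction of the traffic: the Pi opens connections from ephemeral ports to gearmand listening on port 4730 of the Nagios server, so a firewall between the two needs to allow TCP 4730 from the Pi to the server. TIME_WAIT entries are connections that have already been closed. A minimal sketch (hypothetical `count_states`) that tallies connection states for one port, assuming the column layout of `netstat -tan`; fed the Pi listing above, it would report 7 ESTABLISHED and 27 TIME_WAIT.

```bash
#!/bin/bash
# count_states PORT (hypothetical helper)
# Reads "netstat -tan"-style lines on stdin and prints "STATE COUNT" for every
# TCP connection whose local (field 4) or remote (field 5) address ends in :PORT.
count_states() {
    awk -v p=":$1" '
        function on_port(addr) { return substr(addr, length(addr) - length(p) + 1) == p }
        $1 ~ /^tcp/ && (on_port($4) || on_port($5)) { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

Usage on either machine: `netstat -tan | count_states 4730`.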
32. Creating Checks
33. Create Service Check
34. Create Servicegroup
35. Add Services to Servicegroup
36. Graphing with Pi Checks
37. Monitoring Pi
38. Monitor Pi: Workers and Jobs
   Create a Script on Nagios to Monitor Workers and Jobs:
     #!/bin/bash
     check_gearman -H 192.168.5.99 -q worker_raspberrypi -t 10 -s check
39. Monitor Pi: Service Check
40. Monitor Gearman Workers
41. Monitor Gearman Workers/Jobs
42. Warning Signals
   * Nagios Server: Check Latency
   * Nagios Server: Orphaned Checks
       service check orphaned, is the mod-gearman worker on queue 'servicegroup_pi' running?
   * Pi: Load Over 1 (the Pi has a single CPU core, so a load of 1 = 100%)
   * Pi: Defunct Workers
       15824 14129  2.1  0 [mod_gearman_wor] <defunct>
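The "load over 1 = 100%" rule holds because this Raspberry Pi has a single CPU core; on a multi-core machine the load average has to be divided by the number of cores. A minimal sketch of that conversion (hypothetical `load_pct`); a result above 100 means more work is queued than the CPU can run.

```bash
#!/bin/bash
# load_pct "LOADAVG_LINE" [CORES] (hypothetical helper)
# Converts the 1-minute load average (first field of /proc/loadavg) into a
# percentage of total CPU capacity, rounded to the nearest integer.
load_pct() {
    local cores="${2:-1}"
    printf '%s\n' "$1" | awk -v c="$cores" '{ printf "%d\n", $1 / c * 100 + 0.5 }'
}
```

Usage on the Pi: `load_pct "$(cat /proc/loadavg)" "$(nproc)"`.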
43. Pi: Overloaded
   Load Approaching Limit:
     ps axo pid,ppid,pcpu,size,cmd | grep gearman | grep -v grep
       pid  ppid  pcpu  size  cmd
     14129     1   0.0  1224  /usr/sbin/mod_gearman_worker
     15634 14129  12.0  1488  /usr/sbin/mod_gearman_worker
     15635 14129  12.0  1488  /usr/sbin/mod_gearman_worker
     15636 14129  12.0  1488  /usr/sbin/mod_gearman_worker
     15637 14129  13.0  1488  /usr/sbin/mod_gearman_worker
     15638 14129  12.0  1488  /usr/sbin/mod_gearman_worker
     15639 14129  12.0  1488  /usr/sbin/mod_gearman_worker
     15640 14129  12.0  1488  /usr/sbin/mod_gearman_worker
     15641 14129  11.0  1488  /usr/sbin/mod_gearman_worker
     15642 14129  11.0  1488  /usr/sbin/mod_gearman_worker
   Increased CPU Usage Indicating Impending DOOM:
     ps axo pid,ppid,pcpu,size,cmd | grep gearman | grep -v grep
       pid  ppid  pcpu  size  cmd
     14129     1   0.0  1224  /usr/sbin/mod_gearman_worker
     15658 14129   2.1  1488  /usr/sbin/mod_gearman_worker
     15659 14129   2.1  1488  /usr/sbin/mod_gearman_worker
     15660 14129   2.1  1488  /usr/sbin/mod_gearman_worker
     15661 14129   2.1  1488  /usr/sbin/mod_gearman_worker
     15662 14129   2.1  1488  /usr/sbin/mod_gearman_worker
     15663 14129   2.1  1488  /usr/sbin/mod_gearman_worker
     15664 14129  21.0  1488  /usr/sbin/mod_gearman_worker
     15665 14129  21.0  1488  /usr/sbin/mod_gearman_worker
     15666 14129  21.0  1488  /usr/sbin/mod_gearman_worker
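Adding up the pcpu column makes snapshots like these easier to compare: the first sums to 107.0 and the second to 75.6 across all worker processes. Keep in mind that ps computes pcpu as CPU time divided by the time the process has been running, so it is a lifetime average rather than an instantaneous reading. A minimal sketch (hypothetical `gearman_worker_cpu`) assuming the same column layout as the slide's ps command.

```bash
#!/bin/bash
# gearman_worker_cpu (hypothetical helper)
# Reads "ps axo pid,ppid,pcpu,size,cmd" output on stdin and prints the summed
# %CPU of all mod_gearman_worker processes with one decimal place.
gearman_worker_cpu() {
    awk '$5 ~ /mod_gearman_wor/ { total += $3 } END { printf "%.1f\n", total }'
}
```

Usage on the Pi: `ps axo pid,ppid,pcpu,size,cmd | gearman_worker_cpu`.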
44. Plugin Resource Usage: RAM
   [Chart: RAM usage by plugin type (Compiled, NSCA, NSClient++, SSH, Perl), scale 0-12]
45. Plugin Resource Use: Time
   Example: check_ping
     PID    PPID   CPU  RAM  Time      Command
     12106  12105  0.0  280  00:01 25  /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
     12106  12105  0.0  280  00:02 25  /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
     12106  12105  0.0  280  00:03 25  /usr/lib/nagios/plugins/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5
46. Plugins Resource Hog: Network Bandwidth
     CPU   RAM   Time      Plugin
     13.0  7696  00:01 20  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      6.5  7696  00:02 20  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      4.3  7696  00:03 20  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      3.2  7696  00:04 20  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      2.6  7696  00:05 20  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      2.1  7696  00:06 15  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      1.8  7696  00:07 15  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      1.6  7696  00:08 15  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      1.4  7696  00:09 15  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
      1.3  7696  00:10 15  /usr/bin/perl -w /usr/lib/nagios/plugins/check_iftraffic3.pl
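The falling CPU column is an artifact of how ps measures it: pcpu is the CPU time used divided by the time the process has been running. Multiplying back, and assuming the Time column is elapsed run time, gives roughly 0.13 seconds of CPU at every sample, so check_iftraffic3.pl does its CPU work early and then spends at least ten seconds mostly waiting while still occupying a worker. A minimal sketch of that arithmetic (hypothetical `cpu_seconds`).

```bash
#!/bin/bash
# cpu_seconds PCPU ELAPSED_SECONDS (hypothetical helper)
# ps reports pcpu as cputime / elapsed time, so multiplying back gives the
# CPU time (in seconds) the process has consumed so far.
cpu_seconds() {
    awk -v pcpu="$1" -v elapsed="$2" 'BEGIN { printf "%.2f\n", pcpu / 100 * elapsed }'
}
```

For the first and last samples above, `cpu_seconds 13.0 1` and `cpu_seconds 1.3 10` both work out to 0.13.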
47. Latency Evaluation
   Turn on debug=1 in worker.conf:
     [2013-08-20 10:24:36][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - FTP
     [2013-08-20 10:24:36][11574][DEBUG] service: 'centos' - 'FTP', next_check is at 2013-08-20  10:24:36, latency so far: 0
     [2013-08-20 10:25:17][11574][DEBUG] received job for queue servicegroup_pi_srv: centos - HTTP
     [2013-08-20 10:25:17][11574][DEBUG] service: 'centos' - 'HTTP', next_check is at 2013-08-20  10:25:17, latency so far: 0
     [2013-08-20 10:25:17][11574][DEBUG] service job completed: centos HTTP: 2
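With debug logging on, each job records its latency so far, so the log can be scanned for the worst value instead of being read line by line. A minimal sketch (hypothetical `max_latency`), assuming the integer "latency so far: N" line format shown above.

```bash
#!/bin/bash
# max_latency (hypothetical helper)
# Reads a Mod-Gearman worker log (debug=1) on stdin and prints the largest
# integer "latency so far: N" value found, or 0 if there is none.
max_latency() {
    grep -o 'latency so far: [0-9]*' |
        awk 'BEGIN { max = 0 } { if ($4 + 0 > max) max = $4 + 0 } END { print max }'
}
```

Using the log path from the worker.conf on slide 15: `max_latency < /var/log/mod_gearman/mod_gearman_worker.log`.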
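48. Troubleshooting: Return code 127
   CRITICAL: Return code of 127 is out of bounds. Make sure the plugin you're trying to run actually exists. (worker: raspberrypi)
   Check the path to the plugins directory. The Debian nagios-plugins package installs plugins in /usr/lib/nagios/plugins, while the check commands coming from a source-built Nagios server point to /usr/local/nagios/libexec. Linking the two makes that path valid on the Pi:
     sudo mkdir -p /usr/local/nagios
     sudo ln -s /usr/lib/nagios/plugins /usr/local/nagios/libexec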
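Nagios only understands plugin exit codes 0 to 3 (OK, WARNING, CRITICAL, UNKNOWN); 127 is what the shell returns when a command does not exist, which is why a wrong plugin path on the Pi surfaces as this CRITICAL message. A minimal sketch for testing a plugin path by hand on the Pi; `plugin_state` and `run_plugin` are hypothetical helpers.

```bash
#!/bin/bash
# plugin_state EXIT_CODE (hypothetical helper)
# Maps a plugin exit code to the Nagios state it stands for; 126 and 127 are the
# shell's "found but not executable" and "command not found" codes.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        126) echo "OUT_OF_BOUNDS (126: not executable)" ;;
        127) echo "OUT_OF_BOUNDS (127: command not found)" ;;
        *) echo OUT_OF_BOUNDS ;;
    esac
}

# run_plugin COMMAND [ARGS...] (hypothetical helper)
# Runs a plugin command, discards its output, and reports the resulting state.
run_plugin() {
    "$@" >/dev/null 2>&1
    plugin_state "$?"
}
```

After creating the symlink, `run_plugin /usr/local/nagios/libexec/check_ping -H 192.168.5.220 -w 3000.0,80% -c 5000.0,100% -p 5` should report a real state instead of `OUT_OF_BOUNDS (127: command not found)`.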
49. Questions?
