Writing Nagios Plugins in Python


I introduced Nagios to an organisation in 2004 to track the availability of various servers and network resources. It has since grown into a system-validity tool that takes the stress out of running the help desk. Using Python as a scripting language, I have created a suite of additional Nagios plugins that cover:
* real-time entry of market rates
* end-of-day rate integrity
* common errors in manual spreadsheets
* success of backup processes
* validity conditions in MS SQL databases
* routine tracking of known chronic errors

  1. Enhancing Nagios with Python Plugins
     Maurice Maneschi, Associate Director, Risk Management Systems, Oakvale Capital Limited
  2. Presentation Outline
     * Risk Management Systems
     * What is Nagios
     * Why Python
     * What is a plugin
     * Specific risks being monitored
     * Analysing reports and logs
     * Where to next
  3. Risk Management Systems
     * A division of five staff
     * Supporting three key applications
     * Running on eight servers
     * Depending on 15+ other boxes spread over 3 LANs
     * Five key vendors
  4. Risk Management Systems
     * Divisional goals
       * Key goal is application management
       * Some customer support
       * Product innovation
       * Project management
       * No time for nasty surprises
  5. What is Nagios
     * Host, service and network monitoring program
     * Open source
     * Written in C
     * Runs on Linux and Apache
  6. What is Nagios
     * Configured with the hosts of a network
       * How the hosts are networked
       * What key services are on the hosts
         * “PING”, SMTP, HTTP etc.
     * Application polls these at specified intervals
       * From the results of the polls, determines the state of hosts, services and networks
       * Alerts sent by email
       * Escalation, reporting, statistics and more
  7. Why Python
     * Flexible
     * Efficient
     * Manageable
     * Numerous, diverse libraries
     * Cross-platform
     * Huge number of code samples available online
  8. What is a plugin
     * Executable file
       * Takes parameters (preferably)
       * Prints a short status message
     * Returns an exit status of
       * 0 – all OK
       * 1 – warning
       * 2 – critical
     * Stateless
  9. What is a plugin
     * Executable Python script
     * Code the test
     * Print the status line
     * Return a status
     * Easy!
  10. Specific risks being monitored
      * Customer email to the help desk system has stopped
        * Users email issues directly into our help desk system for prioritisation, action and eventually billing
        * Spam periodically breaks the import agent
        * It's proprietary, so no fix in sight
        * Nagios watches the queue using POP3
  11. Specific risks being monitored
  12. Specific risks being monitored
  13. Specific risks being monitored
      * Rate feed is missing some rates
        * Rates feed into our system from Reuters via MS Excel
        * Some rates are critical, and human intervention is required if they are missing
        * Other rates are important, but are just tracked when missing
        * Nagios watches the MS Excel worksheet with the “unreliable rates”
  14. Specific risks being monitored
  15. Specific risks being monitored
  16. Specific risks being monitored
      * Rates must be inserted regularly
        * Insertion process has numerous dependencies
        * Moving target – causes of failure change over time
        * Focus on the end point – are the rates in the database?
        * Nagios monitors the databases and alerts on old or missing rates
  17. Specific risks being monitored
  18. Specific risks being monitored
  19. Specific risks being monitored
      * External source of dealing information
        * Fed in through the FIX protocol
        * Numerous failure points being monitored on a (Windows) server
        * Monitor process must check in with Nagios every 10 minutes
        * Using passive and active checks
  20. Specific risks being monitored
  21. Specific risks being monitored
  22. Specific risks being monitored
      * Quick passive check
  23. Specific risks being monitored
      * Successful backups
      * Successful scheduled tasks
      * Database comparisons
      * Common errors
        * Password server on web site
        * Known failure point on an MS Excel worksheet
  24. Extra enhancements to Nagios
      * High-level view of system health
      * Audio alerts and SMSes from UTbox.net
      * Status screen on monitor PC
      * Syslogd for firewall
      * Script reuse for rate checks
      * Ad hoc system problems
        * Currently tracking WAN failures
  25. Analysing reports and logs
      * Screen saver often sufficient
      * Summary views
  26. Where to next
      * Low-spec PC
      * Nagios is in several distro repositories
        * I compile from the source
      * Allow at least a day to configure Nagios
        * Don't expect to install and switch it on
      * Tuning Nagios is an ongoing job
  27. Further information
      * Nagios: http://www.nagios.org
      * Python: http://www.python.org
        * pyexcelerator, pymssql, FreeTDS from SourceForge
      * Oakvale Capital: http://www.oakvale.com
      * Code samples: http://www.redwaratah.com/wiki/index.php?title=Nagios_and_Python
      * Maurice Maneschi: [email_address]
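
The plugin contract in the "What is a plugin" slides (take parameters, print one status line, exit with 0, 1 or 2) fits in a few lines of Python. This is a minimal sketch, not the author's code: the script name `check_value.py`, its three arguments and the threshold logic are hypothetical, and exit code 3 (UNKNOWN), which the slides do not list, comes from the Nagios plugin guidelines.

```python
#!/usr/bin/env python3
"""Minimal Nagios plugin skeleton (hypothetical check_value.py)."""

# Nagios plugin exit codes; 3 (UNKNOWN) comes from the plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_value(value, warn, crit):
    """Map a measured value onto a Nagios state (higher values are worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def report(state, message):
    """Print the one-line status Nagios displays and return the exit code."""
    print(f"{LABELS[state]} - {message}")
    return state


def main(argv):
    """Parse VALUE WARN CRIT from argv, run the test and report the result."""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Too few arguments or a non-numeric one: Nagios shows UNKNOWN.
        return report(UNKNOWN, "usage: check_value.py VALUE WARN CRIT")
    return report(check_value(value, warn, crit), f"value is {value:g}")


# As a real plugin, finish the script with:
#     import sys
#     sys.exit(main(sys.argv))
```

Called as `main(["check_value.py", "7", "5", "10"])`, it prints `WARNING - value is 7` and returns 1. Ending the script with `sys.exit(main(sys.argv))` hands that value to Nagios as the exit status, and the plugin stays stateless because every run starts from its arguments alone.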
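
For the help desk mailbox, a POP3 check only needs the message count returned by the `STAT` command: when spam stops the import agent, mail is no longer drained and the count climbs. The sketch below uses the standard-library `poplib` module and never deletes messages. The warning and critical thresholds are assumptions, as are the host and account names in the usage note.

```python
import poplib


def queue_state(count, warn=5, crit=20):
    """Classify the number of messages waiting in the help desk mailbox.

    A healthy import agent keeps the mailbox nearly empty, so a backlog
    suggests it has stopped. The thresholds are illustrative assumptions.
    """
    if count >= crit:
        code, label = 2, "CRITICAL"
    elif count >= warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    return code, f"{label} - {count} messages waiting in help desk queue"


def count_waiting(host, user, password, timeout=30):
    """Return the number of messages in the mailbox using POP3 STAT."""
    conn = poplib.POP3(host, timeout=timeout)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()  # (message count, mailbox size in octets)
        return count
    finally:
        conn.quit()
```

A plugin would call `count_waiting("mail.example.com", "helpdesk", "secret")` (placeholder values), print the status line from `queue_state(...)` and exit with its code. Keeping the network call separate from the classification makes the thresholds easy to test without a mail server.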
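
The "unreliable rates" check separates critical rates, which need a person to act, from rates that are only tracked. The classification step could look like the sketch below. It assumes the worksheet has already been read into a dictionary of rate name to cell value (the slides list pyexcelerator for Excel access, but its API is not reproduced here), and it reports missing non-critical rates as WARNING, which is an assumption about what "tracked" means.

```python
def check_rates(rates, critical_names):
    """Classify gaps in the 'unreliable rates' sheet for Nagios.

    rates: dict of rate name -> cell value, with None or "" where the
    feed left the cell empty (assumed representation).
    critical_names: names of rates that need human intervention.
    """
    missing = sorted(name for name, value in rates.items() if value in (None, ""))
    critical = [name for name in missing if name in critical_names]
    if critical:
        return 2, "CRITICAL - missing critical rates: " + ", ".join(critical)
    if missing:
        # Non-critical gaps are only tracked, so they raise a warning.
        return 1, "WARNING - missing rates: " + ", ".join(missing)
    return 0, f"OK - all {len(rates)} rates present"
```

Keeping the list of critical rates as a parameter means the same script can be reused for different sheets, in the spirit of the "script reuse for rate checks" enhancement.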
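
Checking the end point ("are the rates in the database?") comes down to asking for the newest timestamp of each rate. The slides use pymssql and FreeTDS against MS SQL. To keep this sketch self-contained, it swaps in the standard `sqlite3` module. The `rates(name, updated_at)` table, the 30-minute limit and the helper names are hypothetical. With pymssql, the query placeholder would be `%s` rather than `?`.

```python
import sqlite3
from datetime import datetime, timedelta


def demo_connection():
    """In-memory SQLite database standing in for the MS SQL rates database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rates (name TEXT, updated_at TEXT)")
    return conn


def stale_rates(conn, names, now, max_age=timedelta(minutes=30)):
    """Return (missing, stale) lists of rate names.

    Assumes the hypothetical table rates(name, updated_at) holding
    ISO-8601 timestamps; adapt the SQL to the real schema.
    """
    cur = conn.cursor()
    missing, stale = [], []
    for name in names:
        cur.execute("SELECT MAX(updated_at) FROM rates WHERE name = ?", (name,))
        (latest,) = cur.fetchone()
        if latest is None:
            missing.append(name)  # no row at all for this rate
        elif now - datetime.fromisoformat(latest) > max_age:
            stale.append(name)  # rate exists but has not been refreshed
    return missing, stale


def rate_state(missing, stale):
    """Turn the two lists into a Nagios (exit code, status line) pair."""
    if missing:
        return 2, "CRITICAL - no rates for: " + ", ".join(missing)
    if stale:
        return 1, "WARNING - old rates for: " + ", ".join(stale)
    return 0, "OK - all rates current"
```

Because the check looks only at what reached the database, it keeps working as the causes of failure in the insertion process change over time.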
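
For the FIX feed, the monitor process reports in with passive check results. Nagios can force an active check (typically one that simply returns CRITICAL) when no result has arrived within 10 minutes if `check_freshness` is enabled with a `freshness_threshold` of 600 seconds on the service. One way to submit a passive result is to write a `PROCESS_SERVICE_CHECK_RESULT` line to the Nagios external command file, as sketched below. The command-file path and the host and service names are assumptions. A process on a separate Windows server cannot write to that pipe directly and would need a transport such as NSCA.

```python
import time

# Assumed path: a typical default for a source install of Nagios.
# Check the command_file setting in nagios.cfg for the real location.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_result(host, service, code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command."""
    if timestamp is None:
        timestamp = int(time.time())
    # Each external command occupies exactly one line.
    output = " ".join(output.splitlines())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{output}\n")


def submit(line, command_file=COMMAND_FILE):
    """Write a command to the Nagios command pipe (a FIFO on the Nagios host)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

On the Nagios host, `submit(passive_result("fixserver", "FIX feed", 0, "OK - FIX session up"))` would record a fresh OK result. The host and service names must match the Nagios object definitions, and the monitor process would repeat this more often than the freshness threshold.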
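
Checks for successful backups can take the same end-point approach as the rate checks. Rather than watching the backup job, the plugin looks for a recent, non-empty backup file. The directory layout, the 26-hour limit (a daily job plus some slack) and the function names below are hypothetical.

```python
import os
import time


def newest_file(directory):
    """Return (path, mtime, size) of the most recently modified file, or None."""
    newest = None
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                if newest is None or info.st_mtime > newest[1]:
                    newest = (entry.path, info.st_mtime, info.st_size)
    return newest


def classify_backup(found, now, max_age_hours=26):
    """Map the newest backup file onto a Nagios (exit code, status line) pair."""
    if found is None:
        return 2, "CRITICAL - no backup files found"
    path, mtime, size = found
    age_hours = (now - mtime) / 3600
    name = os.path.basename(path)
    if size == 0:
        return 2, f"CRITICAL - latest backup {name} is empty"
    if age_hours > max_age_hours:
        return 2, f"CRITICAL - latest backup {name} is {age_hours:.1f} hours old"
    return 0, f"OK - latest backup {name} is {age_hours:.1f} hours old"


def backup_state(directory, max_age_hours=26):
    """Check a backup directory as of the current time."""
    return classify_backup(newest_file(directory), time.time(), max_age_hours)
```

The same pattern applies to scheduled tasks that leave an output file behind: check that the file exists, is non-empty and is newer than the task's schedule allows.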
