Writing Nagios Plugins in Python

I introduced Nagios to an organisation in 2004 to track the availability of various servers and network resources. It has since grown into a system validity tool that takes the stress out of running the help desk. Using Python as a scripting language, I have created a suite of additional Nagios plugins that cover:
* real-time entry of market rates
* end of day rate integrity
* common errors in manual spreadsheets
* success of backup processes
* validity conditions in MS SQL databases
* routine tracking of known chronic errors

Published in: Technology

    1. Enhancing Nagios with Python Plugins
       Maurice Maneschi, Associate Director, Risk Management Systems, Oakvale Capital Limited
    2. Presentation Outline
       * Risk Management Systems
       * What is Nagios
       * Why Python
       * What is a plugin
       * Specific risks being monitored
       * Analysing reports and logs
       * Where to next
    3. Risk Management Systems
       * A division of five staff
       * Supporting three key applications
       * Running on eight servers
       * Depending on 15+ other boxes spread over 3 LANs
       * Five key vendors
    4. Risk Management Systems
       * Divisional goals
         * Key goal is application management
         * Some customer support
         * Product innovation
         * Project management
         * No time for nasty surprises
    6. What is Nagios
       * Host, service, network monitoring program
       * Open source
       * Written in C
       * Runs on Linux, with a web interface served by Apache
    7. What is Nagios
       * Configured with the hosts of a network
         * How the hosts are networked
         * What key services are on the hosts
           * “PING”, SMTP, HTTP etc.
       * Application polls these at specified intervals
         * From the results of the polls, determines the state of hosts, services and networks
         * Alerts sent by email
         * Escalation, reporting, statistics and more
    8. Why Python
       * Flexible
       * Efficient
       * Manageable
       * Numerous, diverse libraries
       * Cross-platform
       * Huge number of code samples across the internet
    9. What is a plugin
       * Executable file
         * Takes parameters (preferably)
         * Prints a short status message
       * Returns an exit status of
         * 0 – all OK
         * 1 – warning
         * 2 – critical
       * Stateless
    10. What is a plugin
       * Executable Python script
       * Code the test
       * Print the status line
       * Return a status
       * Easy!
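The plugin contract described on slides 9 and 10 can be sketched as follows. This is a minimal illustration, not the code from the talk: the `EXAMPLE` label, the `classify` and `status_line` helpers and the `VALUE WARN CRIT` argument order are all assumptions. Exit code 3 (UNKNOWN) is not listed on the slide; it is the conventional code a plugin returns when the check itself could not run.

```python
# Nagios plugin exit codes; 3 (UNKNOWN) means the check itself could not run.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measured value onto a status, assuming higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def status_line(name, status, detail):
    """Build the single short line that Nagios shows for the service."""
    return "%s %s - %s" % (name, LABELS[status], detail)


def main(argv):
    """Hypothetical usage: check_example.py VALUE WARN CRIT"""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Raised for missing arguments as well as non-numeric ones.
        print(status_line("EXAMPLE", UNKNOWN, "usage: VALUE WARN CRIT"))
        return UNKNOWN
    status = classify(value, warn, crit)
    print(status_line("EXAMPLE", status, "value is %g" % value))
    return status

# In a real plugin file the last lines would be:
#     import sys
#     sys.exit(main(sys.argv))
```

Keeping the test (`classify`) separate from the printing and exiting makes the plugin stateless and easy to exercise without Nagios.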
    11. Specific risks being monitored
       * Customer email to the help desk system has stopped
         * Users email issues directly into our help desk system for prioritisation, action and eventually billing
         * Spam periodically breaks the import agent
         * It's proprietary, so no fix in sight
         * Nagios watches the queue using POP3
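One way to watch a mail queue over POP3 is to count the messages waiting in the mailbox, on the assumption that a stalled import agent lets them pile up. The sketch below uses the standard library's `poplib`; the thresholds, the `POP3` label and the function names are illustrative assumptions, not the plugin shown in the talk.

```python
import poplib

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def queue_status(message_count, warn, crit):
    """Return (exit_code, status_line) for a number of waiting messages."""
    if message_count >= crit:
        return CRITICAL, "POP3 CRITICAL - %d messages waiting" % message_count
    if message_count >= warn:
        return WARNING, "POP3 WARNING - %d messages waiting" % message_count
    return OK, "POP3 OK - %d messages waiting" % message_count


def check_mailbox(host, user, password, warn=5, crit=20):
    """Log in over POP3, count the queued messages and classify the result."""
    try:
        box = poplib.POP3(host, timeout=10)
        try:
            box.user(user)
            box.pass_(password)
            message_count, _mailbox_size = box.stat()
        finally:
            box.quit()
    except (poplib.error_proto, OSError) as exc:
        # Connection or login trouble: the check could not be completed.
        return UNKNOWN, "POP3 UNKNOWN - %s" % exc
    return queue_status(message_count, warn, crit)
```

A plugin built on this would print the returned line and exit with the returned code.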
    14. Specific risks being monitored
       * Ratefeed is missing some rates
         * Rates feed into our system from Reuters via MS Excel
         * Some rates are critical, and human intervention is required if they are missing
         * Other rates are important, but are just tracked when missing
         * Nagios watches the MS Excel sheet with the “unreliable rates”
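The two-tier policy on this slide (critical rates need intervention, important rates are only tracked) maps naturally onto CRITICAL and WARNING. The sketch below assumes the worksheet has already been read into a dictionary of rate name to value; reading the Excel file is left out, and the rate names in the usage example are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def missing_rates(rates, names):
    """Return the sorted names whose rate is absent or blank."""
    return sorted(name for name in names if rates.get(name) in (None, ""))


def rates_status(rates, critical_names, important_names):
    """Classify a snapshot of rates: critical gaps outrank important ones."""
    missing_critical = missing_rates(rates, critical_names)
    missing_important = missing_rates(rates, important_names)
    if missing_critical:
        return CRITICAL, "RATES CRITICAL - missing: " + ", ".join(missing_critical)
    if missing_important:
        return WARNING, "RATES WARNING - missing: " + ", ".join(missing_important)
    tracked = len(critical_names) + len(important_names)
    return OK, "RATES OK - all %d tracked rates present" % tracked
```

For example, `rates_status({"AUDUSD": 0.75, "BBSW3M": None}, ["AUDUSD"], ["BBSW3M"])` yields a WARNING that names `BBSW3M`. A rate of `0.0` counts as present; only `None` and the empty string are treated as missing.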
    17. Specific risks being monitored
       * Rates must be inserted regularly
         * Insertion process has numerous dependencies
         * Moving target – causes of failure change over time
         * Focus on the end point – are the rates in the database?
         * Nagios queries the databases and alerts on old or missing rates
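Checking the end point rather than each dependency can be as simple as asking the database for its newest rate and comparing the age against thresholds. The sketch below uses the standard library's `sqlite3` purely as a stand-in so that it runs anywhere; the talk's databases are MS SQL. The `rates` table, its `inserted_at` column (ISO 8601 text) and the thresholds are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

OK, WARNING, CRITICAL = 0, 1, 2


def newest_rate_time(conn):
    """Return the latest insertion time in the hypothetical 'rates' table."""
    row = conn.execute("SELECT MAX(inserted_at) FROM rates").fetchone()
    return datetime.fromisoformat(row[0]) if row[0] else None


def freshness_status(newest, now, warn_after, crit_after):
    """Classify how stale the newest rate is; an empty table is critical."""
    if newest is None:
        return CRITICAL, "RATES CRITICAL - no rates in database"
    age = now - newest
    minutes = int(age.total_seconds() // 60)
    if age >= crit_after:
        return CRITICAL, "RATES CRITICAL - newest rate is %d minutes old" % minutes
    if age >= warn_after:
        return WARNING, "RATES WARNING - newest rate is %d minutes old" % minutes
    return OK, "RATES OK - newest rate is %d minutes old" % minutes
```

Because the check looks only at the result of the insertion process, it keeps working when the causes of failure change.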
    20. Specific risks being monitored
       * External source of dealing information
         * Fed in through the FIX protocol
         * Numerous failure points being monitored on a (Windows) server
         * Monitor process must check in with Nagios every 10 minutes
         * Using passive and active checks
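A passive check result reaches Nagios as a `PROCESS_SERVICE_CHECK_RESULT` line written to its external command file. The sketch below only formats and writes that line; it assumes the writer runs on the Nagios host, so getting the result there from the Windows server is out of scope. The host name, service name and command file path in the usage note are hypothetical.

```python
import time


def passive_result(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = time.time()
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        int(timestamp), host, service, return_code, output)


def submit(command_file, line):
    """Write the command to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

A monitor process might call `submit("/usr/local/nagios/var/rw/nagios.cmd", passive_result("fixserver", "FIX feed", 0, "FIX OK - heartbeat"))` every few minutes. The "must check in every 10 minutes" rule can then be enforced on the Nagios side with the service's `check_freshness` and `freshness_threshold` settings, which run the active check command when no passive result has arrived in time.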
    23. Specific risks being monitored
       * Quick passive check
    24. Specific risks being monitored
       * Successful backups
       * Successful scheduled tasks
       * Database comparisons
       * Common errors
         * Password server on web site
         * Known failure point on an MS Excel worksheet
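A backup check can often be reduced to three questions about the output file: does it exist, is it non-empty, and is it recent. The sketch below is one possible shape for such a plugin; the 26-hour default (a nightly job plus some slack), the `BACKUP` label and the function name are assumptions rather than details from the talk.

```python
import os
import time

OK, WARNING, CRITICAL = 0, 1, 2


def backup_status(path, max_age_hours=26, min_bytes=1, now=None):
    """Classify a backup file by existence, size and modification time."""
    now = time.time() if now is None else now
    try:
        info = os.stat(path)
    except OSError:
        return CRITICAL, "BACKUP CRITICAL - %s not found" % path
    if info.st_size < min_bytes:
        return CRITICAL, "BACKUP CRITICAL - %s is only %d bytes" % (path, info.st_size)
    age_hours = (now - info.st_mtime) / 3600.0
    if age_hours > max_age_hours:
        return WARNING, "BACKUP WARNING - %s is %.1f hours old" % (path, age_hours)
    return OK, "BACKUP OK - %s is %.1f hours old" % (path, age_hours)
```

The same pattern would fit a scheduled task that leaves a log or marker file behind when it succeeds.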
    25. Extra enhancements to Nagios
       * High-level view of systems health
       * Audio alerts and SMSes from UTbox.net
       * Status screen on monitor PC
       * Syslogd for firewall
       * Script reuse for rate checks
       * Ad hoc system problems
         * Currently tracking WAN failures
    26. Analysing reports and logs
       * Screen saver often sufficient
       * Summary views
    34. Where to next
       * Low-spec PC
       * Nagios is in several distro repositories
         * I compile from the source
       * Allow a day at least to configure Nagios
         * Don't expect to install and switch it on
       * Tuning Nagios is an ongoing job
    35. Further information
       * Nagios: http://www.nagios.org
       * Python: http://www.python.org
         * pyexcelerator, pymssql, freetds from SourceForge
       * Oakvale Capital: http://www.oakvale.com
       * Code samples: http://www.redwaratah.com/wiki/index.php?title=Nagios_and_Python
       * Maurice Maneschi: [email_address]