Nagios Conference 2011 - Michael Medin - NSClient++: Whats New


Michael Medin's presentation on NSClient++. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit:

Published in: Technology
  • Hello, my name is Michael Medin. I am from Stockholm, Sweden. This is my second time here in Bolzano, but this time I had fewer problems with my flights. This year I will speak a bit about what has happened in the last year. And hopefully for the last time I am speaking about “Windows Monitoring”! If there are any questions or such, just chime in.
  • Standard Disclaimer - My views (not anyone else's) - Not peer reviewed so I could be lying to you. - If your 2 billion dollar servers crash: life sucks. Let's simplify this a bit…
  • Sorry, this slide just keeps getting longer and longer... But I have actually removed some information this time… I am a developer, and developers monitor software whereas the NOC monitors hardware. The “unix” guy quit, and since I know “unix” I was apparently a good choice to administrate routers, firewalls and whatnot. I disliked BB so I devised a plan to migrate to Nagios. The best thing with Nagios was that management loved SLA reporting! Once, after some 30 or so installs of nsclient, I went to the Exchange server and: BANG! This was the birth of NSClient++. Management did not like crashing Exchange servers! So we started looking at options, and NRPE_NT was too hard to use for “simple” checks. Initially we went with SNMP but soon started on NSClient++ instead.
  • Briefly, the agenda covers a short introduction to NSClient++. Then we move on to 0.3.9 and what’s new in the release. Following that is the 0.4.x version tree. And finally we will have a Q&A session.
  • A quick note on the terminology. The word NSClient can mean many things depending on what you are talking about.
  • A quick summary of the options for monitoring Windows
  • If anyone has a Visual Studio 2005 “Team Edition” (with Itanium support), I’m very, very interested. Wiki means YOU write the documentation. If the docs suck, you are to blame (not me).
  • I actually paid money to come here and speak with you. But I have always been strange that way. It might seem strange that there are twice as many downloads as unique visitors, but downloads are aggregated from other sites.
  • Thank you to my sponsors
  • NSClient++ is your friend! Testing: do them in that order. I know people who start in Nagios, spend the next 3 days debugging, and think NSClient++ sucks. Had they started with NSClient++ /test, it would have taken 5 minutes and things would not have sucked! I don’t like when things suck.
  • This is really, really cool! (And the reason we are 3 months behind schedule; it was amazingly hard to do.)
  • As I said, NSCP is around 40k lines of code; this is around 4k, so 10% of the code, and it is new!
  • There are two severities; I generally use the one called severity (based upon eventID).
  • What might be interesting are the safe operators.
  • An important note is how neg works with dates.
  • The filter: There can be only one! Don't forget that NRPE and NSCA have payload limits, so exceeding them will cause errors.
  • Parsing is pretty fancy. It will try to “do things for you”. But what happened to neg?
  • Boost means things (hopefully) work better. 0.4.x does not necessarily mean 0.4.0. CEP CEP CEP! If your load is high and you have transactions, this is a good thing.

    1. 1. NSClient++: Whats New?<br />5 years of vaporware<br />Presentation © Michael Medin<br />
    2. 2. These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author’s work <br />This material has not been peer reviewed and is presented here as-is with the permission of the author.<br />The author assumes no liability for any content or opinion expressed in this presentation and or use of content herein.<br />Disclaimer!<br />It is not their fault!<br />It is not my fault!<br />It is your fault!<br />
    3. 3. Developer (not manager)<br />Not working with Nagios<br />Accidentally ended up in our NOC<br />Hated BB so we migrated to Nagios<br />2003: The birth of NSClient++<br />NSClient sucked (Broke Exchange)<br />NRPE_NT was too much work<br />2004: The open source of NSClient++<br />“just for fun”<br />2007: The rebirth of NSClient++<br />Got a lot of emails and hits on the webpage<br />2011: The Present<br />0.3.9 out last May<br />0.4.0 out as alpha<br />My Background<br />
    4. 4. Windows Monitoring and NSClient++<br />Quick Introduction<br />What’s new in 0.3.9<br />Disk/File/*<br />Scheduled Tasks<br />Aliases<br />Crash Handling<br />What’s new in 0.4.0<br />New core<br />Unix support<br />New settings subsystem<br />New protocol<br />Python Scripting<br />The end of NSClient++!<br />Q/A<br />Agenda<br />
    5. 5. Windows Monitoring and NSClient++<br />Quick Introduction<br />
    6. 6. What is NSClient?<br />A (pretty old) program<br />pNSClient<br />A (pretty limited) protocol <br />check_nt<br />A (pretty incorrect) concept<br />”Windows monitoring”<br />What is it not?<br />NSClient++!<br />NSClient++ was written as a replacement for pNSClient<br />But it has evolved much since then<br />NSClient: Terminology<br />
    7. 7. NSClient++<br />Freedom!<br />Custom scripts<br />Decentralized or centralized<br />Active or Passive<br />Can monitor “anything” (including your application)<br />Can perform “tasks” (fix your problems)<br />Other options:<br />SNMP<br />Generally complex to use and limited on “standard” hardware<br />pNSClient/NRPE_NT/OpMonAgent/*<br />Old, outdated and usually limited functionality<br />“Agentless” WMI<br />Limited functionality<br />Enforces centralized and active monitoring<br />But...<br />I am biased, so might not want to take my word for it...<br />Why should you use NSClient++<br />
    8. 8. Several Protocols<br />
    9. 9. Internals:<br />C++<br />Around 75,000 lines of code<br />Actively developed (unfortunately only by me)<br />Modularized design (use what you need)<br />Runs on:<br />Windows: NT4, w2k, XP, w2k3, Vista, w2k8, X64, X86 …<br />Unix: Linux/Debian (probably many/most others as well)<br />Current Version:<br />0.3.9 with 0.4.0 in beta<br />Most features require NRPE or NSCA (or NSCP)<br />Documentation online (WIKI)<br /><br />About NSClient++<br />
    10. 10. Not supported by a commercial entity<br />Donations welcome<br />Sponsoring available (contact me for details)<br />Used by a lot of people (I think)<br />Impossible to estimate any figures<br />Please, Help out!<br />Add documentation<br />Report problems<br />Come with ideas, thoughts, etc…<br />About NSClient++ (cont.)<br />
    11. 11. Thank you!<br />
    12. 12. About NSClient++<br />Using NSClient++<br />
    13. 13. NSClient++ is a command line program!<br />nsclient++ -start (net start nsclientpp)<br />nsclient++ -stop (net stop nsclientpp)<br />nsclient++ -test<br />Configuration:<br />notepad nsc.ini<br />Testing:<br />Local (nsclient++ -test)<br />From CLI (check_nrpe ...)<br />From Nagios (add command)<br />Works with “anything” <br />Including many non Nagios based systems<br />Using NSClient++ (0.3.9)<br />nsclient++ -test<br />Is your friend!<br />
    14. 14. New command line syntax!<br />nscp --service --start<br />nscp --service --stop<br />nscp --help<br />Testing<br />nscp --test<br />Configuration:<br />nscp --settings-help<br />nscp --settings --migrate-to ini<br />nscp --settings --set …<br />…<br />Run scripts:<br />nscp --client --module PythonScript --command execute-and-load-python --script --install<br />Using NSClient++ (0.4.0)<br />nscp --test<br />Is your friend!<br />
    15. 15. NSClient++What’s new 0.3.9<br />Overview<br />
    16. 16. Major simplification to the disk/file checker<br />CheckFile (removed)<br />CheckFile2 Deprecated<br />CheckFiles (replaces above)<br />Volume support (for real this time)<br />Aliases<br />NSCA/NRPE enhancements<br />Scheduled task checks<br />Crash Handling<br />A bunch of new commands<br />Bug fixes and many more things…<br />0.3.9 What's new: Overview<br />
    17. 17. We have recruited a new member to the team!<br />A girl actually…<br />…Still a bit wet behind the ears…<br />New team member!<br />
    18. 18. Evelina was born 2010-07-21<br />
    19. 19. NSClient++What’s new 0.3.9<br />CheckFile(1,2,s,…)<br />
    20. 20. The good:<br />Powerful interface!<br />Simple to use!<br />out-of-the-box solution!<br />(on which you can expand)<br />The bad:<br />Nothing! Really, I mean it!<br />…and then… yesterday…<br />…in the bar…<br />…all hopes shattered…<br />…apparently it is still too complicated… <br />Overview<br />
    21. 21. Same as was introduced for eventlog last year<br />Based on SQL WHERE clauses<br />generated > -2d AND severity = 'error'<br />size > 5k<br />size > 5k OR size < 1k<br />size > 5k AND written > -2d<br />(size > 5k OR size < 1k ) AND written > -2d<br />…<br />The new Filters<br />
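To make the filter examples on the slide above concrete, here is a minimal Python sketch of what `(size > 5k OR size < 1k) AND written > -2d` means for a single file. It is not NSClient++ code: the helper names, the dictionary layout, and the choice of binary multiples for `k` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumption: "k", "m", "g" are binary multiples; the slides do not say.
SIZE_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
TIME_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_size(text):
    """Turn a size literal such as '5k' into a number of bytes."""
    text = text.strip().lower()
    if text[-1] in SIZE_UNITS:
        return int(text[:-1]) * SIZE_UNITS[text[-1]]
    return int(text)


def parse_relative_time(text, now):
    """Turn a relative date such as '-2d' into an absolute time (two days before now)."""
    sign = -1 if text.startswith("-") else 1
    body = text.lstrip("+-")
    amount, unit = int(body[:-1]), body[-1]
    return now + sign * timedelta(**{TIME_UNITS[unit]: amount})


def matches(file_info, now):
    """Hand-written equivalent of: (size > 5k OR size < 1k) AND written > -2d"""
    size_ok = (file_info["size"] > parse_size("5k")
               or file_info["size"] < parse_size("1k"))
    recent = file_info["written"] > parse_relative_time("-2d", now)
    return size_ok and recent
```

Note how a negative date works: `written > -2d` reads as "written later than two days ago", so the comparison is against a point in time, not against a duration.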
    22. 22. Filter keywords<br />
    23. 23. Filter operators<br />
    24. 24. Filter Functions<br />
    25. 25. Command Options<br />
    26. 26. CheckDriveSize… CheckAll=volumes …<br />Other new features<br />Added a new option to ignore drives which are not readable (like the Office 2010 Q: drive)<br />ignore-unreadable<br />Added magic modifiers (from check_mk)<br />magic=0.7<br />Volume support (for real this time)<br />
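The slide only says the magic modifier comes from check_mk. As a rough sketch, this is the magic-factor scaling as I understand it from check_mk's filesystem check: thresholds are relaxed for volumes larger than a reference size and tightened for smaller ones. The function name, the 20 GB reference size, and whether NSClient++ uses exactly this formula are assumptions.

```python
def magic_levels(size_gb, warn_pct, crit_pct, magic=0.7, normsize_gb=20.0):
    """Scale percentage thresholds by volume size (check_mk-style magic factor).

    Assumption: normsize_gb=20 mirrors check_mk's default reference size;
    NSClient++ may use different defaults or a different formula.
    """
    relative = size_gb / normsize_gb      # size relative to the reference volume
    felt = relative ** magic              # "felt" size grows slower than real size
    scale = felt / relative               # < 1 for big volumes, > 1 for small ones
    warn = 100.0 - (100.0 - warn_pct) * scale
    crit = 100.0 - (100.0 - crit_pct) * scale
    return warn, crit
```

With `magic=1.0` the thresholds are returned unchanged, and a volume of exactly the reference size is also unaffected; a 2 TB volume with 80/90% bounds ends up with noticeably higher bounds, since 20% free on a large disk is a lot of space.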
    27. 27. NSClient++What’s new 0.3.9<br />Scheduled Tasks<br />
    28. 28. Works the ”same” as CheckEventLog<br />”filter=exit_code ne 0”<br />Two modules:<br />CheckTaskSched.dll<br />Works on Windows NT4 and beyond<br />But cannot check ”new” tasks (from Vista and beyond)<br />CheckTaskSched2.dll<br />Works on Windows Vista and beyond<br />Has fewer filter keywords<br />Scheduled Tasks<br />
    29. 29. Filter keywords<br />
    30. 30. CheckTaskSched<br />"filter=exit_code ne 0" <br />"syntax=%title%: %exit_code%" <br />warn=>0<br />WARNING:test.job(1)<br />CheckTaskSched<br />"filter=status = 'running' AND most_recent_run_time < -30m" <br />"syntax=%title% (%most_recent_run_time%)"<br />warn=>0<br />WARNING:test.job(2011-02-10 23:14:35)<br />Sample Commands<br />
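One way to try the first sample command from the monitoring server is through `check_nrpe`. The sketch below only builds the argument list; the host address is a placeholder, and it assumes that `check_nrpe` is on the PATH and that the agent is configured to accept command arguments.

```python
import shlex


def build_check_nrpe(host, command, *args, plugin="check_nrpe"):
    """Build an argv list for check_nrpe (-H host, -c command, -a arguments)."""
    argv = [plugin, "-H", host, "-c", command]
    if args:
        argv += ["-a", *args]
    return argv


# 192.0.2.10 is a documentation address, used here as a placeholder host.
argv = build_check_nrpe(
    "192.0.2.10",
    "CheckTaskSched",
    "filter=exit_code ne 0",
    "syntax=%title%: %exit_code%",
    "warn=>0",
)
# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` avoids shell quoting problems with the spaces, `%` and `>` characters in the filter arguments.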
    31. 31. NSClient++What’s new 0.3.9<br />Aliases<br />
    32. 32. System<br />alias_cpu<br />CPU Load past 5 minutes, 80/90% bounds<br />alias_cpu_ex<br />CPU Load past 5 minutes, custom bounds<br />alias_mem<br />Memory utilization (all) 80/90% bounds.<br />alias_mem_ex<br />Memory utilization (all), custom bounds<br />alias_up<br />System uptime<br />Out of the box aliases<br />
    33. 33. Disk/Drive<br />alias_disk<br />All fixed drives<br />alias_disk_loose<br />All fixed drives, ignore any problematic drives<br />alias_volumes<br />All volumes<br />alias_volumes_loose<br />All volumes, ignore any problematic drives<br />alias_file_size<br />Check the size of a given file (filename, size)<br />alias_file_age<br />Check the age of a given file<br />Out of the box aliases (continued)<br />
    34. 34. Eventlog<br />alias_event_log<br />Check for errors in the event log<br />Scheduled Tasks<br />alias_sched_all<br />No scheduled jobs have failed<br />alias_sched_long<br />No task has been running for longer than a given time.<br />alias_sched_task<br />Check if a given task succeeded<br />Misc<br />alias_updates<br />Check that updates are applied<br />Out of the box aliases (continued)<br />
    35. 35. Processes<br />alias_service<br />All services in “sensible state”<br />alias_service_ex<br />All services in “sensible state” (exclude various services)<br />alias_process<br />A process must be running<br />alias_process_stopped<br />A process must not be running<br />alias_process_count<br />A process must not have more than X instances<br />alias_process_hung<br />A process must not be hung<br />Out of the box aliases (continued)<br />
    36. 36. NSClient++What’s new 0.3.9<br />Crash Handling<br />
    37. 37. Using Google Breakpad <br />same as Google Chrome, Mozilla Firefox, etc<br />Three options (not mutually exclusive)<br />Send crash dumps to<br />Server can be changed <br />if you want to have an internal server or proxy server.<br />Store crash dumps for analysis<br />Will also be checked with check_nscp<br />Restart service<br />Crash Handling<br />
    38. 38. [crash]<br />restart=1<br />service_name=nsclientpp<br />submit=0<br />url=<br />archive=1<br />#folder=<appfolder>/dumps<br />Configuring Crash Handling<br />
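The `[crash]` section on the slide above is flattened by the transcript, so here it is laid out one key per line and read with Python's standard `configparser`. This is only a sketch for inspecting such a file: the function name is hypothetical, and the mapping of `restart`, `submit` and `archive` to the three options on the previous slide is my reading of the key names, not something the slides state.

```python
import configparser

# The section from the slide, one key per line. The folder line is commented out.
CRASH_INI = """\
[crash]
restart=1
service_name=nsclientpp
submit=0
url=
archive=1
#folder=<appfolder>/dumps
"""


def load_crash_settings(text):
    """Parse the [crash] section into typed Python values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["crash"]
    return {
        "restart": section.getboolean("restart"),    # assumed: restart the service
        "submit": section.getboolean("submit"),      # assumed: send dumps to a server
        "archive": section.getboolean("archive"),    # assumed: keep dumps locally
        "service_name": section.get("service_name"),
        "url": section.get("url"),                   # empty string when left blank
        "folder": section.get("folder"),             # None while commented out
    }


settings = load_crash_settings(CRASH_INI)
```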
    39. 39. NSClient++What’s new 0.3.9<br />Miscellaneous Fixes<br />
    40. 40. NSCA<br />Fixed problems with sending ”many” results back<br />NRPE<br />Added support for large payloads<br />Checks<br />Added ”check_nscp” to check health of NSClient++<br />Added new check for running other checks ”with a timeout”<br />Added new negate check (to negate the result of another check)<br />All filters (read CheckEventLog et al)<br />Many fixes and additions (regular expressions)<br />Process checks<br />Added support for checking if processes have ”hung”<br />Performance data<br />Added it to many places where it was intermittently missing before<br />Other stuff (The highlights)<br />
    41. 41. Roadmap<br />What’s to come?<br />
    42. 42. Roadmap (rough)<br />
    43. 43. NSClient++What’s new 0.4.0<br />Overview<br />
    44. 44. Brand new core based upon libraries<br />Things should *work* not just “work”<br />More modular and extensible<br />Unix support<br />Both as a client and server<br />New settings subsystem<br />Registry, improved ini support, http, etc<br />New protocol<br />NSCP (HTTP(s), MQ, Native)<br />Distributed monitoring<br />Many new things in this area (including MQ)<br />Python scripting<br />Primary goal (for me) is to create “unit-test”<br />Updated installer<br />Wix 3.5, more customizable<br />What’s new 0.4.0<br />
    45. 45. “Monitoring Kits”<br />Monitoring solutions for “standard things”<br />New windows check-subsystem<br />More modern and less arcane (no NT4 support)<br />Remote checking<br />.Net plugin support<br />Possibly internal VBA scripting support<br />Metrics cache and aggregation<br />Lightweight version of CEP<br />“crit=cpu > 80% AND transactions_per_sec < 10”<br />What’s coming 0.4.2<br />
    46. 46. Filter-like API (in addition to options)<br />“warn=any drive > 90% OR c: > 80%”<br />Remote updates/upgrades<br />Allow NSCP to upgrade itself<br />“port” of the “standard plugins”?<br />Run your favorite check_xxx from inside NSClient++<br />Unix plugins?<br />Run CheckCPU on unix machines?<br />Client/web Interface?<br />A nice little program (systray)<br />Let me know what you would like to see!<br />What might be coming?<br />
    47. 47. NSClient++What’s new 0.4.0<br />Brand new core<br />
    48. 48. The flux capacitor<br />
    49. 49. This is why it was so long in the making<br />Merging each new version took forever!<br />New internal protocol<br />Removed all internal “limits” (think buffer sizes)<br />Allows many new features<br />Allows much more advanced internal scripts<br />Allows for “non NRPE based checks”<br />A lot of new bugs?<br />This is the scary part (for me)<br />but my testing has shown it seems very stable<br />A completely new core<br />
    50. 50. NSClient++What’s new 0.4.0<br />Unix support<br />
    51. 51. Good question…<br />Since no one seems to like to program on Windows<br />I brought NSClient++ to “unix” <br />Because I can<br />With the new core comes portability<br />So, perhaps the better question was:<br />Why not?<br />Will NOT be supported for some time though<br />Unless someone wants to help out<br />Why?!?!<br />
    52. 52. NSClient++What’s new 0.4.0<br />New Settings<br />
    53. 53. Hierarchical settings subsystem<br />[/settings/NRPE/server]<br />allow arguments=false<br />Instead of <br />[NRPE Server]<br />allow_arguments=false<br />Why did I do this?<br />Because it was fun <br />Number of options has started to explode<br />Simpler to use the registry (as well as xml?)<br />Settings<br />
    54. 54. Since settings have “url:s”<br />old://${exe-path}/nsc.ini<br />ini://${base-path}/nsclient.ini<br />registry://HKEY_LOCAL_MACHINE/software/NSClient++<br />http://my.central.server/config/${hostname}.ini<br />Allows extensions (not via plugins though)<br />Maybe in the future:<br />lua://${base-path}/config.lua<br />python://${base-path}/<br />You can mix and match:<br />ini://${base-path}/nsclient.ini<br />Can “include”:<br />registry://HKEY_LOCAL_MACHINE/software/NSClient++<br />Which in turn includes<br />http://conf.server/${hostname}.conf<br />What’s in it for you?<br />
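The settings locations above are ordinary URLs with `${...}` placeholders. The following sketch shows how such a string could be expanded and split into a scheme and a location; the function names and the sample values for `base-path` and `hostname` are made up for illustration and say nothing about how NSClient++ resolves them internally.

```python
import re


def expand(url, variables):
    """Replace ${name} placeholders; unknown names are left untouched."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        url,
    )


def split_settings_url(url):
    """Split 'scheme://location' into its two parts."""
    scheme, sep, location = url.partition("://")
    if not sep:
        raise ValueError(f"not a settings URL: {url!r}")
    return scheme, location


# Hypothetical values, chosen only for the example.
variables = {"base-path": "C:/Program Files/NSClient++", "hostname": "web01"}

ini_store = split_settings_url(expand("ini://${base-path}/nsclient.ini", variables))
central = expand("http://my.central.server/config/${hostname}.ini", variables)
```

The scheme then decides which settings backend reads the location, which is what makes the "include" chain from ini to registry to http on the slide possible.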
    55. 55. Ability to load the same plugin twice.<br />Normal (default alias is python)<br />[/modules]<br />PythonScript=<br />[/settings/python/scripts]<br /><br />Multiple modules (define two aliases foo and bar)<br />[/modules]<br />foo=PythonScript<br />bar=PythonScript<br />[/settings/foo/scripts]<br /><br />[/settings/bar/scripts]<br /><br />Multiple modules and alias<br />
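The alias rules on the slide above can be read as: a key with an empty value loads the module under its default alias, while `alias=Module` loads another instance under the given alias, each with its own settings subtree. Here is a small sketch of that reading. The `DEFAULT_ALIASES` table and the function name are assumptions; only the `PythonScript` to `python` default comes from the slide.

```python
import configparser

# Assumption: only this default alias is known from the slide.
DEFAULT_ALIASES = {"PythonScript": "python"}


def resolve_modules(ini_text):
    """Map each alias to the module it loads and its settings root."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of module names and aliases
    parser.read_string(ini_text)
    instances = {}
    for key, value in parser["/modules"].items():
        if value:
            alias, module = key, value      # "foo=PythonScript"
        else:
            module = key                    # "PythonScript="
            alias = DEFAULT_ALIASES.get(module, module)
        instances[alias] = {"module": module, "settings_root": f"/settings/{alias}"}
    return instances


single = resolve_modules("[/modules]\nPythonScript=\n")
double = resolve_modules("[/modules]\nfoo=PythonScript\nbar=PythonScript\n")
```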
    56. 56. It depends…<br />If you are “still” using check_nt:<br />Probably not<br />If you are using NSCA:<br />Maybe not<br />If you want to use all new features<br />Yes<br />How do I change?<br />It is pretty simple…<br />nscp --settings --migrate-to ini<br />(or)<br />nscp --settings --migrate-to registry<br />Do I need to change?<br />
    57. 57. NSClient++What’s new 0.4.0<br />New protocol<br />
    58. 58. Active NRPE<br />
    59. 59. Active NSCP<br />
    60. 60. Allows more than one command to be sent<br />Used internally for plugins<br />Support both passive and active checks<br />Supports configuration, management, etc…<br />Extensible<br />But will also support:<br />Multiple locales (based on utf)<br />Unlimited payloads (soft configurable)<br />Support real performance data (not strings)<br />New protocol<br />
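"Real performance data (not strings)" is easier to appreciate next to what a receiver has to do today: parse text in the Nagios plugin format `label=value[UOM];warn;crit`. The sketch below parses one such item into typed fields. The `PerfValue` class is hypothetical and is not the NSCP wire format, which the slides do not show; it also assumes thresholds are plain numbers rather than ranges.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfValue:
    """Typed performance data, as opposed to a formatted string."""
    label: str
    value: float
    unit: str
    warn: Optional[float] = None
    crit: Optional[float] = None


# label (optionally quoted) = number, unit, then optional ;warn and ;crit
_ITEM = re.compile(
    r"^'?(?P<label>[^'=]+)'?=(?P<value>-?[\d.]+)(?P<unit>[^;]*)"
    r"(?:;(?P<warn>[^;]*))?(?:;(?P<crit>[^;]*))?"
)


def parse_perf_item(text):
    """Parse one Nagios-style perfdata item such as "'C:\\ %'=71%;80;90"."""
    match = _ITEM.match(text.strip())
    if match is None:
        raise ValueError(f"not a perfdata item: {text!r}")

    def number(raw):
        return float(raw) if raw else None

    return PerfValue(
        label=match["label"],
        value=float(match["value"]),
        unit=match["unit"],
        warn=number(match["warn"]),
        crit=number(match["crit"]),
    )
```

A protocol that carries the fields separately spares every consumer this parsing step and the locale and quoting pitfalls that come with it.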
    61. 61. NSClient++What’s new 0.4.0<br />Distributed monitoring<br />
    62. 62. Submission (evolution)<br />
    63. 63. Other scenarios<br />
    64. 64. An extension of the passive checks<br />”Something” can send notification events<br />”Something” can receive notification events<br />Agents can forward notification events<br />Replaces NSCAListener module<br />Supports routing<br />Not a one-to-one mapping.<br />Multiple consumers<br />multiple producers<br />Allows<br />Passive plugins (other than the built-in NSCA)<br />Script and rule based routing<br />Submissions and handlers<br />
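The routing idea above (many producers, many consumers, rules deciding who gets what) can be sketched in a few lines. This is a toy model and not the NSClient++ API: the `Router` class, the channel names and the event dictionaries are all invented for illustration.

```python
from collections import defaultdict


class Router:
    """Toy rule-based router: producers submit events, rules pick channels."""

    def __init__(self):
        self._rules = []                     # list of (predicate, channel)
        self._consumers = defaultdict(list)  # channel -> list of callables

    def subscribe(self, channel, consumer):
        """Register a consumer (any callable) on a channel."""
        self._consumers[channel].append(consumer)

    def add_rule(self, predicate, channel):
        """Route every event for which predicate(event) is true to channel."""
        self._rules.append((predicate, channel))

    def submit(self, event):
        """Deliver an event to all matching consumers; return the delivery count."""
        delivered = 0
        for predicate, channel in self._rules:
            if predicate(event):
                for consumer in self._consumers[channel]:
                    consumer(event)
                    delivered += 1
        return delivered


router = Router()
nsca_log, pager_log = [], []
router.subscribe("nsca", nsca_log.append)          # e.g. forward to a central server
router.subscribe("pager", pager_log.append)        # e.g. notify someone directly
router.add_rule(lambda e: True, "nsca")            # everything goes to nsca
router.add_rule(lambda e: e["status"] == 2, "pager")  # only CRITICAL results
```

Because rules and consumers are registered independently, one submitted result can reach several handlers, which is the "not a one-to-one mapping" point on the slide.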
    65. 65. NSClient++What’s new 0.4.0<br />Python scripting<br />
    66. 66. Built-in python scripting<br />Has full API support<br />Can build ”modules” in python<br />Can access settings<br />Can do “anything”<br />Primarily used by me for unit-testing<br />Requires a working python install<br />Python Scripting<br />
    67. 67. The end of NSClient++!<br />Le Roi est mort, vive le Roi!<br />
    68. 68. 0.4.x (ish) will be the last ”Windows” monitoring agent<br />The idea is to make it more:<br />A platform/client/server for distributed monitoring<br />Regardless of os/system<br />Regardless of Monitoring solutions<br />Don’t worry…<br />It will still work just fine as a ”Windows Monitoring Agent”<br />But in addition to this you will be able to do more.<br />So what’s this all about?<br />
    69. 69. Questions?<br />Q&A<br />
    70. 70. Michael Medin<br /><br /><br />Information about NSClient++<br /><br />Facebook:<br />Slides, and examples<br /><br />Thank You!<br />