Nagios Conference 2011 - Michael Medin - NSClient++: Whats New

Michael Medin's presentation on NSClient++. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

  • Hello, my name is Michael Medin. I am from Stockholm, Sweden. This is my second time here in Bolzano, but this time I had fewer problems with my flights. This year I will speak a bit about what has happened in the last year. And hopefully for the last time I am speaking about “Windows Monitoring”! If there are any questions or such, just chime in.
  • Standard disclaimer: these are my views (not anyone else's). Not peer reviewed, so I could be lying to you. If your 2 billion dollar servers crash: life sucks. Let's simplify this a bit…
  • Sorry, this slide just keeps getting longer and longer... But I have actually removed some information this time… I am a developer, and developers monitor software whereas the NOC monitors hardware. The “unix” guy quit, and since I know “unix” I was apparently a good choice to administer routers, firewalls and what not. I disliked BB, so I devised a plan to migrate to Nagios. The best thing with Nagios was that management loved SLA reporting! Once, after some 30 or so installs of nsclient, I went to the Exchange server and: BANG! This was the birth of NSClient++. Management did not like crashing Exchange servers! So we started looking at options, and NRPE_NT was too hard to use for “simple” checks. Initially we went with SNMP but soon started on NSClient++ instead.
  • Briefly, the agenda covers a short introduction to NSClient++. Then we move on to 0.3.9 and what’s new in the release. Following that is the 0.4.x version tree. And finally we will have a Q&A session.
  • A quick note on the terminology. The word NSClient can mean many things depending on what you are talking about.
  • A quick summary of the options for monitoring Windows
  • If anyone has a Visual Studio 2005 “Team Edition” (with Itanium support), I’m very, very interested. Wiki means YOU write the documentation. If the docs suck, you are to blame (not me).
  • I actually paid money to come here and speak with you. But I have always been strange that way. It might seem strange that there are twice as many downloads as unique visitors, but downloads are aggregated from other sites.
  • Thank you to my sponsors
  • NSClient++ is your friend! Testing: do the steps in that order. I know people who start in Nagios, spend the next 3 days debugging, and think NSClient++ sucks. Had they started in NSClient++ /test it would have taken 5 minutes and things would not have sucked! I don’t like it when things suck, so... like net eye
  • This is really, really cool! (And the reason we are 3 months behind schedule; it was amazingly hard to do.)
  • As I said, NSCP is around 40k lines of code. This is around 4k, so 10% of the code, and it is new!
  • There are two severities. I generally use the one called severity (based on the event ID).
  • What might be interesting is the safe operators
  • An important note is how neg works with dates
  • The filter: there can be only one! Don't forget that NRPE and NSCA have payload limits, so exceeding them will cause errors.
  • Parsing is pretty fancy. It will try to ”do things for you”. But what happened to neg?
  • Boost means things (hopefully) work better. 0.4.x does not necessarily mean 0.4.0. CEP, CEP, CEP! If your load is high and you have transactions, this is a good thing.

Presentation Transcript

  • NSClient++: Whats New?
    5 years of vaporware
    Presentation © Michael Medin
  • These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author’s work
    This material has not been peer reviewed and is presented here as-is with the permission of the author.
    The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.
    Disclaimer!
    It is not their fault!
    It is not my fault!
    It is your fault!
  • Developer (not manager)
    Not working with Nagios
    Accidentally ended up in our NOC
    Hated BB so we migrated to Nagios
    2003: The birth of NSClient++
    NSClient sucked (Broke Exchange)
    NRPE_NT was too much work
    2004: The open source of NSClient++
    “just for fun”
    2007: The rebirth of NSClient++
    Got a lot of emails and hits on the webpage
    2011: The Present
    0.3.9 out last May
    0.4.0 out as alpha
    My Background
  • Windows Monitoring and NSClient++
    Quick Introduction
    What’s new in 0.3.9
    Disk/File/*
    Scheduled Tasks
    Aliases
    Crash Handling
    What’s new in 0.4.0
    New core
    Unix support
    New settings subsystem
    New protocol
    Python Scripting
    The end of NSClient++!
    Q/A
    Agenda
  • Windows Monitoring and NSClient++
    Quick Introduction
  • What is NSClient?
    A (pretty old) program
    pNSClient
    A (pretty limited) protocol
    check_nt
    A (pretty incorrect) concept
    ”Windows monitoring”
    What is it not?
    NSClient++!
    NSClient++ was written as a replacement for pNSClient
    But it has evolved much since then
    NSClient: Terminology
  • NSClient++
    Freedom!
    Custom scripts
    Decentralized or centralized
    Active or Passive
    Can monitor “anything” (including your application)
    Can perform “tasks” (fix your problems)
    Other options:
    SNMP
    Generally complex to use and limited on “standard” hardware
    pNSClient/NRPE_NT/OpMonAgent/*
    Old, outdated and usually limited functionality
    “Agentless” WMI
    Limited functionality
    Enforces centralized and active monitoring
    But...
    I am biased, so you might not want to take my word for it...
    Why should you use NSClient++
  • Several Protocols
  • Internals:
    C++
    Around 75,000 lines of code
    Actively developed (unfortunately only by me)
    Modularized design (use what you need)
    Runs on:
    Windows: NT4, w2k, XP, w2k3, Vista, w2k8, X64, X86 …
    Unix: Linux/Debian (probably many/most others as well)
    Current Version:
    0.3.9 with 0.4.0 in beta
    Most features require NRPE or NSCA (or NSCP)
    Documentation online (WIKI)
    http://nsclient.org
    About NSClient++
  • Not supported by a commercial entity
    Donations welcome
    Sponsoring available (contact me for details)
    Used by a lot of people (I think)
    Impossible to estimate any figures
    Please, Help out!
    Add documentation
    Report problems
    Come with ideas, thoughts, etc…
    About NSClient++ (cont.)
  • Thank you!
  • About NSClient++
    Using NSClient++
  • NSClient++ is a command line program!
    nsclient++ -start (net start nsclientpp)
    nsclient++ -stop (net stop nsclientpp)
    nsclient++ -test
    Configuration:
    notepad nsc.ini
    Testing:
    Local (nsclient++ -test)
    From CLI (check_nrpe ...)
    From Nagios (add command)
    Works with “anything”
    Including many non Nagios based systems
    Using NSClient++ (0.3.9)
    nsclient++ -test
    Is your friend!
  • New command line syntax!
    nscp --service --start
    nscp --service --stop
    nscp --help
    Testing
    nscp --test
    Configuration:
    nscp --settings-help
    nscp --settings --migrate-to ini
    nscp --settings --set …

    Run scripts:
    nscp --client --module PythonScript --command execute-and-load-python --script test.py --install
    Using NSClient++ (0.4.0)
    nscp --test
    Is your friend!
  • NSClient++ What’s new 0.3.9
    Overview
  • Major simplification to the disk/file checker
    CheckFile (removed)
    CheckFile2 Deprecated
    CheckFiles (replaces above)
    Volume support (for real this time)
    Aliases
    NSCA/NRPE enhancements
    Scheduled task checks
    Crash Handling
    A bunch of new commands
    Bug fixes and many more things…
    0.3.9 What's new: Overview
  • We have recruited a new member to the team!
    A girl actually…
    …Still a bit wet behind the ears…
    New team member!
  • Evelina was born 2010-07-21
  • NSClient++ What’s new 0.3.9
    CheckFile(1,2,s,…)
  • The good:
    Powerful interface!
    Simple to use!
    out-of-the-box solution!
    (on which you can expand)
    The bad:
    Nothing! Really, I mean it!
    …and then… yesterday…
    …in the bar…
    …all hopes shattered…
    …apparently it is still too complicated…
    Overview
  • Same as was introduced for eventlog last year
    Based on SQL WHERE clauses
    generated > -2d AND severity = 'error'
    size > 5k
    size > 5k OR size < 1k
    size > 5k AND written > -2d
    (size > 5k OR size < 1k ) AND written > -2d

    The new Filters
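The filter examples on this slide read as boolean expressions over per-item fields. A minimal Python sketch of that idea follows. It is not NSClient++'s parser: the names `parse_size`, `tokenize` and `matches` are hypothetical, the `k`/`m`/`g` suffixes are assumed to be powers of 1024, and only numeric comparisons are handled (relative dates such as `-2d` and string values such as `'error'` are left out).

```python
import operator
import re

# Size suffixes as powers of 1024 (an assumption; the slide only shows "5k").
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
        "<=": operator.le, "=": operator.eq, "!=": operator.ne}
_TOKEN = re.compile(r"\s*(\(|\)|>=|<=|!=|>|<|=|[A-Za-z_]\w*|\d+[kmg]?)", re.I)


def parse_size(text):
    """Convert '5k' to 5120 and '10' to 10."""
    m = re.fullmatch(r"(\d+)([kmg]?)", text.lower())
    if m is None:
        raise ValueError(f"not a size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def tokenize(expr):
    """Split an expression into parentheses, operators, names and numbers."""
    expr, pos, tokens = expr.strip(), 0, []
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected input at {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def matches(expr, item):
    """Return True if `item` (a dict of numeric fields) satisfies `expr`."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def comparison():
        if peek() == "(":
            take()
            value = or_expr()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        field, op, literal = take(), take(), take()
        return _OPS[op](item[field], parse_size(literal))

    def and_expr():
        value = comparison()
        while peek() == "AND":
            take()
            rhs = comparison()  # always parse the right side before combining
            value = value and rhs
        return value

    def or_expr():
        value = and_expr()
        while peek() == "OR":
            take()
            rhs = and_expr()
            value = value or rhs
        return value

    result = or_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, `matches("size > 5k OR size < 1k", {"size": 500})` is true because 500 is below 1024. AND binds tighter than OR here, which is the usual SQL convention and an assumption about the real filter engine.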
  • Filter keywords
  • Filter operators
  • Filter Functions
  • Command Options
  • CheckDriveSize… CheckAll=volumes …
    Other new features
    Added a new option to ignore drives which are not readable (like office 2010 q: drive)
    ignore-unreadable
    Added magic modifiers (from check_mk)
    magic=0.7
    Volume support (for real this time)
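The `magic` modifier is credited to check_mk, where (as I recall its filesystem check) a magic factor relaxes percentage thresholds on large volumes and tightens them on small ones. The sketch below follows that recollection. Whether NSClient++ uses the same formula, and the 20 GB reference size, are assumptions; `magic_level` is a hypothetical name.

```python
def magic_level(level_pct, size_gb, magic, normsize_gb=20.0):
    """Scale a percentage threshold by volume size, check_mk style.

    A volume of exactly `normsize_gb` keeps its threshold; with magic < 1,
    larger volumes get a higher threshold and smaller ones a lower one.
    """
    relative_size = size_gb / normsize_gb
    felt_size = relative_size ** magic
    scale = felt_size / relative_size
    return 100.0 - (100.0 - level_pct) * scale


if __name__ == "__main__":
    for size in (2, 20, 2000):
        print(size, "GB ->", round(magic_level(80, size, 0.7), 1), "%")
```

With `magic=1.0` the thresholds are unchanged for every volume size, so the factor only matters for values below 1.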
  • NSClient++ What’s new 0.3.9
    Scheduled Tasks
  • Works the ”same” as CheckEventLog
    ”filter=exit_code ne 0”
    Two modules:
    CheckTaskSched.dll
    Works on Windows NT4 and beyond
    But cannot check ”new” tasks (from Vista and beyond)
    CheckTaskSched2.dll
    Works on Windows Vista and beyond
    Has fewer filter keywords
    Scheduled Tasks
  • Filter keywords
  • CheckTaskSched
    "filter=exit_code ne 0"
    "syntax=%title%: %exit_code%"
    warn=>0
    WARNING:test.job(1)
    CheckTaskSched
    "filter=status = 'running' AND most_recent_run_time < -30m"
    "syntax=%title% (%most_recent_run_time%)"
    warn=>0
    WARNING:test.job(2011-02-10 23:14:35)
    Sample Commands
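The sample commands combine three things: a filter that selects tasks, a `syntax` template that renders each hit, and a `warn` bound on the number of hits. A rough Python sketch of that flow is below. It is an illustration only, not the module's implementation: `render` and `check_tasks` are hypothetical names, and the filter is passed in as a plain Python predicate.

```python
import re


def render(syntax, task):
    """Replace %key% placeholders with values from the task record."""
    return re.sub(r"%(\w+)%", lambda m: str(task[m.group(1)]), syntax)


def check_tasks(tasks, predicate, syntax, warn_above=0):
    """Return (status, message) for the tasks selected by `predicate`."""
    hits = [task for task in tasks if predicate(task)]
    status = "WARNING" if len(hits) > warn_above else "OK"
    message = ", ".join(render(syntax, task) for task in hits)
    return status, message


if __name__ == "__main__":
    tasks = [
        {"title": "test.job", "exit_code": 1},
        {"title": "backup.job", "exit_code": 0},
    ]
    # Roughly "filter=exit_code ne 0", "syntax=%title%: %exit_code%", warn=>0
    print(check_tasks(tasks, lambda t: t["exit_code"] != 0, "%title%: %exit_code%"))
```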
  • NSClient++ What’s new 0.3.9
    Aliases
  • System
    alias_cpu
    CPU Load past 5 minutes, 80/90% bounds
    alias_cpu_ex
    CPU Load past 5 minutes, custom bounds
    alias_mem
    Memory utilization (all) 80/90% bounds.
    alias_mem_ex
    Memory utilization (all), custom bounds
    alias_up
    System uptime
    Out of the box aliases
  • Disk/Drive
    alias_disk
    All fixed drives
    alias_disk_loose
    All fixed drives, ignore any problematic drives
    alias_volumes
    All volumes
    alias_volumes_loose
    All volumes, ignore any problematic drives
    alias_file_size
    Check the size of a given file (filename, size)
    alias_file_age
    Check the age of a given file
    Out of the box aliases (continued)
  • Eventlog
    alias_event_log
    Check for errors in the event log
    Scheduled Tasks
    alias_sched_all
    No scheduled jobs have failed
    alias_sched_long
    No task has been running for longer than a given time.
    alias_sched_task
    Check if a given task succeeded
    Misc
    alias_updates
    Check that updates are applied
    Out of the box aliases (continued)
  • Processes
    alias_service
    All services in “sensible state”
    alias_service_ex
    All services in “sensible state” (exclude various services)
    alias_process
    A process must be running
    alias_process_stopped
    A process must not be running
    alias_process_count
    A process must not have more than X instances
    alias_process_hung
    A process must not be hung
    Out of the box aliases (continued)
  • NSClient++ What’s new 0.3.9
    Crash Handling
  • Using Google Breakpad
    same as Google Chrome, Mozilla Firefox, etc
    Three options (not mutually exclusive)
    Send crash dumps to crash.nsclient.org
    Server can be changed
    if you want to have an internal server or proxy server.
    Store crash dumps for analysis
    Will also be checked with check_nscp
    Restart service
    Crash Handling
  • [crash]
    restart=1
    service_name=nsclientpp
    submit=0
    url=http://crash.nsclient.org/submit
    archive=1
    #folder=<appfolder>/dumps
    Configuring Crash Handling
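To show how the `[crash]` section reads as data, here is a small Python sketch that parses the same text with the standard `configparser` module. It only illustrates the file layout; NSClient++ has its own settings reader, and `load_crash_settings` is a hypothetical helper.

```python
import configparser

CRASH_INI = """
[crash]
restart=1
service_name=nsclientpp
submit=0
url=http://crash.nsclient.org/submit
archive=1
#folder=<appfolder>/dumps
"""


def load_crash_settings(text):
    """Parse the [crash] section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["crash"]
    return {
        "restart": section.getboolean("restart"),
        "service_name": section.get("service_name"),
        "submit": section.getboolean("submit"),
        "url": section.get("url"),
        "archive": section.getboolean("archive"),
        "folder": section.get("folder"),  # None while the line is commented out
    }


if __name__ == "__main__":
    print(load_crash_settings(CRASH_INI))
```

With the values from the slide, the service restarts after a crash and keeps the dump locally, but does not submit it to the crash server.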
  • NSClient++ What’s new 0.3.9
    Miscellaneous Fixes
  • NSCA
    Fixed problems with sending ”many” results back
    NRPE
    Added support for large payloads
    Checks
    Added ”check_nscp” to check health of NSClient++
    Added new check for running other checks ”with a timeout”
    Added new negate check (to negate the result of another check)
    All filters (read CheckEventLog et al)
    Many fixes and additions (regular expressions)
    Process checks
    Added support for checking if processes have ”hung”
    Performance data
    Added it to many places where it was intermittently missing before
    Other stuff (The highlights)
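The timeout and negate checks both wrap another check and adjust its result. The sketch below shows the general idea in Python using the standard Nagios exit codes. How NSClient++ reports a timeout is not stated on the slide, so returning UNKNOWN is an assumption, and swapping only OK and CRITICAL mirrors the usual default of the Nagios `negate` plugin rather than anything confirmed here.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_with_timeout(argv, timeout_s):
    """Run a check command; report UNKNOWN if it exceeds the timeout."""
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return UNKNOWN


def negate(status):
    """Swap OK and CRITICAL; leave every other status untouched."""
    return {OK: CRITICAL, CRITICAL: OK}.get(status, status)


if __name__ == "__main__":
    # A stand-in "check" that exits with CRITICAL.
    failing = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print(negate(run_with_timeout(failing, 10)))
```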
  • Roadmap
    What's to come?
  • Roadmap (rough)
  • NSClient++ What’s new 0.4.0
    Overview
  • Brand new core based upon libraries
    Things should *work* not just “work”
    More modular and extensible
    Unix support
    Both as a client and server
    New settings subsystem
    Registry, improved ini support, http, etc
    New protocol
    NSCP (HTTP(s), MQ, Native)
    Distributed monitoring
    Many new things in this area (including MQ)
    Python scripting
    Primary goal (for me) is to create “unit-test”
    Updated installer
    Wix 3.5, more customizable
    What’s new 0.4.0
  • “Monitoring Kits”
    Monitoring solutions for “standard things”
    New Windows check subsystem
    More modern and less arcane (no NT4 support)
    Remote checking
    .Net plugin support
    Possibly internal VBA scripting support
    Metrics cache and aggregation
    Lightweight version of CEP
    “crit=cpu > 80% AND transactions_per_sec < 10”
    What’s coming 0.4.2
  • Filter-like API (in addition to options)
    “warn=any drive > 90% OR c: > 80%”
    Remote updates/upgrades
    Allow NSCP to upgrade itself
    “port” of the “standard plugins”?
    Run your favorite check_xxx from inside NSClient++
    Unix plugins?
    Run CheckCPU on unix machines?
    Client/web Interface?
    A nice little program (systray)
    Let me know what you would like to see!
    What might be coming?
  • NSClient++ What’s new 0.4.0
    Brand new core
  • The flux capacitor
  • This is why it was so long in the making
    Merging each new version took forever!
    New internal protocol
    Removed all internal “limits” (think buffer sizes)
    Allows many new features
    Allows much more advanced internal scripts
    Allows for “non NRPE based checks”
    A lot of new bugs?
    This is the scary part (for me)
    but my testing has shown it seems very stable
    A completely new core
  • NSClient++ What’s new 0.4.0
    Unix support
  • Good question…
    Since no one seems to like to program on Windows
    I brought NSClient++ to “unix” 
    Because I can
    With the new core comes portability
    So, perhaps the better question was:
    Why not?
    Will NOT be supported for some time though
    Unless someone wants to help out
    Why?!?!
  • NSClient++ What’s new 0.4.0
    New Settings
  • Hierarchical settings subsystem
    [/settings/NRPE/server]
    allow arguments=false
    Instead of
    [NRPE Server]
    allow_arguments=false
    Why did I do this?
    Because it was fun 
    Number of options has started to explode
    Simpler to use the registry (as well as xml?)
    Settings
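With hierarchical section names, a settings path works like a directory: it can be looked up directly or listed by prefix. The Python sketch below illustrates this on the new-style snippet from the slide using the standard `configparser` module. It is not the NSClient++ settings API; `load`, `get_setting` and `children` are hypothetical helpers.

```python
import configparser

NEW_STYLE = """
[/settings/NRPE/server]
allow arguments=false
"""


def load(text):
    """Parse ini text whose section names are settings paths."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def get_setting(parser, path, key, fallback=None):
    """Look up one key under a settings path, or return the fallback."""
    if parser.has_section(path):
        return parser.get(path, key, fallback=fallback)
    return fallback


def children(parser, prefix):
    """List the section paths that live below `prefix`."""
    prefix = prefix.rstrip("/") + "/"
    return [name for name in parser.sections() if name.startswith(prefix)]


if __name__ == "__main__":
    settings = load(NEW_STYLE)
    print(get_setting(settings, "/settings/NRPE/server", "allow arguments"))
    print(children(settings, "/settings/NRPE"))
```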
  • Since settings have “url:s”
    old://${exe-path}/nsc.ini
    ini://${base-path}/nsclient.ini
    registry://HKEY_LOCAL_MACHINE/software/NSClient++
    http://my.central.server/config/${hostname}.ini
    Allows extensions (not via plugins though)
    Maybe in the future:
    lua://${base-path}/config.lua
    python://${base-path}/config.py
    You can mix and match:
    ini://${base-path}/nsclient.ini
    Can “include”:
    registry://HKEY_LOCAL_MACHINE/software/NSClient++
    Which in turn includes
    http://conf.server/${hostname}.conf
    What’s in it for you?
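The settings URLs combine a scheme, placeholder variables and an include chain. A minimal Python sketch of those three pieces follows. The helper names and the sample values for `${base-path}` and `${hostname}` are hypothetical, and the real resolution rules in NSClient++ may differ.

```python
import re


def expand(url, variables):
    """Substitute ${name} placeholders; unknown names are left untouched."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        url,
    )


def split_settings_url(url):
    """Split 'scheme://location' into its two parts."""
    scheme, separator, location = url.partition("://")
    if not separator:
        raise ValueError(f"not a settings URL: {url!r}")
    return scheme, location


def include_chain(start, includes):
    """Follow 'A includes B' links from `start`, stopping on a cycle."""
    chain, current = [], start
    while current is not None and current not in chain:
        chain.append(current)
        current = includes.get(current)
    return chain


if __name__ == "__main__":
    variables = {"base-path": "/opt/nscp", "hostname": "web01"}  # sample values
    includes = {
        "ini://${base-path}/nsclient.ini":
            "registry://HKEY_LOCAL_MACHINE/software/NSClient++",
        "registry://HKEY_LOCAL_MACHINE/software/NSClient++":
            "http://conf.server/${hostname}.conf",
    }
    for url in include_chain("ini://${base-path}/nsclient.ini", includes):
        print(split_settings_url(expand(url, variables)))
```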
  • Ability to load the same plugin twice.
    Normal (default alias is python)
    [/modules]
    PythonScript=
    [/settings/python/scripts]
    test.py
    Multiple modules (define two aliases foo and bar)
    [/modules]
    foo=PythonScript
    bar=PythonScript
    [/settings/foo/scripts]
    test1.py
    [/settings/bar/scripts]
    test2.py
    Multiple modules and alias
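The `[/modules]` section uses two forms: `Module=` loads a module under its default alias, while `alias=Module` loads another instance under a chosen alias. The Python sketch below reads both forms with the standard `configparser` module. It interprets the slide and is not the loader's actual logic; `load_modules` and `scripts_section` are hypothetical names.

```python
import configparser

MODULES_INI = """
[/modules]
foo=PythonScript
bar=PythonScript
"""


def load_modules(text):
    """Return (alias, module) pairs; alias is None for the default alias."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of module and alias names
    parser.read_string(text)
    instances = []
    for key, value in parser.items("/modules"):
        if value:
            instances.append((key, value))  # "alias=Module"
        else:
            instances.append((None, key))   # "Module=" uses the default alias
    return instances


def scripts_section(alias):
    """Settings path holding the scripts of one module instance."""
    return f"/settings/{alias}/scripts"


if __name__ == "__main__":
    for alias, module in load_modules(MODULES_INI):
        print(module, "->", scripts_section(alias))
```

Each alias gets its own settings path, which is what lets `foo` load `test1.py` while `bar` loads `test2.py`.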
  • It depends…
    If you are “still” using check_nt:
    Probably not
    If you are using NSCA:
    Maybe not
    If you want to use all new features
    Yes
    How do I change?
    It is pretty simple…
    nscp --settings --migrate-to ini
    (or)
    nscp --settings --migrate-to registry
    Do I need to change?
  • NSClient++ What’s new 0.4.0
    New protocol
  • Active NRPE
  • Active NSCP
  • Allows more than one command to be sent
    Used internally for plugins
    Support both passive and active checks
    Supports configuration, management, etc…
    Extensible
    But will also support:
    Multiple locales (based on utf)
    Unlimited payloads (soft configurable)
    Support real performance data (not strings)
    New protocol
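“Real performance data (not strings)” suggests typed values instead of the text form `label=value[unit];warn;crit` used by classic Nagios plugins. The Python sketch below parses one such text item into a typed record to show the difference. The NSCP wire format is not shown in the slides, so `PerfValue` and `parse_perf` are hypothetical, and only plain numeric thresholds are handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfValue:
    label: str
    value: float
    unit: str
    warn: Optional[float] = None
    crit: Optional[float] = None


_PERF = re.compile(
    r"'?(?P<label>[^'=]+)'?=(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
)


def parse_perf(item):
    """Parse one perfdata item such as "cpu=42%;80;90" into a PerfValue."""
    m = _PERF.match(item.strip())
    if m is None:
        raise ValueError(f"not a perfdata item: {item!r}")

    def number(text):
        return float(text) if text else None

    return PerfValue(
        label=m.group("label"),
        value=float(m.group("value")),
        unit=m.group("unit"),
        warn=number(m.group("warn")),
        crit=number(m.group("crit")),
    )


if __name__ == "__main__":
    print(parse_perf("cpu=42%;80;90"))
```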
  • NSClient++ What’s new 0.4.0
    Distributed monitoring
  • Submission (evolution)
  • Other scenarios
  • an extension of the passive checks
    ”Something” can send notification events
    ”Something” can receive notification events
    Agents can forward notification events
    Replaces NSCAListener module
    Supports routing
    Not a one-to-one mapping.
    Multiple consumers
    multiple producers
    Allows
    Passive plugins (other than the built-in NSCA)
    Script and rule based routing
    Submissions and handlers
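The routing idea (many producers, many consumers, agents that forward events) can be pictured as a small publish/subscribe table. The Python sketch below shows only that concept. The `Router` class and the channel names are hypothetical and do not come from NSClient++.

```python
from collections import defaultdict


class Router:
    """Deliver each submitted event to every handler on its channel."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._handlers[channel].append(handler)

    def submit(self, channel, event):
        """Call all handlers for `channel`; return how many were called."""
        handlers = self._handlers.get(channel, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    central = Router()
    central.subscribe("results", lambda event: print("central got", event))

    agent = Router()
    # Two consumers on the agent: a local log and a forwarder to the central router.
    agent.subscribe("results", lambda event: print("agent log", event))
    agent.subscribe("results", lambda event: central.submit("results", event))

    agent.submit("results", {"check": "cpu", "status": "OK"})
```

Forwarding is just another handler, so an agent can relay events without the producer knowing who eventually consumes them.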
  • NSClient++ What’s new 0.4.0
    Python scripting
  • Built-in python scripting
    Has full API support
    Can build ”modules” in python
    Can access settings
    Can do “anything”
    Primarily used by me for unit-testing
    Requires a working python install
    Python Scripting
  • The end of NSClient++!
    Le Roi est mort, vive le Roi!
  • 0.4.x (ish) will be the last ”Windows” monitoring agent
    The idea is to make it more:
    A platform/client/server for distributed monitoring
    Regardless of os/system
    Regardless of Monitoring solutions
    Don’t worry…
    It will still work just fine as a ”Windows Monitoring Agent”
    But in addition to this you will be able to do more.
    So whats this all about?
  • Questions?
    Q&A
  • Michael Medin
    michael@medin.name
    http://www.linkedin.com/in/mickem
    Information about NSClient++
    http://nsclient.org
    Facebook: facebook.com/nsclient
    Slides, and examples
    http://nsclient.org/nscp/conferances/2011/nwcna/
    Thank You!