Italian Conference on Nagios: Michael Medin on Windows Monitoring
Michael Medin's point of view on Windows monitoring, as presented at the Nagios conference in Bolzano, Italy

Italian Conference on Nagios: Michael Medin on Windows Monitoring Presentation Transcript

  • 1. CONFERENCE ON NAGIOS & OSS Monitoring May 20th - Bolzano Going where no man has gone before
  • 2.
    • These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author's work.
    • This material has not been peer reviewed and is presented here as-is with the permission of the author.
    • The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.
  • 3.
    • Developer (not system manager)
      ◦ Not working with Nagios
    • Accidentally ended up in our NOC
      ◦ Hated BB
    • 2003: The birth of NSClient++
      ◦ NSClient sucked (broke Exchange)
      ◦ NRPE_NT was too hard to use
    • 2004: NSClient++ goes open source
      ◦ "just for fun"
    • 2007: The rebirth of NSClient++
      ◦ A lot of users emailed me
      ◦ Got a lot of hits on the webpage
      ◦ Intense development led to 0.3.0!
    • 2010: The Future
      ◦ 0.3.8 out now
      ◦ 0.4.x in development (scheduled for beta fall 2010)
  • 4.
    • Agents
      ◦ An overview of your options
    • About NSClient++
      ◦ Quick Introduction
    • Monitoring
      ◦ Eventlog Checking
      ◦ WMI (Windows Management Instrumentation)
      ◦ Scripts
      ◦ Revisiting WMI
    • Q/A
  • 5. An overview of the options
  • 6.
    • What is NSClient?
      ◦ A (pretty old) program
        ▪ NSClient (or pNSClient)
      ◦ A (pretty limited) protocol
        ▪ check_nt
      ◦ A (pretty incorrect) concept
        ▪ "Windows monitoring"
    • What is it not?
      ◦ NSClient++!
      ◦ NSClient++ was written as a replacement for NSClient
      ◦ But it has evolved a lot since then...
  • 7.
    Agent          Age        Protocol              Licence
    SNMP           1990-2008  SNMP                  Proprietary
    NSClient       200x       NSClient              GPL
    NRPE_NT        200x-2006  NRPE                  GPL
    NSClient++     2004-2010  NRPE, NSClient, NSCA  GPL
    NC_NET         2004-2009  NSClient, NSCA        GPL?
    OpMonAgent     2008       NSClient, NRPE        GPL?
    Agentless WMI  recently   N/A                   N/A
    ...            ...        ...                   ...
  • 8.
    • I would use either:
      ◦ NSClient++
      ◦ NC_NET
    • I would not use:
      ◦ SNMP
        ▪ Too complex to use (and limited on vanilla hardware)
      ◦ NSClient/NRPE_NT/OpMonAgent
        ▪ Old, outdated and limited in functionality
      ◦ Agentless WMI
        ▪ Limited functionality (and enforces centralized monitoring)
    • But...
      ◦ I am biased, so you might not wanna take my word for it...
  • 9.
    Protocol    Method   Encryption  Auth  Payload        Args.    Multi Commands
    NSClient    Active   No          Yes   Unlimited (1)  Yes (1)  Yes (1)
    NRPE        Active   Yes         No    1024 (2)       Yes      No
    NSCA        Passive  Yes         Yes   512 (2)        Yes      Yes
    Future (3)  A/P/*    Yes         Yes   Unlimited      Yes      Yes
    (1) The protocol supports it, but check_nt does not
    (2) The NRPE payload can be extended by recompiling check_nrpe and configuring NSClient++ to match
    (3) A future protocol I am thinking of adding to NSClient++
    • NSClient++ supports all of them
  • 10.
    • I would use:
      ◦ NRPE (check_nrpe)
        ▪ For active checks (the server queries information)
      ◦ NSCA
        ▪ For passive checks (the client pushes information)
    • I would not use:
      ◦ NSClient (check_nt)
        ▪ Limited feature set
    • Be aware!
      ◦ None of them are safe (from a security perspective)!
      ◦ But then... Nothing really is...
  • 11. Quick Introduction
  • 12.
    • Internals:
      ◦ C++ using the Win32 API
      ◦ Around 40,000 lines of code
      ◦ Actively developed (unfortunately only by me)
      ◦ Modularized design philosophy
    • Runs on:
      ◦ NT4, w2k, XP, w2k3, Vista, w2k8, Windows 7 ...
      ◦ x86, x64, IA64 (I lack a compiler for that platform, but it works)
    • Current Version:
      ◦ 0.3.8 (out now, yesterday in fact)
      ◦ Don't use 0.2.7!
    • Most features require NRPE or NSCA
    • Documentation online (wiki)
      ◦ http://nsclient.org
  • 13.
    • Not supported by a commercial entity
      ◦ Donations welcome
      ◦ Sponsoring available (contact me for details)
    • Used by a lot of people (I think)
      ◦ Impossible to estimate any figures
    • Website has:
      ◦ Around 10,000-15,000 unique visitors per month
      ◦ Around 20,000-30,000 downloads per month
    • Please, help out!
      ◦ Add documentation, report problems, ideas, thoughts, etc.
  • 15.
    • Major simplification to the eventlog checker
      ◦ generated > -2d AND severity = 'error'
    • Registry checks
    • Improvements to the file checker
    • Supports multi-language performance counters
    • "Automatic" volume support
    • Improved command line support
    • Simplified scripting with a new VB Helper
      ◦ Thanks op5!
    • Many more things…
  • 16.
    • Rewritten "core" using boost
      ◦ Means "proper handling" (and fewer bugs?)
      ◦ Unix support
      ◦ Improved multitasking
      ◦ Etc.
    • New settings subsystem
      ◦ Registry, improved ini support, better loader, xml?
    • Filter-like API (in addition to options)
      ◦ "warn=any drive > 90% or c: > 80%"
    • New improved client with improved protocol
    • Better .net integration
    • Better customization support
    • CEP - Complex Event Processing?
      ◦ If anyone wants this let me know!
  • 17.
    • NSClient++ is a command line program!
      ◦ nsclient++ /start (net start nsclientpp)
      ◦ nsclient++ /stop (net stop nsclientpp)
      ◦ nsclient++ /test
    • nsclient++ /test is your friend!
    • Configuration:
      ◦ notepad nsc.ini
    • Testing:
      1. Local (nsclient++ /test)
      2. From CLI (check_nrpe ...)
      3. From Nagios (add command)
    • Works with "anything" (even non-Nagios things)
  • 18. Eventlog checking
  • 19. In a galaxy far far away...
  • 20.
    • The good:
      ◦ Powerful interface!
      ◦ Simple to use!
      ◦ Out-of-the-box solution!
        ▪ (on which you can expand)
    • The bad:
      ◦ Nothing! Really, I mean it!
    • But...
      ◦ …still a bit "experimental"
  • 21.
    • Syntax is friendly and intuitive
    • Still experimental
      ◦ Should work though, so please try it
    • Based on SQL WHERE clauses
      ◦ generated > -2d AND severity = 'error'
    • Automatically detects version to use
      ◦ So no filter=newer option
  • 22.
    • Like SQL "WHERE" clauses
      ◦ severity = 'error'
      ◦ severity = 'error' OR severity = 'warning'
      ◦ severity = 'error' OR (id = 123 OR id = 345)
      ◦ severity = 'error' OR id IN (123, 345)
  • 23. Filter keywords and their descriptions:
    • type: Type of error (Microsoft says this is severity): error, warning, info, auditSuccess or auditFailure
    • source: The name of the source of the event, i.e. the program that logged the message
    • generated: Time ago the message was generated (when it happened)
    • written: Time ago the message was written to the log (don't use)
    • strings: Message contents (faster)
    • message: Message text (slower)
    • id: Event id of the log message (this together with source is unique)
    • severity: Event severity (I think this is severity): success, informational, warning or error
  • 24.
    Operator  Safe  Meaning
    =         eq    Equality
    !=        ne    Not equal
    >         gt    Greater than
    <         lt    Less than
    =>        ge    Greater than or equal
    =<        le    Less than or equal
    like            String similarity (substring matching)
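The "safe" spellings in the operator table are presumably there because characters such as > and < can be mangled by a shell or rejected on the way to the agent. A minimal Python sketch of the mapping, assuming operators are separated by whitespace as in the slide examples; to_safe_filter is a hypothetical helper and not part of NSClient++:

```python
# Mapping copied from the operator table (symbolic form -> "safe" form).
SAFE_OPERATORS = {
    "=": "eq",
    "!=": "ne",
    ">": "gt",
    "<": "lt",
    "=>": "ge",
    "=<": "le",
}


def to_safe_filter(expression: str) -> str:
    """Replace whitespace-delimited symbolic operators with their safe names.

    Assumption: every operator stands alone between spaces. All other
    tokens pass through unchanged; runs of whitespace collapse to one space.
    """
    tokens = expression.split()
    return " ".join(SAFE_OPERATORS.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(to_safe_filter("generated > -2d AND severity = 'error'"))
```

A quoted string that contains a free-standing operator character would be rewritten too, so this only suits simple filters like the ones on the slides.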
  • 25.
    Name          Use                                Example
    convert(...)  Converts from one type to another  Usually not needed as types are inferred
    neg(...)      Negate value                       -1 = neg(1); yesterday = neg(tomorrow); yesterday = -tomorrow
    in (...)      Equal to any one item of a list    id in (1, 3, 4, 5)
  • 26. Options and their descriptions:
    • file: The "eventlog file" to open. Use multiple file options to check multiple files.
    • filter: Define the filter (there can only be one)
    • MaxWarn: Maximum hits before a warning state is issued.
    • MaxCrit: Maximum hits before a critical state is issued.
    • truncate: Length of returned data. Since NRPE (and NSClient++) has a limited capacity this is important. Usually 900 is a good value.
    • syntax: How to format the return data
    • unique: Only "one of each" record will be returned. ("count" (MaxWarn/MaxCrit) is not affected)
    • descriptions: Needed if you plan on using the %message% syntax option. (Will impact performance "severely")
    • debug=true: Displays a lot more information about the check
  • 27.
    • alias_event_log
    • Uses the following definition:
      ◦ file=application file=system
        ▪ The files to check
      ◦ MaxWarn=1 MaxCrit=1
        ▪ Every error is a warning
      ◦ "filter=generated gt -2d
        ▪ Generated less than 2 days ago
      ◦ AND severity NOT IN ('success', 'informational')"
        ▪ NOT a success or information message
      ◦ truncate=800 unique descriptions
        ▪ Truncate returned data and make it look pretty
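To make the pieces of the alias_event_log definition easier to see, here is a rough Python sketch that assembles the same argument list from its parts. The function name and its parameters are hypothetical; only the option names (file, MaxWarn, MaxCrit, filter, truncate, unique, descriptions) come from the slides:

```python
def build_eventlog_args(files, max_warn, max_crit, filter_expr,
                        truncate=None, unique=False, descriptions=False):
    """Return the eventlog check arguments as a list of 'key=value' strings.

    Hypothetical helper for illustration; it only formats strings and does
    not talk to NSClient++.
    """
    args = [f"file={name}" for name in files]        # one file= per log
    args += [f"MaxWarn={max_warn}", f"MaxCrit={max_crit}"]
    args.append(f"filter={filter_expr}")             # only one filter allowed
    if truncate is not None:
        args.append(f"truncate={truncate}")
    if unique:
        args.append("unique")
    if descriptions:
        args.append("descriptions")
    return args


if __name__ == "__main__":
    for arg in build_eventlog_args(
        files=["application", "system"],
        max_warn=1,
        max_crit=1,
        filter_expr="generated gt -2d AND severity NOT IN ('success', 'informational')",
        truncate=800,
        unique=True,
        descriptions=True,
    ):
        print(arg)
```

Keeping the filter as a single list element mirrors the quoting on the slide, where the whole filter=... expression is one argument.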
  • 28.
    • Filtering is fairly straightforward
    • The "parser" will do most of the work for you
      ◦ generated > -2d just works!
    • Enable debug=true to see what happens
    • Always, always, always debug in "/test mode"
    • Check query times to optimize performance
    • There is a pretty OK guide on the wiki
  • 29.
    • Start with "everything" and work your way down.
    • System, Application, etc.
    • Reasonable start filter:
      ◦ generated > -2d AND severity NOT IN ('success', 'informational')
    • You need to customize it for your environment.
    • A good idea is to use more than one check
      1. Check "all errors", send to /dev/null
      2. Check "my service", send to admin@server
    • Don't overdo it (eventlog checking is slow)
  • 30. WMI - Windows Management Instrumentation
  • 31.
    • The purpose of WMI is to define a non-proprietary set of environment-independent specifications which allow management information to be shared between management applications.
    • WMI prescribes enterprise management standards and related technologies that work with existing management standards, such as Desktop Management Interface (DMI) and SNMP.
    • WMI complements these other standards by providing a uniform model. This model represents the managed environment through which management data from any source can be accessed in a common way.
    • …yada yada yada…
    • In short: A bit like SNMP but modern
      ◦ Though it is actually more than 10 years old
  • 32.
    • Everything?
      ◦ Almost...
    • There are a lot of objects (tables)
      ◦ win32 has 450 objects
      ◦ Various services will add more (AD, SQL Server, ...)
    • You can:
      ◦ Read, write and work with "objects".
        ▪ But only read via the built-in commands of NSClient++
    • But you cannot:
      ◦ Check your application (ish)
  • 33.
    • Built-in commands are dangerous!
      ◦ No security, allows access to a lot of things!
      ◦ For instance you can enumerate the file system
  • 34.
    • CheckWMI
      ◦ Check a result set
      ◦ Good for:
        ▪ Checking if we have more (or less) than n items...
    • CheckWMIValue
      ◦ Check a specific value
      ◦ Good for:
        ▪ Checking if a value is more or less than n
    • Custom Scripts
      ◦ For, I think, most things beyond the basics
      ◦ Also improves the security aspect
      ◦ Good for:
        ▪ Everything
  • 35.
    • WQL - WMI Query Language
      ◦ Based upon SQL
      ◦ Only select … (no update/insert/delete/DDL)
    • "Tables" are called objects in WMI
      ◦ An object usually corresponds to a logical "type".
    • Example:
      ◦ select LoadPercentage from win32_Processor
        ▪ Retrieves system load from the win32_Processor "object".
      ◦ select * from win32_Processor
        ▪ Retrieves everything from the win32_Processor "object".
  • 36. Objects and their descriptions:
    • Win32_Fan: Represents the properties of a fan device in the computer system.
    • Win32_TemperatureProbe: Represents the properties of a temperature sensor (electronic thermometer).
    • Win32_DiskDrive: Represents a physical disk drive as seen by a computer running the Windows operating system.
    • Win32_PhysicalMedia: Represents any type of documentation or storage medium.
    • Win32_TapeDrive: Represents a tape drive on a computer system running Windows.
    • Win32_BaseBoard: Represents a baseboard (also known as a motherboard or system board).
    • Win32_BIOS: Represents the attributes of the computer system's basic input or output services (BIOS).
    • Win32_IDEController: Represents the capabilities of an Integrated Drive Electronics (IDE) controller device.
    • Win32_MemoryArray: Represents the properties of the computer system memory array and mapped addresses.
    • Win32_OnBoardDevice: Represents common adapter devices built into the motherboard (system board).
    • Win32_Processor: Represents a device capable of interpreting a sequence of machine instructions on the computer.
    • Win32_SCSIController: Represents a small computer system interface (SCSI) controller on a computer system running Windows.
    • Win32_USBControllerDevice: Relates a USB controller and the CIM_LogicalDevice instances connected to it.
    • Win32_NetworkAdapter: Represents a network adapter on a computer system running Windows.
    • Win32_Battery: Represents a battery connected to the computer system.
    • Win32_PortableBattery: Represents the properties of a portable battery, such as one used for a notebook computer.
    • Win32_PowerManagementEvent: Represents power management events resulting from power state changes.
    • Win32_UninterruptiblePowerSupply: Represents the capabilities and management capacity of an uninterruptible power supply (UPS).
    • Win32_Printer: Represents a device connected to a computer system running Windows that is capable of reproducing a visual image on a medium.
    • Win32_PrintJob: Represents a print job generated by a Windows-based application.
  • 37. Objects and their descriptions:
    • Win32_SystemDriver: Represents the system driver for a base service.
    • Win32_Directory: Represents a directory entry on a computer system running Windows.
    • Win32_DiskQuota: Tracks disk space usage for NTFS file system volumes.
    • Win32_LogicalDisk: Represents a data source that resolves to an actual local storage device.
    • Win32_Volume: Represents an area of storage on a hard disk.
    • Win32_PageFileUsage: Represents the file used for handling virtual memory file swapping on a computer system running Windows.
    • Win32_NetworkConnection: Represents an active network connection in a Windows environment.
    • Win32_NTDomain: Represents a Windows NT domain.
    • Win32_PingStatus: Represents the values returned by the standard ping command.
    • Win32_ComputerSystem: Represents a computer system operating in a Windows environment.
    • Win32_OperatingSystem: Represents an operating system installed on a computer system running Windows.
    • Win32_Process: Represents a sequence of events on a computer system running Windows.
    • Win32_ProcessStartup: Represents the startup configuration of a computer system running Windows.
    • Win32_ScheduledJob: Represents a job scheduled using the Windows NT schedule service.
    • Win32_BaseService: Represents executable objects that are installed in a registry database maintained by the SCM.
    • Win32_Service: Represents a service on a computer system running Windows.
    • Win32_LogonSession: Describes the logon session or sessions associated with a user logged on to Windows 2000 or Windows NT.
    • Win32_UserAccount: Represents information about a user account on a computer system running Windows.
    • Win32_UserInDomain: Association class
    • Win32_WindowsProductActivation: Contains properties and methods related to WPA.
    • Win32_NTEvent...: Yes, you can even check the eventlog!
  • 38.
    • NSClient++ can execute WQL queries "as is" and return the result.
      ◦ nsclient++ -noboot CheckWMI <query>
    • Sample use
      ◦ nsclient++ -noboot CheckWMI select * from win32_Processor
  • 39.
    • A better option is
      ◦ WMI Administrative Tools
      ◦ Freely available from Microsoft
  • 40.
    1. Checking "values": Is load above 50%?
       ▪ Use CheckWMIValue
    2. Checking "items": Is load on more than 3 cores above 50%?
       ▪ Use CheckWMI
    3. Checking "custom things": Check if load is above 50% and fewer than 5 queries are running on the database
       ▪ Use Scripts
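The difference between checking a "value" and checking "items" can be sketched in a few lines of Python. This is only an illustration of the two ideas, not how CheckWMIValue or CheckWMI are implemented; the function names, the sample loads and the strict "greater than" comparison are assumptions:

```python
OK, WARNING, CRITICAL = 0, 1, 2


def check_value(value, max_warn, max_crit):
    """Value-style check: compare a single number against two limits."""
    if value > max_crit:
        return CRITICAL
    if value > max_warn:
        return WARNING
    return OK


def check_items(rows, matches, max_warn, max_crit):
    """Item-style check: count the rows that match, then compare the count."""
    hits = sum(1 for row in rows if matches(row))
    return check_value(hits, max_warn, max_crit)


if __name__ == "__main__":
    core_loads = [10, 60, 75, 80, 55, 20, 90, 30]  # made-up per-core load in %
    # "Is load above 50%?" for one core
    print(check_value(core_loads[0], max_warn=50, max_crit=80))
    # "Is load on more than 3 cores above 50%?"
    print(check_items(core_loads, lambda load: load > 50, max_warn=3, max_crit=6))
```

Anything that needs to combine several sources, like the third case on the slide, falls outside both helpers, which is where a script comes in.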
  • 41.
    • Best way to start
    • Simple to use...
      ◦ ...if you know your WMI
    • A sample query:
      ◦ CheckWMIValue
        ▪ "Query=Select * from win32_Processor"
        ▪ MaxWarn=80 MaxCrit=90
        ▪ Check:CPU=LoadPercentage
        ▪ AliasCol=LoadPercentage
        ▪ ShowAll=long
      ◦ (a bit like CheckCPU)
  • 42. Scripts
  • 43.
    • External Scripts
      ◦ VB, Perl, Python, ...
      ◦ .exe files
      ◦ .net
      ◦ ...
    • Lua
      ◦ Lua is a simple programming language
      ◦ Used INSIDE NSClient++
      ◦ Very powerful, and simple
      ◦ A fairly new feature so feel free to suggest things
    • Modules
      ◦ Written in C++, VB, .net, ...
      ◦ Very powerful, but much "harder"
  • 44.
    • Configuration:
      ◦ [modules]
      ◦ CheckExternalScripts.dll
      ◦ ...
      ◦ [External Scripts]
      ◦ <alias>=<script>
        ▪ <alias> is the command from nrpe
        ▪ <script> is the command to execute
        ▪ check_es_ok=scripts\ok.bat
      ◦ [Wrapped Scripts]
      ◦ <alias>=<script>
  • 45.
    • Sample Code:
      ◦ @echo CRITICAL: Everything is not going to be ok!
      ◦ @exit 2
    • Exit statuses:
      ◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
    • NSC.ini syntax:
      ◦ [External Scripts]
      ◦ check_bat=scripts\check_test.bat
    • Or
      ◦ [Wrapped Scripts]
      ◦ check_test=check_test.bat
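Since Python is listed among the possible external script languages, the same exit-status convention can be sketched in Python. This is a minimal example under stated assumptions: the metric is a generic higher-is-worse number, and plugin_result and main are illustrative names, not part of NSClient++:

```python
# Exit statuses as listed on the slide.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def plugin_result(value, warn, crit):
    """Return (exit_code, output_line) for a higher-is-worse metric."""
    if value > crit:
        code = CRITICAL
    elif value > warn:
        code = WARNING
    else:
        code = OK
    return code, f"{LABELS[code]}: value is {value}"


def main(argv):
    """Parse 'value warn crit' from argv; anything unparsable is UNKNOWN."""
    try:
        value, warn, crit = (float(arg) for arg in argv)
    except ValueError:
        print("UNKNOWN: expected three numbers: value warn crit")
        return UNKNOWN
    code, line = plugin_result(value, warn, crit)
    print(line)  # the first line of output is what the monitoring server shows
    return code
```

A real script would end with sys.exit(main(sys.argv[1:])) so that the process exit code carries the status, just as @exit 2 does in the batch sample.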
  • 46.
    • Sample Code:
      ◦ Wscript.Echo "Everything might not be ok"
      ◦ Wscript.Quit(1)
    • Exit statuses:
      ◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
    • NSC.ini syntax:
      ◦ [External Scripts]
      ◦ check_test=cscript.exe /T:30 /NoLogo scripts\check_test.vbs
    • Or
      ◦ [Wrapped Scripts]
      ◦ check_test=check_test.vbs
  • 47.
    • Sample Code:
      ◦ write-host "OK: Everything is wicked!"
      ◦ exit 0
    • Exit statuses:
      ◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
    • NSC.ini syntax:
      ◦ [External Scripts]
      ◦ check_test=cmd /c echo scripts\check_test.ps1; exit($lastexitcode) | powershell.exe -command -
    • Or
      ◦ [Wrapped Scripts]
      ◦ check_test=check_test.ps1
  • 48.
    • This is exactly like writing "regular" Nagios scripts.
    • Find scripts on:
      ◦ http://www.monitoringexchange.org
      ◦ http://exchange.nagios.org
  • 49.
    • Configuration:
      ◦ [modules]
      ◦ LUAScript.dll
      ◦ ...
      ◦ [LUA Scripts]
      ◦ <script>
        ▪ scripts\test.lua
    • What, no alias?
      ◦ Not needed (happens inside the script)
  • 50.
    nscp.print('Loading test script...')
    nscp.register('check_foo', 'foo')
    function foo (command)
      nscp.print(command)
      code, msg, perf = nscp.execute('CheckCPU','time=5','MaxCrit=5')
      return code, 'hello from LUA: ' .. msg, perf
    end
  • 51.
    • The power of Lua scripts comes from:
      ◦ The ability to run and modify the result of other commands
      ◦ The ability to run "inside" NSClient++
      ◦ The simplicity of the language
  • 52. Revisiting WMI
  • 53.
    ' Default settings for your script.
    threshold_warning = 50
    threshold_critical = 20
    ' Create the NagiosPlugin object
    Set np = New NagiosPlugin
    ' Define which args should be used
    np.add_arg "warning", "warning threshold", 0
    np.add_arg "critical", "critical threshold", 0
    If Args.Exists("warning") Then threshold_warning = Args("warning")
    If Args.Exists("critical") Then threshold_critical = Args("critical")
    np.set_thresholds threshold_warning, threshold_critical
  • 54.
    Set colInstances = np.simple_WMI_CIMV2(".", "SELECT * FROM Win32_Battery")
    For Each objInstance In colInstances
      WScript.Echo "Battery " & objInstance.Status & " - Charge Remaining = " & objInstance.EstimatedChargeRemaining & "% | charge=" & objInstance.EstimatedChargeRemaining
      return_code = np.escalate_check_threshold(return_code, objInstance.EstimatedChargeRemaining)
    Next
    np.nagios_exit "", return_code
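The loop in the battery script keeps one return code and "escalates" it for each battery. A rough Python sketch of that idea follows. It assumes that lower charge is worse (which is how the defaults of 50 and 20 read) and that escalation means keeping the worse of two statuses; the actual behaviour of np.escalate_check_threshold in the VB helper may differ, and all names below are illustrative:

```python
OK, WARNING, CRITICAL = 0, 1, 2


def low_threshold_status(value, warning, critical):
    """Lower is worse: below `critical` is CRITICAL, below `warning` is WARNING."""
    if value < critical:
        return CRITICAL
    if value < warning:
        return WARNING
    return OK


def escalate(current, value, warning, critical):
    """Keep the worse of the status so far and the status of this value."""
    return max(current, low_threshold_status(value, warning, critical))


def check_batteries(batteries, warning=50, critical=20):
    """`batteries` is a list of (status_text, charge_percent) pairs."""
    code = OK
    lines = []
    for status, charge in batteries:
        # Same output shape as the slide: text, then "|" and performance data.
        lines.append(f"Battery {status} - Charge Remaining = {charge}% | charge={charge}")
        code = escalate(code, charge, warning, critical)
    return code, lines


if __name__ == "__main__":
    code, lines = check_batteries([("OK", 80), ("Degraded", 35)])
    print("\n".join(lines))
    print("exit code:", code)
```

Because the status only ever moves upward, one weak battery is enough to raise the overall result, whatever order the instances come back in.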
  • 55.
    Status  Meaning
    1       Other
    2       Unknown
    3       Idle
    4       Printing
    5       WarmUp
    6       Stopped Printing
    7       Offline
  • 56. Questions?
  • 57.
    • michael@medin.name
    • http://www.linkedin.com/in/mickem
    • Information about NSClient++: http://nsclient.org
    • Slides and examples at: http://nsclient.org/nscp/conferances/2010/WPN/