OpenNMS - My Notes

OpenNMS Tutorial

Published in: Software

  1. 1. Introduction To OpenNMS
  2. 2. OpenNMS is the world's first enterprise-grade network management application platform developed under the open source model.
  3. 3. “enterprise-grade” • Over 60,000 Devices on a Single Instance (Swisscom) • 1.2 Million Data Points Every Five Minutes (New Edge) • 120,000 syslog messages per minute (SRNS) • 320,000 Interfaces per Device (Wind) • 3,000 Remote Monitors (Papa Johns)
  4. 4. “network management application platform” The Architecture of OpenNMS has been designed to allow for easy integration of other tools, both proprietary and open.
  5. 5. “open source model” OpenNMS is published under the GPL and all components are licensed under an OSI-qualified free software license.
  6. 6. The Four Main Areas of OpenNMS • Provisioning: Both Automated Discovery and Directed Discovery. • Event and Notification Management: Generate, receive, reduce and correlate various network alerts and feed them to a robust notification system. • Service Assurance: Is a particular network service reachable and available? • Performance Data Collection: Gather numeric data from across the network for display, trending and thresholding.
  7. 7. The Architecture
  8. 8. OpenNMS Versions Stable (Production) Versions Have an Even Number: – 1.8 – 1.10 – 1.12 Unstable (Development) Versions Have an Odd Number: – 1.11 – 1.13
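The even/odd rule on this slide can be checked mechanically. The following is a minimal sketch, assuming versions are written as `major.minor[.patch]`; the helper name `is_stable` is made up for illustration and is not part of OpenNMS.

```python
def is_stable(version: str) -> bool:
    """Return True when the minor number of a 'major.minor[.patch]' version is even."""
    parts = version.split(".")
    minor = int(parts[1])  # second component decides stable vs. development
    return minor % 2 == 0


for v in ("1.8", "1.10", "1.11", "1.12", "1.13"):
    print(v, "stable" if is_stable(v) else "unstable")
```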
  9. 9. OpenNMS Installation:Windows http://www.opennms.org/wiki/Installation:Windows
  10. 10. Install the JDK • Download the Java SE JDK, version 5 (1.5), 6 (1.6) or higher, from java.sun.com and install it. The JDK, not just a JRE, is required by the web UI, since JSP pages are dynamically compiled. You need the version labeled "Java SE" (for "standard edition"), not EE, ME, or FX.
  11. 11. Install PostgreSQL • Download the one-click PostgreSQL for Windows installer. • http://www.postgresql.org/download/windows
  12. 12. Run the PostgreSQL Installer • Run the installer. For the most part, the defaults should be just fine. You should not need to run the Stack Builder for OpenNMS, although if you intend to use PostgreSQL for other things, it lets you install replication, web, and ODBC tools.
  13. 13. Initialize the Database • Create a Database in PostgreSQL_Root\data – If for some reason you don't have a default database initialized from the installer, you can create it yourself: – Open a command prompt (Start -> Run -> cmd) and change to the bin directory of your PostgreSQL install (by default, C:\Program Files\PostgreSQL\X.X\bin) – Initialize the database with the following command: • initdb -E UTF-8 -U postgres ..\data
  14. 14. Install OpenNMS • If you did not start PostgreSQL already, start it by going to the "PostgreSQL X.X" menu in the Start bar and clicking "Start service". • Then, all you need to do is download the latest standalone-opennms-installer-X.X.X.zip from the opennms section on the OpenNMS download page. Once it is downloaded to your hard drive, you should be able to just double-click setup32.exe or setup64.exe in Explorer, and it will start installation. Note that the setupXX.exe you run should match your JVM's architecture, so if you're running a 32-bit JVM on 64-bit Windows, install using setup32.exe. • Follow the instructions and you should have a complete OpenNMS installation!
  15. 15. Run OpenNMS • OpenNMS can be run from the command line, using opennms.bat in your %OPENNMS_HOME%\bin directory. Assuming you installed OpenNMS to C:\Program Files\OpenNMS, you would open a command prompt and cd to C:\Program Files\OpenNMS\bin. Then run: – opennms.bat start • ...and OpenNMS should start. Open your browser and point it at http://localhost:8980/opennms and log in as "admin" with the password "admin".
  16. 16. Logging In Default username/password is admin/admin
  17. 17. Click on Admin
  18. 18. Admin Menu
  19. 19. Add A User Go to: • Configure Users, Groups and On-Call Roles Choose: • Configure Users
  20. 20. Set the Username and Password
  21. 21. Add User Details Click Finish!
  22. 22. Now, Add to Admin Group
  23. 23. Duty Schedules • Users can have duty schedules –No notices when not on duty –Multiple schedules can exist • Groups can have duty schedules –Outstanding notices sent when back on duty –Overridden by users • On-Call roles always get notices
  24. 24. OpenNMS File Locations /opennms/bin: OpenNMS binary files /opennms/etc: Configuration files /opennms/jetty-webapps: Web server /opennms/lib: Compiled libraries /opennms/share: Reports and RRD data (symlink to /var/opennms) /opennms/logs: Log Files (symlink to /var/log/opennms)
  25. 25. Edit magic-users.properties
        ###########################################################################
        ## R O L E S
        ###########################################################################
        # A comma-separated list of role keys. A role.{KEY}.name and
        # role.{KEY}.users property must be set for each key in this property.
        roles=rtc, admin, rouser, dashboard, provision, remoting, rest
        # This role allows a user to make RTC data posts.
        role.rtc.name=OpenNMS RTC Daemon
        role.rtc.users=rtc
        role.rtc.notInDefaultGroup=true
        # This role allows users access to configuration and
        # administrative web pages.
        role.admin.name=OpenNMS Administrator
        role.admin.users=admin,tarus
  26. 26. Login as New User
  27. 27. Configure Automatic Discovery
  28. 28. Click Home > Nodelist
  29. 29. Events Introduction http://www.opennms.org/wiki/Tutorial_Notifications
  30. 30. Events • OpenNMS is event driven • The key process is called “eventd” • Listens on port 5817 for XML • Daemon config: eventd-configuration.xml • Events config: eventconf.xml • Events identified by a “uei”: unique event identifier • Use send-event.pl to send events
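Since eventd listens for XML on TCP port 5817, an event can in principle be sent from any language, not only through send-event.pl. The sketch below builds a minimal event document in Python. The `<log><events><event>` envelope, the `source` and `host` fields, and the time format are assumptions modeled on what send-event.pl is generally understood to emit; check them against your OpenNMS version before relying on them. The function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_event_xml(uei: str, source: str = "python-sketch", host: str = "localhost") -> bytes:
    """Build a minimal event document (envelope layout is an assumption)."""
    log = ET.Element("log")
    events = ET.SubElement(log, "events")
    event = ET.SubElement(events, "event")
    ET.SubElement(event, "uei").text = uei
    ET.SubElement(event, "source").text = source
    # Assumed time format; adjust if your eventd rejects it.
    now = datetime.now(timezone.utc)
    ET.SubElement(event, "time").text = now.strftime("%A, %d %B %Y %H:%M:%S o'clock GMT")
    ET.SubElement(event, "host").text = host
    return ET.tostring(log, encoding="utf-8")


def send_event(payload: bytes, host: str = "127.0.0.1", port: int = 5817) -> None:
    """Write the XML to eventd's TCP listener (port taken from the slide)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


# Only build and print here; call send_event(payload) against a running eventd.
payload = build_event_xml("uei.opennms.org/internal/discovery/newSuspect")
print(payload.decode("utf-8"))
```

Keeping the build step separate from the network call makes it easy to inspect the XML before anything is sent.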
  31. 31. Unique Event Identifier • A new interface is discovered: uei.opennms.org/internal/discovery/newSuspect • A service is down: uei.opennms.org/nodes/nodeLostService • All services on an interface are down: uei.opennms.org/nodes/interfaceDown • All interfaces on a node are down: uei.opennms.org/nodes/nodeDown
  32. 32. eventd-configuration.xml <EventdConfiguration TCPAddress="*" TCPPort="5817" UDPAddress="*" UDPPort="5817" receivers="5" getNextEventID="SELECT nextval('eventsNxtId')" socketSoTimeoutRequired="yes" socketSoTimeoutPeriod="3000"> </EventdConfiguration>
  33. 33. eventconf.xml This file contains three things: • Global security settings • OpenNMS event definitions • Include files for non-OpenNMS event definitions Try and open the eventconf.xml file from opennms/etc folder
  34. 34. Security Settings <global> <security> <doNotOverride>logmsg</doNotOverride> <doNotOverride>operaction</doNotOverride> <doNotOverride>autoaction</doNotOverride> <doNotOverride>tticket</doNotOverride> <doNotOverride>script</doNotOverride> </security> </global>
  35. 35. An OpenNMS Event <event> <uei>uei.opennms.org/internal/discovery/newSuspect</uei> <event-label>OpenNMS-defined internal event: discovery newSuspect</event-label> <descr> &lt;p&gt;Interface %interface% has been discovered and is being queued for a services scan.&lt;/p&gt; </descr> <logmsg dest='logndisplay'> A new interface (%interface%) has been discovered and is being queued for a services scan. </logmsg> <severity>Warning</severity> </event>
  36. 36. Event Include Files <event-file>events/Translator.default.events.xml</event-file> <event-file>events/Rancid.events.xml</event-file> <event-file>events/3Com.events.xml</event-file> <event-file>events/AdaptecRaid.events.xml</event-file> <event-file>events/ADIC-v2.events.xml</event-file> <event-file>events/Adtran.events.xml</event-file> <event-file>events/Adtran.Atlas.events.xml</event-file> <event-file>events/Aedilis.events.xml</event-file> <event-file>events/AirDefense.events.xml</event-file>
  37. 37. <event-file>events/AIX.events.xml</event-file> <event-file>events/AKCP.events.xml</event-file> <event-file>events/AlcatelLucent.SMSBrick.events.xml</event-file> <event-file>events/Allot.NetXplorer.events.xml</event-file> <event-file>events/Allot.SM.events.xml</event-file> <event-file>events/Alteon.events.xml</event-file> <event-file>events/Altiga.events.xml</event-file> <event-file>events/APC.events.xml</event-file> <event-file>events/APC.Best.events.xml</event-file> <event-file>events/APC.Exide.events.xml</event-file> • Let’s open C:\Program Files\OpenNMS\etc • View an event file from C:\Program Files\OpenNMS\etc\events
  38. 38. Let’s try sending an event
  39. 39. Restarting eventd • Simply restart OpenNMS • Use a reload event: send-event.pl uei.opennms.org/internal/eventsConfigChange Remember that there is no error checking
  40. 40. Event Severities
  41. 41. Create an Event • Send a uei.opennms.org/class/happiness event • Note how it appears in the GUI • Create a Class.events.xml file • Add it to eventconf.xml • Reload the event configuration • Send the event again • Note how it appears in the GUI
  42. 42. <events> <event> <uei>uei.opennms.org/class/happiness</uei> <event-label>OpenNMS defined event: The OpenNMS Class is so happy</event-label> <descr> &lt;p&gt;This event is sent when the OpenNMS Class is happy.&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Dance, Everybody Dance!&lt;/li&gt; &lt;li&gt;Life is Good!&lt;/li&gt; &lt;li&gt;This is Fun!&lt;/li&gt; &lt;/ul&gt; </descr> <logmsg dest='logndisplay'> &lt;p&gt;OpenNMS Class is Happy! &lt;/p&gt; </logmsg> <severity>Normal</severity> </event> </events> Save Class.events.xml in: C:\Program Files\OpenNMS\etc\events
  43. 43. Add the new Class.events.xml file to eventconf.xml <event-file>events/Class.events.xml</event-file> <event-file>events/Translator.default.events.xml</event-file> <event-file>events/Rancid.events.xml</event-file> <event-file>events/3Com.events.xml</event-file> <event-file>events/AdaptecRaid.events.xml</event-file> <event-file>events/ADIC-v2.events.xml</event-file> <event-file>events/Adtran.events.xml</event-file> <event-file>events/Adtran.Atlas.events.xml</event-file> Save that file then send the reload event:
  44. 44. • Note that the reload was successful: • Resend the “happiness” event:
  45. 45. Notifications Introduction http://www.opennms.org/wiki/Tutorial_Notifications
  46. 46. Notices • Notices bring events to others' attention • Events can trigger notices – The event happens – A notice is triggered – It “walks a path” – Those along the path get notified • Escalations can ensure notices get attention.
  47. 47. Built-in Notifications • A number of notifications are built in • Turn on notices • Add a node and see the notifications, as “nodeAdded” is a default notice.
  48. 48. Create a User • You can do so by clicking the "Admin" link in the menu bar, going to "Configure Users, Groups and On-Call Roles", and then clicking "Configure Users". • Then, click the "Add New User" link, and enter a username and password in the form, then click "OK". Next, fill out the relevant user information. • Finally, click "Finish" down at the bottom of the form.
  49. 49. Destination Paths • To receive a notification, your user or a group that your user is in needs to be a part of a destination path. A destination path helps OpenNMS determine who is eligible to receive a particular notification, depending on user name, groups, duty schedules, and other rules.
  50. 50. Create a Destination Path • Go to the Admin page, click "Configure Notifications" in the "Operations" section, then "Configure Destination Paths," then click the "New Path" button. • Give it a name, like "Tutorial", then click "Edit". Add your new user by selecting it, then click "Next >>>":
  51. 51. Admin → Notifications → Destination Paths
  52. 52. The “Email-Admin” Path
  53. 53. • On the next page, leave the defaults ("javaEmail" and "on") and click "Next >>>" again. • Finally, click "Finish" and you should see your new tutorial path in the "Existing Paths" section.
  54. 54. Different Types of Targets
  55. 55. Group Intervals
  56. 56. Notification Command
  57. 57. Event Notifications • The next thing to do is create an event notification, which ties any OpenNMS event together with a destination path. • Go to the Admin page, click "Configure Notifications" in the "Operations" section, then "Configure Event Notifications." • From there you can create a new event notification by clicking "Add New Event Notification", or edit or delete an existing one.
  58. 58. Creating an Event Notification • Click the "Add New Event Notification" button at the top of the Event Notifications page. • Look for "OpenNMS-defined internal event: an authentication failure has occurred in WebUI":
  59. 59. Choose the Triggering Event • Then click "Next >>>". • You can build a rule for matching a set of IP addresses and/or services. In this case, we're just matching an internal OpenNMS event, so click "Skip results validation >>>".
  60. 60. Apply Any Filters • You can build a rule for matching a set of IP addresses and/or services.
  61. 61. Fill in the following values: • Name: authenticationFailed • Choose A Path: Tutorial • Text Message: The OpenNMS Web UI had a failed login attempt, by user '%parm[user]%', from IP address %parm[ip]% (Exception message: %parm[exceptionMessage]%) • Short Message: Authentication failed by user %parm[user]% (Notice #%noticeid%) • Email Subject: Authentication Failed (Notice #%noticeid%)
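The message fields above contain placeholders such as `%parm[user]%` and `%noticeid%` that OpenNMS fills in when the notice is sent. The sketch below imitates that substitution to show what the final text looks like. It is not OpenNMS's own expansion code; the function name and the rule of leaving unknown placeholders untouched are assumptions made for illustration.

```python
import re

# Matches %parm[name]% or a plain %field% token.
TOKEN = re.compile(r"%(?:parm\[(?P<parm>[^\]]+)\]|(?P<field>[A-Za-z]+))%")


def expand(template: str, fields: dict, parms: dict) -> str:
    """Replace %field% and %parm[name]% tokens; unknown tokens are left as-is."""
    def repl(match: re.Match) -> str:
        if match.group("parm") is not None:
            return str(parms.get(match.group("parm"), match.group(0)))
        return str(fields.get(match.group("field"), match.group(0)))
    return TOKEN.sub(repl, template)


short_message = "Authentication failed by user %parm[user]% (Notice #%noticeid%)"
print(expand(short_message, {"noticeid": 42}, {"user": "bob"}))
```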
  62. 62. The Notice
  63. 63. • You should now see your notification event in the list, but it will be disabled. Set the radio button to "On" to enable it, and you should be ready to send a notification:
  64. 64. Hint: UEI Filter • You can use a regex UEI filter in notifications so that one notification covers several events. • Unfortunately, you can't use the web UI for this configuration; you need to edit your notification in /etc/opennms/notifications.xml. Examples: <uei>~uei.opennms.org/custom/event.*</uei> <uei>~uei.opennms.org/.*/event</uei> <uei>~uei.opennms.org/custom/event[0-9]+</uei> <uei>~uei.opennms.org/custom/event\d+</uei>
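Before editing notifications.xml it can help to try a pattern against a few UEIs. The sketch below treats a configured value starting with `~` as a regular expression and anything else as an exact string. Two assumptions are baked in: that the pattern must match the whole UEI, and that Python's `re` syntax is close enough to the Java regex engine OpenNMS uses for simple patterns like these. The function name is hypothetical.

```python
import re


def uei_matches(configured: str, event_uei: str) -> bool:
    """Return True when an event UEI satisfies a notification's <uei> value."""
    if configured.startswith("~"):
        # Regex form: strip the '~' marker and require a full match (assumption).
        return re.fullmatch(configured[1:], event_uei) is not None
    return configured == event_uei


for pattern in ("~uei.opennms.org/custom/event.*",
                "~uei.opennms.org/custom/event[0-9]+",
                "uei.opennms.org/nodes/nodeDown"):
    print(pattern, uei_matches(pattern, "uei.opennms.org/custom/event42"))
```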
  65. 65. Configure Mail Unless you have an SMTP server running on your OpenNMS host, you will need to configure mail before notifications will work. Mail configuration for notifications is in the "$OPENNMS_HOME/etc/javamail-configuration.properties" file.
  66. 66. # This is the e-mail address that OpenNMS puts in the "From" field: • org.opennms.core.utils.fromAddress=FROM-EMAIL-ADDRESS • org.opennms.core.utils.mailHost=MAIL-SERVER-IP If your mail server requires authentication for sending, see the section under "These properties define the Mail authentication." and edit them as appropriate for your environment.
  67. 67. • The last step is to pass the mail server information to OpenNMS. Back up the file /opt/opennms/etc/javamail-configuration.properties, then change its contents to the following:
        org.opennms.core.utils.useJMTA=false
        org.opennms.core.utils.transport=smtp
        org.opennms.core.utils.mailHost=smtp.gmail.com
        org.opennms.core.utils.smtpport=587
        org.opennms.core.utils.smtpssl.enable=false
        org.opennms.core.utils.authenticate=true
        org.opennms.core.utils.authenticateUser=hello@gmail.com
        org.opennms.core.utils.authenticatePassword=mypassword
        org.opennms.core.utils.starttls.enable=true
        org.opennms.core.utils.messageContentType=text/html
        org.opennms.core.utils.charset=UTF-8
      The above configuration lets you send email through your Gmail account, which might be good if you want to use OpenNMS to monitor your home server. Try the email configuration with your own internet mail account.
  68. 68. • If you are using OpenNMS in your company, you need to make it talk to your own mail server instead. You can use the following configuration:
        org.opennms.core.utils.transport=smtp
        org.opennms.core.utils.useJMTA=false
        org.opennms.core.utils.mailHost=mailhost.doloveyou.com
        org.opennms.core.utils.smtpport=25
        org.opennms.core.utils.authenticate=true
        org.opennms.core.utils.authenticateUser=leaonow@doloveyou.com
        org.opennms.core.utils.authenticatePassword=password
        org.opennms.core.utils.fromAddress=opennms@doloveyou.com
        org.opennms.core.utils.charset=UTF-8
  69. 69. • Some servers will complain if you do not specify org.opennms.core.utils.fromAddress. • Now save the file and restart OpenNMS, then generate an event; you should receive a notification email. • If not, check the notification log file, /opennms_home/logs/daemon/notifd.log.
  70. 70. Enable Notifications • By default, OpenNMS ships with notifications disabled. In the main OpenNMS Admin page, there is a radio selector labeled, "Notification Status." Change it to "On" and click update.
  71. 71. Now let's trigger your notification. Log out and log in with a wrong password. You should receive an email telling you about the authentication failure. If you don't, take a look at "$OPENNMS_HOME/logs/daemon/notifd.log" and see if you get any error messages.
  72. 72. Alarm Introduction
  73. 73. Alarms • OpenNMS lets you indicate which events are important; those events become alarms. You can also reduce repeated important events to one row in the alarms table with the reduction-key attribute of the <alarm-data> tag in the event configuration files. This allows you to decide the granularity of the reduction, as you will see in the sample images below.
  74. 74. Alarms with new style sheet and sorted by severity This is much nicer and generally more useful than the raw events.
  75. 75. Alarms sorted by count • This shows the number of events that were reduced to a single alarm row. Un-reduced event list • Click on a count in the alarm listing and jump to the list of un-reduced events for that alarm.
  76. 76. Create an Alarm from an Event Simply add <alarm-data> to the event definition <event> <uei>uei.opennms.org/default/event</uei> <event-label>OpenNMS-defined default event: event</event-label> <descr> &lt;p&gt;An event with no matching configuration was received from interface %interface%. This event included the following parameters: %parm[all]%&lt;/p&gt; </descr> <logmsg dest='logndisplay'> An event with no matching configuration was received from interface %interface%. </logmsg> <severity>Indeterminate</severity> <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%" alarm-type="3" /> </event>
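To see why the reduction key controls granularity, it helps to expand the key for a few events and count how many land on the same value. The sketch below is an illustration of that idea, not OpenNMS's alarm code. The event dictionaries, their field values, and both function names are made up for the example.

```python
def reduction_key(template: str, event: dict) -> str:
    """Expand %name% tokens in a reduction-key template from an event's fields."""
    key = template
    for name, value in event.items():
        key = key.replace(f"%{name}%", str(value))
    return key


def reduce_events(template: str, events: list) -> dict:
    """Group events by expanded reduction key, counting events per alarm row."""
    alarms = {}
    for event in events:
        key = reduction_key(template, event)
        alarm = alarms.setdefault(key, {"count": 0, "last_event": None})
        alarm["count"] += 1
        alarm["last_event"] = event
    return alarms


TEMPLATE = "%uei%:%dpname%:%nodeid%:%interface%"
events = [
    {"uei": "uei.opennms.org/default/event", "dpname": "", "nodeid": 7, "interface": "10.0.1.17"},
    {"uei": "uei.opennms.org/default/event", "dpname": "", "nodeid": 7, "interface": "10.0.1.17"},
    {"uei": "uei.opennms.org/default/event", "dpname": "", "nodeid": 8, "interface": "10.0.1.18"},
]
for key, alarm in reduce_events(TEMPLATE, events).items():
    print(alarm["count"], key)
```

Dropping `%interface%` from the template would merge all events for a node into one alarm row, which is the coarser granularity the slide alludes to.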
  77. 77. Capability Scanning Introduction http://www.opennms.org/wiki/Tutorial_Capability_Scanning
  78. 78. Detecting Services on Devices • In order to detect services on devices, the Provisiond daemon executes detectors to detect the presence of configured services. Once a service is detected, it is used by other daemons to monitor availability, or to collect data about it. • These services are configured in the default foreign-source definition, which is pre-configured with a number of common services, like HTTP, SMTP, and so on. All Provisiond needs to get started is an IP address that OpenNMS suspects may provide one or more of these services.
  79. 79. New Suspect Events • Provisiond listens for "new suspect" events, which tell OpenNMS, "I suspect the IP address at <x> has services." These newSuspect events can come from a number of sources: – an event sent from the command line, using the send-event.pl script – an event sent from a 3rd-party tool, talking on the OpenNMS event listener TCP port – the Discovery daemon, which scans IP ranges for valid IP addresses (using ICMP "ping") – the "Add Interface" page on the OpenNMS administrator web UI
  80. 80. Scanning an Address • When Provisiond receives a newSuspect event, it executes the configured detectors to determine the presence of each service on the IP address provided by the event. It then creates a node and interface representing the IP address, and adds any services that it detects to that interface. • After the scan is complete, the node, interfaces, and services are written to the database, and then OpenNMS events are sent to indicate that these new nodes, interfaces, and services exist.
  81. 81. Adding a Custom Service • To add a custom service to Provisiond, log in to the web UI as an admin user. Go to Admin / Manage Provisioning Requisitions and click the Edit Default Foreign Source Definition button and add a detector for your custom service. (Gold, Silver, Bronze package) All Provisiond detectors take the port, timeout, and retry parameters, but many can take additional parameters as well which allow you to tune a particular plugin's behavior. The available parameters for the chosen detector are enumerated in a drop-down list as you build the detector definition
  82. 82. • Some example detectors are HttpDetector, for detecting web servers, and TcpDetector, for detecting an arbitrary open TCP port. To add a custom service, make a new detector entry in the default foreign-source definition, with a unique service name defined.
  83. 83. Let’s Try: Adding the OpenNMS Web Server • As an example, make OpenNMS detect its own web UI. The OpenNMS web UI is just an HTTP server listening on port 8980, so we'll use the HttpDetector to detect it. Edit the default foreign-source definition and add an HttpDetector entry for port 8980 to the end of the list of configured detectors (don’t forget to synchronize). • By default, Provisiond will re-scan all of your devices every 24 hours. If you wish to force it to schedule a rescan for your device immediately, it will do so when it receives the "forceRescan" event. Click “Node List”, select the added node, then click “rescan”.
  84. 84. Discovery_Configuration_How-To http://www.opennms.org/wiki/Discovery_Configuration_How-To
  85. 85. Discovery Configuration Details The global discovery attributes are: • threads • packets-per-second • initial-sleep-time • restart-sleep-time • timeout • retries • specific • include-range • exclude-range • include-url
  86. 86. • threads: This is the number of threads that will be used for discovery. By default this is set to 1. • packets-per-second: This is the number of ICMP packets that will be generated each second. The default is 1. • There is a relationship between packets-per-second and the number of threads. If the network's average latency is 500 ms: – Setting packets-per-second = 2 would double the rate at which newSuspect events are created. – With only 1 thread, raising packets-per-second to 3 would have little further effect, because the single thread would already be processing packets as fast as it could.
  87. 87. • initial-sleep-time: This is the time, in milliseconds, to wait after OpenNMS is started before the discovery process commences (by default 5 minutes). This delay is put in place to allow the product to fully start before generating new events. • restart-sleep-time: This is the time, in milliseconds, to wait after the discovery process has completed before it starts again. By default, the process will repeat 24 hours after the last discovery run has completed.
  88. 88. • timeout: this is the amount of time, in milliseconds, that the discovery process will wait for a response from a given IP address before deciding that there is nothing there. This can be overridden later in the file. • retries: this is the number of attempts that will be made to query a given IP address before deciding that there is nothing there. This can be overridden later in the file.
  89. 89. • Once the defaults are in place (defaults meaning the global values that will be used if they are not overridden in the tags below), the only thing left to tell the discovery process is which IP addresses to try. This is controlled by four different tags: – specific – include-range – exclude-range – include-url
  90. 90. specific • Specify an IP address to be discovered. Multiple specific tags can be used. <specific>ip-address</specific> Where ip-address is the address you want discovered. Note the lack of spaces between the tags.
  91. 91. include-range • Specify a range of IP addresses to be discovered. Multiple include-range tags can be used. <include-range> <begin>start-ip-address</begin> <end>end-ip-address</end> </include-range> • Where start-ip-address is the beginning of a range to be scanned and end-ip-address is the end of that range.
  92. 92. exclude-range • Specify a range of IP addresses to be excluded from discovery. <exclude-range> <begin>start-ip-address</begin> <end>end-ip-address</end> </exclude-range> • Where start-ip-address is the beginning of a range to be excluded and end-ip-address is the end of that range. • Note that the exclude-range tag will only override addresses in an include-range. It will not override specific IP addresses or addresses included in a file. • There is no "specific" version of the exclude tag. If you want to exclude a specific IP address, use an exclude-range where the beginning and ending IP addresses are the same.
  93. 93. include-url • Specify a file containing IP addresses to be included in discovery. <include-url>file:filename</include-url> • Where filename is the full path to a text file listing IP addresses, one to a line. Comments can be embedded in this file. Any line that begins with a "#" character will be ignored, as will the remainder of any line that includes a space followed by "#". • All tags are optional and unbounded (you can have as many as you wish).
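The interaction between specific, include-range and exclude-range can be sketched in a few lines. The example below computes the set of addresses Discovery would try, following the rule that an exclude-range removes addresses only from include-ranges and never from specific entries. It is an illustration with hypothetical function names, it handles IPv4 only, and it leaves include-url files out.

```python
import ipaddress


def expand_range(begin: str, end: str) -> set:
    """Return every IPv4 address from begin to end, inclusive."""
    start = int(ipaddress.IPv4Address(begin))
    stop = int(ipaddress.IPv4Address(end))
    return {ipaddress.IPv4Address(n) for n in range(start, stop + 1)}


def discovery_targets(specifics, include_ranges, exclude_ranges) -> list:
    """Combine the tags: (includes minus excludes) plus specifics."""
    included = set()
    for begin, end in include_ranges:
        included |= expand_range(begin, end)
    excluded = set()
    for begin, end in exclude_ranges:
        excluded |= expand_range(begin, end)
    # Specific addresses are added after exclusion, so excludes cannot remove them.
    targets = (included - excluded) | {ipaddress.IPv4Address(s) for s in specifics}
    return sorted(targets)


targets = discovery_targets(
    specifics=["10.0.1.17"],
    include_ranges=[("10.0.1.1", "10.0.1.254")],
    exclude_ranges=[("10.0.1.17", "10.0.1.20")],
)
print(len(targets), "addresses; first:", targets[0], "last:", targets[-1])
```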
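The comment rules for the include-url file are easy to get wrong when preparing a list by hand, so a small check can help. The sketch below applies the two rules stated on the slide (a leading "#" drops the line, and a space followed by "#" drops the rest of the line) to the text of such a file. The function name is hypothetical and this is not the parser OpenNMS itself uses.

```python
def parse_include_file(text: str) -> list:
    """Return the IP addresses listed in an include-url style file."""
    addresses = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        cut = line.find(" #")
        if cut != -1:
            line = line[:cut].strip()  # drop the trailing comment
        if line:
            addresses.append(line)
    return addresses


sample = """# lab network
10.0.1.5
10.0.1.6 # printer

10.0.1.7
"""
print(parse_include_file(sample))
```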
  94. 94. • Let's try having OpenNMS ping all the addresses close to your management system. • First determine the IP address of your management system. (We'll use 10.0.1.17 in this example.) We will configure Discovery with the class C range of your address (For this example, 10.0.1.1 to 10.0.1.254). • Go to the admin page in the OpenNMS web UI and then click "Configure Discovery", under the "Operations" section. In the Discovery Configuration GUI click on Add New in the Include Ranges section
  95. 95. • This will bring up a window like the one below. Fill it in with the begin and end addresses of your management system's network and click 'Add'. After this, return to the main Discovery Configuration GUI and click Save & Restart Discovery.
  96. 96. • Assuming you have no other include ranges configured on this system, it should take OpenNMS about 5 minutes to ping all of those devices. After that time, go look at the Node List and see the devices it has discovered.
  97. 97. Polling_Configuration_How-To http://www.opennms.org/wiki/Polling_Configuration_How-To
  98. 98. Polling/Monitoring • There are two major ways that OpenNMS gathers data about the network. The first is through polling. Processes called monitors connect to a network resource and perform a simple test to see if the resource is responding correctly. If not, events are generated. The second is through data collection using collectors, for example the SNMP data collector. [Diagram: OpenNMS data gathering divides into polling (monitor processes) and data collection (collector processes), both acting on network resources.]
  99. 99. Polling/Monitoring • The basic idea behind the poller starts with grouping network devices into packages. Each package will consist of various services and how they are to be polled (i.e. frequency). In addition, should an outage be detected, each package can have its own downtime model which controls how the poller will dynamically adjust its polling on services that are down. Finally, each package has an outage calendar that schedules times when the poller is not to poll (i.e. scheduled downtime). [Diagram: the poller's monitor processes poll network devices grouped into packages (services, frequency), each with a downtime model (dynamic adjustment, outage calendar control).]
  100. 100. Polling/Monitoring The OpenNMS pollerd subsystem is responsible for polling its defined list of services each monitoring interval (by default, every 5 minutes). If any service is down, pollerd will send an event notification to other OpenNMS subsystems to handle as desired (such as the notification system, which could then send an e-mail alert). [Diagram: pollerd polls SYS#1 (up), SYS#2 (up) and SYS#3 (down) every 5 minutes; the down event for SYS#3 is handed to the notification system.]
  101. 101. • Outages are an output of the poller, as are the nodeLostService / nodeRegainedService and equivalent paired events for interface- and node-level outages. When the poller sees an outage, it broadcasts a "down" event and opens an outage. Persisting the outage to the DB is Pollerd's job; persisting the event is Eventd's job. Internally, the event bus is implemented as a FIFO queue, so events cannot "leap-frog" each other while they're on the bus. [Diagram: pollerd emits nodeLostService and nodeRegainedService events for network devices and writes outages to the DB; the "down" event reaches Eventd through the FIFO queue.]
  102. 102. Poller-configuration • Pollerd's main configuration file is /etc/poller-configuration.xml. – Open the file. • Note that pollerd requires its monitored services to already have been initially provisioned in order to start monitoring them. Provisiond handles this service provisioning using either requisitions or detectors.
  103. 103. • Requisitions – manually define nodes and services • Detectors – automatically discover nodes and services
  104. 104. The Poller Configuration File Header
        <poller-configuration threads="30" serviceUnresponsiveEnabled="false" pathOutageEnabled="false">
          <node-outage status="on" pollAllIfNoCriticalServiceDefined="true">
            <critical-service name="ICMP"/>
          </node-outage>
      http://www.opennms.org/documentation/java-xsddocs-stable/poller-configuration.html
  105. 105. poller-configuration threads • Determines the maximum number of threads that will be used for polling, and can be adjusted up or down depending on the size of your network and the power of your server. [Diagram: poller threads #1 through #30 share the polling of network devices; the thread count is sized to the network and to the server's power.]
  106. 106. serviceUnresponsiveEnabled • A poll consists of a connection to a particular port on a remote interface, and then a test to see if the service on that port returns an expected response. If the response is not received within the timeout, the service is considered down. In some networks, however, short, intermittent failures are common. This will result in what is known as a "30 second outage": due to the default downtime model, a failed service will be polled again in 30 seconds. The outage is real, since a user attempting to access that resource would also have experienced a timeout. For networks where such short outages are unwanted, an option was added to count a failure only when the port connection fails, not when the response is missing. In this case, an unresponsive service does not generate an outage, but only a "service unresponsive" event. To enable this behavior, set this value to "true". [Diagram: with serviceUnresponsiveEnabled="false", a response timeout and a failed port connection both produce an outage event; with serviceUnresponsiveEnabled="true", a response timeout produces only a "service unresponsive" event and a failed port connection produces an outage event.]
  107. 107. node-outage status • The basic event that is generated when a poll fails is called "NodeLostService". If more than one service is lost, multiple NodeLostService events will be generated. If all the services on an interface are down, instead of a NodeLostService event, an "InterfaceDown" event will be generated. If all the interfaces on a node are down, the node itself can be considered down, and this section of the configuration file controls the poller behavior should that occur. If a "NodeDown" event occurs and node-outage status="on", then all of the InterfaceDown and NodeLostService events will be suppressed and only a NodeDown event will be generated. Instead of attempting to poll all the services on the down node, the poller will attempt to poll only the critical-service, by default ICMP. Once the critical service returns, the poller will then resume polling the other services. If the critical service is not available on a node, the pollAllIfNoCriticalServiceDefined parameter controls the behavior. If set to "true", then all services will be polled. If set to "false", then the first service in the package that exists on the node will be polled until service is restored, and then polling will resume for all services. Summary (a node consists of interfaces): • One service fails: NodeLostService event. • More than one service fails: multiple NodeLostService events. • All services on an interface fail: InterfaceDown event. • All interfaces on a node are down: NodeDown event; with node-outage status="on", InterfaceDown and NodeLostService events are suppressed and only the critical service (default ICMP) is polled until it returns. • Critical service returns: polling of the other services resumes. • Critical service not available: with pollAllIfNoCriticalServiceDefined="true", all services are polled; with "false", the first service in the package that exists on the node is polled until service returns.
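The effect of serviceUnresponsiveEnabled comes down to a small decision table. The sketch below encodes the behaviour described on this slide as a plain function. The function name, the boolean inputs and the returned status strings are invented for illustration and are not pollerd internals.

```python
def classify_poll(connected: bool, responded: bool, service_unresponsive_enabled: bool) -> str:
    """Map one poll result to 'up', 'unresponsive' or 'down'."""
    if connected and responded:
        return "up"
    if connected and service_unresponsive_enabled:
        # Port opened but no valid response in time: event only, no outage.
        return "unresponsive"
    # Connection failed, or the flag is off and the response timed out.
    return "down"


for flag in (False, True):
    print("serviceUnresponsiveEnabled =", flag,
          "| timeout ->", classify_poll(True, False, flag),
          "| refused ->", classify_poll(False, False, flag))
```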
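The escalation from NodeLostService to InterfaceDown to NodeDown can be expressed as a short function over the poll results for one node. The sketch below is one reading of the rules on this slide, not pollerd's implementation. In particular, the behaviour with node-outage status="off" (one InterfaceDown per interface, no NodeDown) is an assumption, and the data layout and function name are hypothetical.

```python
def outage_events(node: dict, node_outage_on: bool = True) -> list:
    """node maps interface IP -> {service name: is_up}. Returns (event, ip, service) tuples."""
    down_ifaces = [ip for ip, svcs in node.items() if svcs and not any(svcs.values())]
    if node and len(down_ifaces) == len(node) and node_outage_on:
        # Every interface is down: one NodeDown replaces all lower-level events.
        return [("nodeDown", None, None)]
    events = []
    for ip, svcs in node.items():
        if ip in down_ifaces:
            events.append(("interfaceDown", ip, None))
        else:
            events.extend(("nodeLostService", ip, name) for name, up in svcs.items() if not up)
    return events


partly_down = {
    "10.0.1.17": {"ICMP": True, "HTTP": False},
    "10.0.1.18": {"ICMP": False, "SNMP": False},
}
print(outage_events(partly_down))
```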
  108. 108. pathOutageEnabled • To be Updated soon http://www.opennms.org/wiki/Path_Outage_How-To
  109. 109. Poller Packages • A poller package consists of a name, a group of interfaces to poll, and the services to be polled on those interfaces. Multiple packages can be configured, and an interface can exist in more than one package (although the value of that is questionable). This gives great flexibility in how the service levels are determined for a given device. In addition to a list of services, each package can have a "downtime" model and an "outage calendar". Diagram: pollerd polls the network devices through three packages, each with its own downtime model: Package 1 (HTTP, SNMP, DNS, ICMP, JMX, SMTP) every 1 minute; Package 2 (HTTP, SNMP, DNS, ICMP) every 5 minutes; Package 3 (HTTP, ICMP) every 15 minutes.
  110. 110. The definition of a package starts with a package tag: • <package name="example1"> This is followed by a list of tags that define which interfaces will be included in the package. There are five of these tags: filter • <filter>IPADDR IPLIKE *.*.*.*</filter> Each package must have a filter tag that performs the initial test to see if an interface should be included in a package. Filters operate on interfaces (not nodes). Only one filter statement can exist per package. specific • <specific>192.168.1.59</specific> This specifies a particular IP address to include in a package. include-range • <include-range begin="192.168.0.1" end="192.168.0.254"/> This specifies a particular range of IP addresses to include in a package.
  111. 111. exclude-range • <exclude-range begin="192.168.0.100" end="192.168.0.104"/> This specifies a particular range of IP addresses to exclude from a package. This will override an include-range tag. include-url • <include-url>file:/opt/OpenNMS/etc/include</include-url> This tag points to a file that consists of a list of IP addresses, one to a line, that will be included in the package. Comments can be embedded in this file. Any line that begins with a "#" character will be ignored, as will the remainder of any line that includes a space followed by "#". • All of the above tags, except for filter, are optional and unbounded.
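How the package tags combine can be sketched like this. It is a rough model under stated assumptions, not the OpenNMS filter engine: the filter result is passed in as a boolean rather than parsed, an address must match a specific or include-range entry when any are given, and exclude-range always wins. The names `in_range` and `in_package` are hypothetical.

```python
from ipaddress import ip_address


def in_range(ip, begin, end):
    """True when ip lies between begin and end, inclusive."""
    return ip_address(begin) <= ip_address(ip) <= ip_address(end)


def in_package(ip, passes_filter, specifics=(), includes=(), excludes=()):
    """Decide package membership for one interface address."""
    if not passes_filter:
        return False  # the mandatory filter tag is the first test
    if any(in_range(ip, b, e) for b, e in excludes):
        return False  # exclude-range overrides include-range
    if not specifics and not includes:
        return True   # only a filter was configured
    return ip in specifics or any(in_range(ip, b, e) for b, e in includes)
```

Using the addresses from the slides, 192.168.0.100 falls inside the include-range but is removed again by the exclude-range.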
  112. 112. Poller Services Once the IP addresses to include in a package are defined, the services to be polled are listed. For example: • <service name="DNS" interval="300000" user-defined="false" status="on"> <parameter key="retry" value="3"/> <parameter key="timeout" value="5000"/> <parameter key="port" value="53"/> <parameter key="lookup" value="localhost"/> </service> There must be at least one service defined per package.
  113. 113. The common parameters for the poller service are as follows: • retry – The number of attempts that will be made to connect to the service. Default is 3 • timeout – The amount of time, in milliseconds, that OpenNMS will wait for a response from the service. Default is
  114. 114. port lookup • Note that the service configuration parameters in the poller can be different from the detector configuration in the foreign-source definition, but the service names in these two places must match exactly. You may want a longer timeout during service detection. • In this example, a DNS request will be made to look up "localhost". This should return an error (as localhost is usually not listed in a DNS) but if that error is returned, DNS is functioning properly and the test passes. • Microsoft's implementation of DNS, however, sometimes has problems with this, so you may want to put a real host for the lookup value (and in the detector entry in the foreign-source definition as well).
  115. 115. Scheduled Outages • In order to keep servers operating properly, it is often necessary to bring them down for scheduled maintenance. Instead of having these maintenance outages reflected as a true service outage, they can be included in a Scheduled Outage and then referenced by the poller package using the outage-calendar tag. This tag contains the name of a valid outage in the poll-outages.xml file. • The outage-calendar tag is optional and unbounded (i.e. you can reference more than one outage).
  116. 116. • Since version 1.5.91 you can configure scheduled outages from the GUI, go to Admin -> Scheduled Outages. Note that scheduled outages may be edited fully from the web UI; it's no longer necessary to edit the poller configuration manually to associate a scheduled outage with a poller package, although it's still possible. • Before version 1.5.91, there were three types of scheduled outages: weekly, monthly and specific. Since 1.5.91 there is also the possibility to configure daily scheduled outages.
  117. 117. • If nodes are reported as down even though they are within a daily outage that runs past midnight, try defining two timespans within the outage, one ending at midnight and the other starting after midnight, e.g. instead of an outage of 22:00:00-01:00:00, define 22:00:00-23:59:59 and 00:00:00-01:00:00.
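The workaround above amounts to splitting any window whose end time is earlier than its start time. A minimal sketch, with a hypothetical helper name `split_at_midnight`:

```python
def split_at_midnight(begins, ends):
    """Split an HH:MM:SS window that crosses midnight into two windows.

    Zero-padded HH:MM:SS strings sort chronologically, so a plain string
    comparison is enough to detect the wrap-around.
    """
    if ends >= begins:
        return [(begins, ends)]
    return [(begins, "23:59:59"), ("00:00:00", ends)]
```

Each returned pair can then be written as its own time entry in the daily outage.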
  118. 118. Examples from the poll-outages.xml file: <outage name="global" type="weekly"> <time day="sunday" begins="12:30:00" ends="12:45:00"/> <time day="sunday" begins="13:30:00" ends="14:45:00"/> <time day="monday" begins="13:30:00" ends="14:45:00"/> <time day="tuesday" begins="13:00:00" ends="14:45:00"/> <interface address="192.168.0.1"/> <interface address="192.168.0.36"/> <interface address="192.168.0.38"/> </outage> This defines an outage calendar called "global" that is run every week. It specifies four outage times: Sunday starting at 12:30 pm and lasting 15 minutes, Sunday starting at 1:30 pm and lasting an hour and fifteen minutes, the same outage on Monday, and one on Tuesday from 1:00 pm to 2:45 pm. Three interfaces will be affected.
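To make the weekly calendar concrete, the sketch below parses the "global" outage shown above and checks whether a given interface is inside one of its windows at a given moment. It only illustrates how the entries read; the function name `in_weekly_outage` is hypothetical and this is not how OpenNMS evaluates outages internally.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, time

OUTAGE_XML = """<outage name="global" type="weekly">
  <time day="sunday" begins="12:30:00" ends="12:45:00"/>
  <time day="sunday" begins="13:30:00" ends="14:45:00"/>
  <time day="monday" begins="13:30:00" ends="14:45:00"/>
  <time day="tuesday" begins="13:00:00" ends="14:45:00"/>
  <interface address="192.168.0.1"/>
  <interface address="192.168.0.36"/>
  <interface address="192.168.0.38"/>
</outage>"""

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def in_weekly_outage(outage, ip, when):
    """True if ip is listed in the outage and `when` falls in one of its windows."""
    if all(i.get("address") != ip for i in outage.findall("interface")):
        return False
    day = DAYS[when.weekday()]  # Monday is 0 in datetime.weekday()
    now = when.time()
    for t in outage.findall("time"):
        begins = time.fromisoformat(t.get("begins"))
        ends = time.fromisoformat(t.get("ends"))
        if t.get("day") == day and begins <= now <= ends:
            return True
    return False


outage = ET.fromstring(OUTAGE_XML)
```

For instance, 11 November 2001 was a Sunday, so 12:40 on that day falls in the first window for 192.168.0.1.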
  119. 119. <outage name="hub maintenance" type="monthly"> <time day="1" begins="23:30:00" ends="23:45:00"/> <time day="15" begins="21:30:00" ends="21:45:00"/> <time day="15" begins="23:30:00" ends="23:45:00"/> <interface address="192.168.100.254"/> <interface address="192.168.101.254"/> <interface address="192.168.102.254"/> <interface address="192.168.103.254"/> <interface address="192.168.104.254"/> <interface address="192.168.105.254"/> <interface address="192.168.106.254"/> <interface address="192.168.107.254"/> </outage> This outage calendar is called "hub maintenance" and runs every month. On the first of the month the outage begins at 11:30 pm and lasts 15 minutes. The same outage occurs on the 15th of the month, in addition to another outage from 9:30 pm to 9:45 pm. Eight interfaces are affected.
  120. 120. • <outage name="proxy server tuning" type="specific"> <time begins="10-Nov-2001 17:30:00" ends="11-Nov-2001 08:00:00"/> <interface address="192.168.0.1"/> </outage> This outage named "proxy server tuning" began on November 10th, 2001 at 5:30 pm and lasted until 8:00 am the next day. This affected one interface. You can have more than one "time" entry per specific outage. If a particular outage calendar is included in a poller package, then polling will not occur during this time. This does not mean that the service will be considered "up" during this time. If the maintenance is started a minute too soon and an outage is detected, then no poll will be made to restore the service until after the outage window has closed.
  121. 121. Downtime Models • The goal of the poller is to verify service levels, and everyone involved would like to see those be as high as possible. • By default, the poller will poll every five minutes. If that polling rate was static, then the shortest an outage could be would be five minutes: one poll to note the outage and the next to note it was restored. • In these days of service levels in the "99.99%" range, a five minute outage can be devastating.
  122. 122. • To help combat this, OpenNMS uses adaptive polling. Once an outage is detected, polling is temporarily increased to try and detect, as soon as possible, when the service is restored. • <downtime interval="30000" begin="0" end="300000"/> <!-- 30s, 0, 5m --> <downtime interval="300000" begin="300000" end="43200000"/> <!-- 5m, 5m, 12h --> <downtime interval="600000" begin="43200000" end="432000000"/> <!-- 10m, 12h, 5d --> <downtime begin="432000000" delete="true"/> <!-- anything after 5 days delete -->
  123. 123. What this downtime model will do is the following: • from the moment the outage begins (time 0) until five minutes later (time 300,000 ms), the poller will poll every 30 seconds (30,000 ms). • After five minutes, it is assumed that any service level that would be greatly affected by a five minute outage has been broken, so from five minutes (300,000 ms) into the outage until the first 12 hours of the outage (43,200,000 ms) polling resumes its five minute (300,000 ms) interval.
  124. 124. • If the outage is older than 12 hours, it must not be important and/or it is difficult to fix, so from when the outage is 12 hours old until it is 5 days (432,000,000 ms) old, the polling rate is reduced to once every ten minutes (600,000 ms). • If a service has been down for longer than five days, it is deleted (well, marked as "forced unmanaged") and no longer polled. Note that this is optional: you can continue to poll a down service for as long as you would like. For the last downtime interval in the model, just leave the "end" time off in order to extend polling indefinitely.
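The downtime model walked through above is essentially a lookup from outage age to polling interval. A minimal sketch, assuming each window includes its begin time and excludes its end time (the notes do not say how boundaries are handled); `DOWNTIME_MODEL` and `next_poll_interval` are names chosen for this example.

```python
# The four <downtime> entries from the example, all values in milliseconds.
DOWNTIME_MODEL = [
    {"begin": 0, "end": 300_000, "interval": 30_000},               # 30s for 5m
    {"begin": 300_000, "end": 43_200_000, "interval": 300_000},      # 5m until 12h
    {"begin": 43_200_000, "end": 432_000_000, "interval": 600_000},  # 10m until 5d
    {"begin": 432_000_000, "delete": True},                          # after 5 days
]


def next_poll_interval(outage_age_ms):
    """Return the polling interval in ms, or None once the service is deleted."""
    for step in DOWNTIME_MODEL:
        end = step.get("end")
        if outage_age_ms >= step["begin"] and (end is None or outage_age_ms < end):
            return None if step.get("delete") else step["interval"]
    return None
```

Replacing the last entry with one that has an interval and no "end" would model the "poll indefinitely" variant mentioned above.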
  125. 125. Poller Monitors • For each service in a poller package, there must be a corresponding monitor. In the detector configuration in the foreign-source definition, the detector class was included in the detector entry itself, but since there is the potential for a particular service to exist many times in the poller configuration file, this bit of bookkeeping was put, once, at the end of the file.
  126. 126. <monitor service="DominoIIOP" class-name="org.opennms.netmgt.poller.DominoIIOPMonitor"/> <monitor service="ICMP" class-name="org.opennms.netmgt.poller.IcmpMonitor"/> <monitor service="Citrix" class-name="org.opennms.netmgt.poller.CitrixMonitor"/> <monitor service="LDAP" class-name="org.opennms.netmgt.poller.LdapMonitor"/> <monitor service="HTTP" class-name="org.opennms.netmgt.poller.HttpMonitor"/> <monitor service="HTTP-8080" class-name="org.opennms.netmgt.poller.HttpMonitor"/> <monitor service="HTTP-8000" class-name="org.opennms.netmgt.poller.HttpMonitor"/> <monitor service="HTTPS" class-name="org.opennms.netmgt.poller.HttpsMonitor"/> <monitor service="SMTP" class-name="org.opennms.netmgt.poller.SmtpMonitor"/> <monitor service="DHCP" class-name="org.opennms.netmgt.poller.DhcpMonitor"/> <monitor service="DNS" class-name="org.opennms.netmgt.poller.DnsMonitor" /> <monitor service="FTP" class-name="org.opennms.netmgt.poller.FtpMonitor"/> <monitor service="SNMP" class-name="org.opennms.netmgt.poller.SnmpMonitor"/> <monitor service="Oracle" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="Postgres" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="MySQL" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="Sybase" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="Informix" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="SQLServer" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="SSH" class-name="org.opennms.netmgt.poller.TcpMonitor"/> <monitor service="IMAP" class-name="org.opennms.netmgt.poller.ImapMonitor"/> <monitor service="POP3" class-name="org.opennms.netmgt.poller.Pop3Monitor"/> You should not need to modify this section unless you manually add your own pollers.
  127. 127. SNMP Data Collection Configuration How-To http://www.opennms.org/wiki/Data_Collection_Configuration_How-To
  128. 128. • OpenNMS focuses on the services network resources provide: web pages, database access, DNS, DHCP, etc. (although information on network elements is also available). • There are two major ways that OpenNMS gathers data about the network. – The first is through polling. Processes called monitors connect to a network resource and perform a simple test to see if the resource is responding correctly. If not, events are generated. – The second is through data collection using collectors.
  129. 129. Currently data can be collected by : • SNMP, • NSClient (the Nagios Agent), • JMX, • HTTP
  130. 130. There are several things that have to happen in order for this to work. For all data collection methods: • Provisiond: During the scanning process, Provisiond discovers whether the various collectable services exist on the discovered node. More specifically for SNMP collection, Provisiond must be able to access SNMP information on that interface and to form some basic mappings, such as IP Address to ifIndex. • collectd-configuration.xml: Just as in the poller-configuration.xml file, interfaces are mapped to packages for collection in this file. If data collection is required on an interface, it needs to exist in a package in this file. The default configuration is suitable for most initial purposes.
  131. 131. For SNMP data collection, the following files must be configured correctly: • snmp-config.xml: For each interface, a valid community string must exist in this file. • datacollection-config.xml: Each package in the collectd configuration file points to an snmp-collection definition in this file. Each snmp-collection defines what information to collect via SNMP. The default configuration is fairly complete for basic purposes, and will probably not require much changing initially.
  132. 132. snmp-config.xml • The parameters used to connect with SNMP agents are defined in the snmp-config.xml file. Here is an example: <snmp-config retry="3" timeout="800" read-community="public" write-community="private"> <definition version="v2c"> <specific>192.168.0.5</specific> </definition> <definition retry="4" timeout="2000"> <range begin="192.168.1.1" end="192.168.1.254"/> <range begin="192.168.3.1" end="192.168.3.254"/> </definition> <definition read-community="bubba" write-community="zeke"> <range begin="192.168.2.1" end="192.168.2.254"/> </definition> <definition port="1161"> <specific>192.168.5.50</specific> </definition> </snmp-config>
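One way to read the example is that the attributes on the snmp-config tag are global defaults and a matching definition overrides them for its addresses. The sketch below illustrates that reading with the standard library; it assumes the first matching definition wins and uses a shortened copy of the example. `effective_settings` is a hypothetical helper, not an OpenNMS API.

```python
import xml.etree.ElementTree as ET
from ipaddress import ip_address

SNMP_CONFIG = """<snmp-config retry="3" timeout="800"
    read-community="public" write-community="private">
  <definition version="v2c"><specific>192.168.0.5</specific></definition>
  <definition retry="4" timeout="2000">
    <range begin="192.168.1.1" end="192.168.1.254"/>
  </definition>
  <definition read-community="bubba" write-community="zeke">
    <range begin="192.168.2.1" end="192.168.2.254"/>
  </definition>
</snmp-config>"""


def effective_settings(xml_text, ip):
    """Merge the global attributes with the first definition matching ip."""
    root = ET.fromstring(xml_text)
    settings = dict(root.attrib)  # global defaults
    addr = ip_address(ip)
    for definition in root.findall("definition"):
        hit = any(s.text.strip() == ip for s in definition.findall("specific")) or any(
            ip_address(r.get("begin")) <= addr <= ip_address(r.get("end"))
            for r in definition.findall("range")
        )
        if hit:
            settings.update(definition.attrib)  # per-definition overrides
            break
    return settings
```

Under this reading, an agent at 192.168.2.7 is queried with the "bubba" read community but keeps the global retry and timeout values.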
  133. 133. The common attributes for the snmp-config tag are as follows: • retry – The number of attempts that will be made to connect to the SNMP agent. Default is 1 • timeout – The amount of time, in milliseconds, that OpenNMS will wait for a response from the agent. Default is 3000 • read-community – The default "read" community string for SNMP queries. If not specified, defaults to "public"
  134. 134. • write-community –The default "write" community string for SNMP queries. Note that this is for future development - OpenNMS does not perform SNMP "sets" at the moment. • port –This overrides the default port of 161. • version –Here you can force either SNMP version 1 by specifying "v1", version 2c with "v2c", or version 3 with "v3". Default is "v1"
  135. 135. For SNMPv3 authentication and collection (only available when using SNMP4J): • security-name - A security name for SNMP v3 authentication • auth-passphrase - The passphrase to use for SNMP v3 authentication • auth-protocol - The authentication protocol for SNMP v3. Either "MD5" or "SHA". Default is MD5 • privacy-passphrase - A privacy pass phrase used to encrypt the contents of SNMP v3 packets
  136. 136. • privacy-protocol – The privacy protocol used to encrypt the contents of SNMP v3 packets. Either "DES", "AES", "AES192" or "AES256". Default is DES. • engine-id – The engine id of the target agent. • context-name – The name of the context to obtain data from on the target agent. • context-engine-id – The context engine id of the target entity on the agent.
  137. 137. • enterprise-id – An enterprise id for SNMP v3 collection More rarely used attributes in the snmp-config tag are: • proxy-host – A proxy host to use to communicate with the specified agent(s) • max-vars-per-pdu – Number of variables per SNMP request. Default is 10 • max-request-size – If using SNMP4J as the SNMP library, the maximum size of outgoing SNMP requests. Defaults to 65535, must be at least 484
  138. 138. • As explained in the Discovery How-To, the capabilities check process starts with a newSuspect event (generated either manually or through the discovery process). This NewSuspect event is received by the provisioning daemon (Provisiond). • The Provisiond process is responsible for scanning IP addresses for particular services. Each service that can be detected on a discovered node is defined in the default foreign-source definition. Upon receipt of a newSuspect event, Provisiond begins to test each configured service detector to see if it exists on that device.
  139. 139. • When testing SNMP, Provisiond attempts to retrieve the System Object ID (sysObjectID) for the device using the community string and port defined in snmp-config.xml. • If the sysObjectID is successfully retrieved, Provisiond gathers additional SNMP attributes from the system group, the ipAddressTable (if present), ipAddrTable (if ipAddressTable is not present), ifTable, and ifXTable.
  140. 140. • If the ipAddressTable (or ipAddrTable) or ifTable is unavailable, the scan aborts (but the SNMP system data may show up on the node page). • Second, all of the sub-target IP addresses in the ipAddressTable or ipAddrTable have all the configured service detectors run against them.
  141. 141. • Third, every IP address in the ipAddressTable or ipAddrTable that supports SNMP is tested to see if it maps to a valid ifIndex in the ifTable. Each one that does is marked as a secondary SNMP interface and is a contender for becoming the primary SNMP interface.
  142. 142. • Finally, all secondary SNMP interfaces are tested to see if they match a valid package in the collectd-configuration file. If more than one valid IP address meets all three criteria (supports SNMP, has a valid ifIndex and is included in a collection package), then the lowest-numbered IP address is marked as primary. All SNMP data collection is performed via the primary SNMP interface. • When the Provisiond node scan and service detectors are completed, events are generated, including nodeGainedService events.
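The primary-interface rule above (three criteria, then the lowest-numbered address) can be sketched as follows. The input format and the name `pick_primary` are assumptions for illustration; the point is that addresses are compared numerically, not as strings.

```python
from ipaddress import ip_address


def pick_primary(candidates):
    """candidates: dicts with keys ip, supports_snmp, has_ifindex, in_collection_package.

    Returns the lowest-numbered eligible IP address, or None if none qualifies.
    """
    eligible = [
        c["ip"] for c in candidates
        if c["supports_snmp"] and c["has_ifindex"] and c["in_collection_package"]
    ]
    # ip_address gives a numeric ordering, so 10.0.0.2 sorts before 10.0.0.10.
    return min(eligible, key=ip_address) if eligible else None
```

A plain string comparison would wrongly prefer 10.0.0.10 over 10.0.0.2, which is why the key function matters here.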
  143. 143. collectd-configuration.xml • Data collection is handled via the collectd process. collectd listens for NodeGainedService events for the SNMP "service". When this happens, it checks to see if the primary SNMP interface for that node exists in a collection package (which it should by definition). If so, the SNMP collector is instantiated for that IP address.
  144. 144. • Let's look at the collectd-configuration.xml file: <collectd-configuration threads="5"> <package name="example1"> <filter>IPADDR IPLIKE *.*.*.*</filter> <specific>0.0.0.0</specific> <include-range begin="192.168.0.1" end="192.168.0.254"/> <include-url>file:/opt/OpenNMS/etc/include</include-url> <service name="SNMP" interval="300000" user-defined="false" status="on"> <parameter key="collection" value="default"/> <parameter key="port" value="161"/> <parameter key="retry" value="3"/> <parameter key="timeout" value="3000"/> </service> <outage-calendar>zzz from poll-outages.xml zzz</outage-calendar> </package> <collector service="SNMP" class-name="org.opennms.netmgt.collectd.SnmpCollector"/> </collectd-configuration>
  145. 145. • The threads attribute limits the number of threads that will be used by the data collection process. You can increase or decrease this value based upon your network and the size of your server. • Just like pollers have poller packages, collectors have collection packages. Each package determines how often the device will be polled for SNMP data, and through the collection key, what will be polled and how it will be stored. The example1 package is the default included out of the box.
  146. 146. Event Configuration How-To http://www.opennms.org/wiki/Event_Configuration_How-To
  147. 147. OpenNMS has three main functional areas: • Determining Availability of Network Services - discovery • Gathering Performance Data – collectd • Event Management and Notifications – trapd, pollerd
  148. 148. “Monitor Availability of tcp port 80 in OpenNMS” 1. capsd-configuration.xml • Detects the service 2. poller-configuration.xml • Monitors the service and defines a downtime policy
  149. 149. capsd-configuration.xml • <protocol-plugin protocol="HTTP-80" class-name="org.opennms.netmgt.capsd.plugins.TcpPlugin" scan="on"> <property key="port" value="80"/> <property key="banner" value="*"/> <property key="timeout" value="1000"/> <property key="retry" value="1"/> </protocol-plugin> ### Older version OpenNMS
  150. 150. poller-configuration.xml • <package name="http-tcp"> <filter>!(IPADDR IPLIKE *.*.*.252-254)</filter> <rrd step="300"> <!-- polling freq 5 min --> <rra>RRA:AVERAGE:0.5:1:2016</rra> <!-- 1 --> <rra>RRA:AVERAGE:0.5:12:1488</rra> <!-- 2 --> <rra>RRA:AVERAGE:0.5:288:366</rra> <!-- 3 --> <rra>RRA:MAX:0.5:288:366</rra> <!-- 4 --> <rra>RRA:MIN:0.5:288:366</rra> <!-- 5 --> </rrd> ### Older version OpenNMS. The default is "example".
  151. 151. RRD Configuration http://www.opennms.org/wiki/Data_Collection_Configuration_How-To#RRD_Configuration RRA:Cf:xff:steps:rows Cf: the consolidation function: AVERAGE, MAX, MIN, or LAST. xff: the fraction of the samples that can be UNKNOWN before the consolidated sample is considered UNKNOWN. By default this is set to 0.5 (50%). steps: the number of steps consolidated into one stored value. If the step size is 300 seconds (5 minutes) and the number of steps is 12, then each RRA row covers 12 x 5 minutes = 60 minutes = 1 hour, and it stores the consolidated value for that hour. rows: the number of values that will be stored in the RRA.
  152. 152. RRD Configuration • RRA:AVERAGE:0.5:1:8928 – Stores the AVERAGE value collected over 1 step and keeps up to 8928 values. – If, for any step, more than 50% of the values are UNKNOWN, then the average value will be UNKNOWN. – The default step size is 300 seconds (5 minutes, the same as the default poller and collectd configuration), so there is one value per step (AVERAGE=MAX=MIN=LAST). – 5 minutes/sample → 12 samples/hour → 288 samples/24 hours. 31 days x 288 samples = 8928. http://www.opennms.org/wiki/Data_Collection_Configuration_How-To#RRD_Configuration
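The retention arithmetic on these slides is easy to check with a small helper. `rra_coverage` is a hypothetical name; it multiplies the step size by the steps per row to get the resolution of one stored value, then by the number of rows to get the total time span covered.

```python
def rra_coverage(step_seconds, steps, rows):
    """Return (seconds covered by one row, total days covered by the RRA)."""
    seconds_per_row = step_seconds * steps
    total_seconds = seconds_per_row * rows
    return seconds_per_row, total_seconds / 86_400  # 86,400 seconds per day
```

Applied to the archives quoted in these notes with a 300 second step: 1 step x 8928 rows gives 5 minute values for 31 days, 12 steps x 1488 rows gives hourly values for 62 days, and 288 steps x 366 rows gives daily values for 366 days.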
