
[Free OpManager training] Part 4- Network fault-management & IT automation



Learn how to detect and troubleshoot network faults with OpManager, automate the fault-management process, and achieve a problem-free network.

Published in: Software


  1. 1. Week 4: Effective fault management and IT automation
  2. 2. Questions
        1. How do you identify faults quickly?
        2. How do you prioritize the problems?
  3. 3. All services are currently UP
        1. How do you identify faults quickly?
        2. How do you prioritize the problems?
        3. How do you get them resolved quickly?
  4. 4. Agenda
        • Alarm severity levels
        • Threshold violation alarms
        • Other alarms: VMware; Event logs; SNMP traps and Syslogs
        • Notifications
        • Using an IT workflow to remediate problems
        • Tips and tricks
        • Questions
  5. 5. Alarm severity levels
  6. 6. Severity (each level has its own color code): Attention, Trouble, Critical, Service down, Clear
  7. 7. Device down, Interface down. Severity: predefined
  8. 8. Process down, Service down, URL down. Severity: predefined
  9. 9. Event log, Syslog, SNMP trap. Severity: configurable
  10. 10. Threshold-based alarms
  11. 11. Threshold-based alarms
        • Configuring threshold values on an individual device
        • Configuring consecutive times
        • Configuring the rearm value to clear fault alarms
        • Using device templates to configure thresholds globally based on device type
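The interplay of the threshold, consecutive-times, and rearm settings on slide 11 can be illustrated with a small sketch. This is not OpManager code: the class and field names are hypothetical, and it assumes an "exceeds" style threshold where the alarm is raised after N consecutive violations and cleared once the value drops to the rearm value or below.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMonitor:
    """Illustrative threshold check (hypothetical names, not OpManager's API)."""
    threshold: float   # a poll violates when the value exceeds this
    rearm: float       # the alarm clears when the value falls to or below this
    consecutive: int   # violations in a row required before raising the alarm
    _streak: int = 0
    _alarmed: bool = False

    def poll(self, value: float) -> str:
        if not self._alarmed:
            # Count violations in a row; any normal reading resets the streak.
            self._streak = self._streak + 1 if value > self.threshold else 0
            if self._streak >= self.consecutive:
                self._alarmed = True
                return "raise"
            return "ok"
        # While the alarm is active, only the rearm value can clear it.
        if value <= self.rearm:
            self._alarmed = False
            self._streak = 0
            return "clear"
        return "active"


monitor = ThresholdMonitor(threshold=90, rearm=80, consecutive=3)
readings = [95, 96, 85, 95, 96, 97, 92, 85, 79]
results = [monitor.poll(v) for v in readings]
```

Keeping the rearm value below the threshold prevents an alarm from flapping when a metric hovers around the threshold; the consecutive count filters out one-off spikes.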
  12. 12. VMware alarms; Event logs; SNMP traps; Syslogs
  13. 13. VMware events
        Alarms for inventory changes
          o vMotion
          o Host added/removed
          o Host or VMs connected/disconnected
          o VMs powered on/off
          o VMs orphaned
          o Scheduled task removed
          o Etc.
        Querying more events from the vCenter server / ESX host
  14. 14. Event log alarms
        Prerequisites
          o Check that the WMI and RPC services are enabled on the Windows servers
          o Default WMI ports: 135 & 445, 5000 to 6000 (TCP)
        • Configuring event logs for a Windows server in OpManager
        • Ignoring a specific event log from a Windows server
        • Configuring OpManager to handle event floods
          o serverparameters.conf (OpManager/conf/OpManager)
          o EVENTS_PER_HOUR 1000
          o EVENT_FLOOD_SEVERITY Critical
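Before configuring event log monitoring, the port prerequisite on slide 14 can be checked from the monitoring server. The sketch below is a generic TCP reachability test using Python's standard library; it is not part of OpManager, the function names are hypothetical, and it only probes the two fixed ports (135 and 445), not the 5000 to 6000 range. A successful connection shows the port is reachable, not that WMI itself is correctly configured.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_wmi_ports(host: str, ports=(135, 445)) -> dict:
    """Map each prerequisite port to its reachability (hypothetical helper)."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, `check_wmi_ports("192.0.2.10")` returns a dictionary such as `{135: ..., 445: ...}` with one boolean per port; the address here is a documentation placeholder.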
  15. 15. SNMP trap alarms
        5 things that you should know about SNMP traps in OpManager
        1. Unsolicited traps
        2. Varbinds
        3. Failure component
        4. Loading traps from MIB files
        5. Forwarding trap messages to another NMS platform
        [Diagram: SNMP agents on a router, switch, firewall, and server send traps (port 162) to the OpManager trap receiver; other labels: management definitions, management database]
  16. 16. #1 Unsolicited traps
        Question: I have configured a router to forward SNMP traps to OpManager's server, but I don't see an alarm. How do I fix this?
        Things to verify:
          o Verify whether the router is added to OpManager
          o Verify whether a 'trap rule' is available for the respective event
          o Verify whether the trap event is listed under 'Unsolicited traps'
        Solution: Identify the event under 'Unsolicited traps' and add a new trap rule for it
  17. 17. #2 Varbinds
        Question: I have a Windows server added to OpManager. It triggers hundreds of trap events with various messages from x.x.x.x OID. However, I want to keep the trap event only if the priority is 'critical' and clear the event automatically when the priority is 'low'. How do I achieve this?
        Know
        • What is a varbind?
        • How to identify the varbinds in a trap event?
        Solution: Use 'match criteria' to filter and clear the trap alarms based on 'varbinds'
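A varbind (variable binding) is an OID and value pair carried inside an SNMP trap. The match-criteria idea on slide 17 can be sketched as a function that inspects one varbind and decides whether to raise or clear an alarm. This is only an illustration: real varbinds are keyed by OID, whereas the `priority` key, the function name, and the return values below are assumptions made for readability, not OpManager settings.

```python
from typing import Dict, Optional


def classify_trap(varbinds: Dict[str, str],
                  priority_key: str = "priority") -> Optional[str]:
    """Decide what to do with a trap based on one varbind value.

    Returns "raise" for critical traps, "clear" for low-priority traps,
    and None when the trap should be ignored by this rule.
    """
    priority = varbinds.get(priority_key, "").strip().lower()
    if priority == "critical":
        return "raise"
    if priority == "low":
        return "clear"
    return None


decisions = [
    classify_trap({"priority": "Critical", "message": "disk failure"}),
    classify_trap({"priority": "low", "message": "disk recovered"}),
    classify_trap({"priority": "medium", "message": "disk warning"}),
]
```

Traps with any other priority fall through to `None`, which mirrors filtering out the events that should not produce an alarm.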
  18. 18. #3 Failure component
        Question: I have a switch added to OpManager. It triggers a failure trap event for BGP down from one OID and a clear event for BGP up from another OID. This generates two different alarms in OpManager. I want the clear alarm for the BGP up event merged with the original alarm, as it is for the same link. How do I achieve this?
        Cause: OpManager receives the traps from two different OIDs, and each OID has its own trap rule, so two different alarms are generated
        Solution: Provide a common 'failure component' in both trap rules
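Why a shared failure component merges the two events can be shown with a toy alarm table. The sketch assumes alarms are identified by the pair (source device, failure component), so a clear event that carries the same pair removes the earlier failure alarm. The data model and function name are hypothetical and are not taken from OpManager.

```python
def apply_trap(alarms: dict, source: str, failure_component: str,
               severity: str, message: str) -> bool:
    """Raise or clear an alarm keyed by (source, failure_component).

    Returns True if the alarm table changed.
    """
    key = (source, failure_component)
    if severity == "clear":
        # A clear event only matches an alarm with the same key.
        return alarms.pop(key, None) is not None
    alarms[key] = {"severity": severity, "message": message}
    return True


# Different failure components: the clear event does not find the alarm.
separate = {}
apply_trap(separate, "switch-1", "bgp-down-rule", "critical", "BGP down")
separate_cleared = apply_trap(separate, "switch-1", "bgp-up-rule", "clear", "BGP up")

# Common failure component in both rules: the clear event merges with the alarm.
merged = {}
apply_trap(merged, "switch-1", "bgp-peer", "critical", "BGP down")
merged_cleared = apply_trap(merged, "switch-1", "bgp-peer", "clear", "BGP up")
```

In the first case the failure alarm stays open after the link recovers; in the second it is cleared, which is the behavior the slide asks for.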
  19. 19. Syslog alarms
        Prerequisites
          o Configure devices to forward syslog events to OpManager's server
          o Default ports: 514 & 519 (UDP); configurable
        • Creating a syslog rule
          o Syslog receiver
        • Using facility name, severity, or match text to filter and clear syslog alarms (regex format)
        • Identifying the syslog flow rate from OpManager
        • Forwarding syslog messages to another NMS platform
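Filtering by facility, severity, and match text, as listed on slide 19, relies on the priority value at the start of each syslog message: facility is the priority divided by 8, and severity is the remainder. The following sketch decodes that value and applies a simple rule. The function names and the sample message are illustrative assumptions; this is not how OpManager's rule engine is implemented.

```python
import re

PRI_RE = re.compile(r"^<(\d{1,3})>")

# Standard syslog severities, from most severe (0) to least severe (7).
SEVERITIES = ("emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug")


def parse_pri(line: str):
    """Return (facility_number, severity_name) from a raw syslog line."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("missing <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:
        raise ValueError("PRI out of range")
    return pri // 8, SEVERITIES[pri % 8]


def matches_rule(line: str, facility: int, min_severity: str, pattern: str) -> bool:
    """True if the line has the given facility, is at least as severe as
    min_severity, and its text matches the regular expression."""
    fac, sev = parse_pri(line)
    severe_enough = SEVERITIES.index(sev) <= SEVERITIES.index(min_severity)
    return fac == facility and severe_enough and re.search(pattern, line) is not None


sample = "<187>Oct 11 22:14:15 router1 %LINK-3-UPDOWN: Interface Gi0/1, changed state to down"
decoded = parse_pri(sample)
hit = matches_rule(sample, facility=23, min_severity="error", pattern=r"changed state to down")
```

Here priority 187 decodes to facility 23 (local7) and severity "error", so the rule matches; a second rule with the pattern `changed state to up` could play the role of the clearing rule.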
  20. 20. Notifications
  21. 21. Notification cycle
        Profile type
          - Send email or SMS
          - Run system command
          - Run program
          - Log a ticket
          - Web alarm
          - Syslog
          - Trap
        Alarm criteria
          - Device down
          - Service down
          - Hardware fault
          - Threshold violation
          - Virtual device fault
          - UCS fault
        Device selection
          - Category
          - Business view
          - Devices
        Schedule
          - All the time
          - Selected time window
          - Delayed trigger
          - Recurring trigger
        Preview
          - Verify inputs
          - Add a profile
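The notification cycle on slide 21 amounts to a profile that combines an action, alarm criteria, a device selection, and a schedule. A minimal sketch of that decision follows, assuming a profile fires only when all three conditions hold. The class, field names, and sample values are hypothetical, and the delayed and recurring triggers are left out.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class NotificationProfile:
    """Illustrative profile (hypothetical fields, not OpManager's schema)."""
    action: str                                  # e.g. "email", "log a ticket"
    criteria: FrozenSet[str]                     # alarm types that trigger it
    devices: FrozenSet[str]                      # devices it is associated with
    window: Optional[Tuple[time, time]] = None   # None means "all the time"


def should_notify(profile: NotificationProfile, alarm_type: str,
                  device: str, at: time) -> bool:
    if alarm_type not in profile.criteria or device not in profile.devices:
        return False
    if profile.window is None:
        return True
    start, end = profile.window
    if start <= end:
        return start <= at <= end
    # The window wraps past midnight, e.g. 22:00 to 06:00.
    return at >= start or at <= end


night_email = NotificationProfile(
    action="email",
    criteria=frozenset({"service down"}),
    devices=frozenset({"web-01"}),
    window=(time(22, 0), time(6, 0)),
)
fires_at_night = should_notify(night_email, "service down", "web-01", time(23, 30))
fires_at_noon = should_notify(night_email, "service down", "web-01", time(12, 0))
```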
  22. 22. #1 Email notification
        Question: I want to receive an email notification for all service down alarms. How do I configure this?
        Steps:
        1. Configure the mail server settings
        2. Create a notification profile for 'email'
          - Select the required 'alarm criteria'
          - Associate the profile with the 'required devices'
  23. 23. #2 Log a ticket
        Question: I want OpManager to create a ticket in ServiceDesk Plus whenever a problem is detected in an interface. The ticket should have fields such as category, group, and technician filled in automatically.
        Steps:
        1. Set up the integration with ServiceDesk Plus
        2. Create a notification profile for 'log a ticket'
          - Select the category, group, and technician
          - Select the required 'alarm criteria'
          - Associate the profile with the 'required devices'
  24. 24. IT workflow automation
  25. 25. IT workflow automation
        Examples
        • Get more space on the server for better performance
        • Test SNMP service
        • Export/import available templates
        Steps
        1. Create a workflow
        2. Associate devices
        3. Schedule/trigger tasks
  26. 26. Tips and tricks
  27. 27. Tips and tricks
        • Configure device dependencies to stop polling a dependent device when its parent device is down
        • Suppress known alarms from an individual device
        • Configure the downtime scheduler and stop polling devices during maintenance windows
        • Configure alarm escalation and notify the super admin when a critical alarm is not cleared within a given amount of time
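Two of the tips on slide 27, device dependencies and alarm escalation, reduce to simple checks. The sketch below assumes each device has at most one parent and that an alarm is escalated once it has stayed uncleared for a fixed interval. The function names, device names, and data structures are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta


def suppressed_devices(parent_of: dict, down: set) -> set:
    """Return devices whose polling can be skipped because an ancestor is down."""
    result = set()
    for device in parent_of:
        seen = set()
        ancestor = parent_of.get(device)
        # Walk up the dependency chain; 'seen' guards against accidental cycles.
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                result.add(device)
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
    return result


def needs_escalation(raised_at: datetime, now: datetime,
                     limit: timedelta, cleared: bool = False) -> bool:
    """True when an alarm has stayed uncleared for at least 'limit'."""
    return not cleared and now - raised_at >= limit


topology = {
    "core-router": None,
    "switch-a": "core-router",
    "switch-b": "core-router",
    "server-1": "switch-a",
}
skip_when_switch_a_down = suppressed_devices(topology, {"switch-a"})
skip_when_core_down = suppressed_devices(topology, {"core-router"})
```

With only `switch-a` down, just `server-1` is skipped; with `core-router` down, every device behind it is skipped, so a single root alarm replaces a burst of dependent ones.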
  28. 28. Need more help? opmanager- +1 (888) 720-9500 / +1 (408) 916-9400
  29. 29. Free ITOM Seminar
  30. 30. THANK YOU