Role of OpManager in event and fault management

Managing events and faults is nothing new to IT managers. However, if it is not implemented properly, it can be the most daunting of network monitoring and network management tasks.

Check out this presentation to understand:

# The basics of event and fault management
# How ManageEngine OpManager helps in effective fault management

Published in: Technology, Design
Notes
  • Is Fault Management all about detecting the events?
  • Detect events, Isolate faults, Inform or notify admins and Resolve or aid faster resolution

    1. The Role of OpManager in Event and Fault Management. Team OpManager, www.opmanager.com
    2. Agenda
         • Brushing up on fault management
             • Reactive vs. proactive
         • The four processes and OpManager's role
             • Detect
             • Isolate
             • Inform
             • Resolve
    3. Reactive Fault Management
         • Firefighting in nature
         • Troubleshooting starts after business is impacted
         • Higher resolution time
         • Least preferred by both IT admins and end users
         • (Illustration) User to IT admin: "It is not working!"
    4. Proactive Fault Management
         • Alerts on an impending fault
         • Resolution time reduced drastically
         • Reduced operation cost
         • (Illustration) IT admin to user: "NMS has reported a problem & I'm working on it"
    5. What is Fault and Event Management?
         • Detecting events
         • Making sense of them
         • Presenting only actionable events
         • *An event can be informational, a cleared event, a warning, trouble or even a critical problem
    6. The four processes
    7. The four processes explained
         • Detect: active monitoring, passive monitoring
         • Isolate: de-duplication, correlation, automation
         • Inform: visual representation, ticketing, alerting
         • Resolve: automatic correction, troubleshooting tools
    8. Detect – Capture events
         • Active polling/probing/query monitoring
         • Active monitoring example: SNMP polling
         • Other examples of active polling: monitoring through SNMP, WMI, Telnet, SSH, custom scripts, remote queries and more
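The active polling on slide 8 can be sketched in a few lines. This is only an illustration of the polling idea, not how OpManager implements it: it substitutes a plain TCP connect for SNMP or WMI, and the names `tcp_probe` and `poll_once`, the default port and the host names are all hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable


def tcp_probe(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all count as "no answer".
        return False


def poll_once(hosts: Iterable[str],
              probe: Callable[[str], bool] = tcp_probe) -> Dict[str, str]:
    """Actively query each host and record it as 'up' or 'down'."""
    return {host: ("up" if probe(host) else "down") for host in hosts}
```

A scheduler would call `poll_once` at a fixed interval. Passing the probe as a parameter keeps the polling loop independent of the protocol used to ask the question.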
    9. Detect – Capture events
         • Passive or event-based monitoring
         • Passive monitoring example: SNMP trap
         • Other examples of passive monitoring: SNMP traps, Syslog, NetFlow, packet forwarding and more
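With passive monitoring the device sends the event and the monitor only has to interpret it. As a small sketch, assuming a syslog line that starts with the standard `<PRI>` header (where PRI = facility × 8 + severity), the severity can be decoded as follows. The function name `parse_syslog` and the sample message are illustrative only.

```python
import re

# Severity names in syslog order, index 0 (most severe) to 7 (least severe).
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

_PRI_HEADER = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(line: str) -> dict:
    """Split a raw syslog line into facility, severity name and message text."""
    match = _PRI_HEADER.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> header")
    pri = int(match.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError(f"PRI value out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "message": match.group(2).strip(),
    }
```

A receiver built on this could discard `debug` and `informational` messages early, so only higher severities reach the alarm list.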
    10. Isolate – Present actionable faults
         • Helps identify the root cause of the problem quickly; reduces Mean Time To Resolve (MTTR)
         • Includes tasks to
             • Understand the event source
             • Filter out redundant or known events
             • Project only actionable faults
         • *The Network Management System's fault management engine plays a vital role
    11. Isolate – Present actionable faults: De-duplication
         • Drops recurrent events from the display
         • Builds them into an event history
    12. Isolate – Present actionable faults: De-duplication
         • OpManager Alarms view, showing unique alerts for every device and type of alarm
         • Detailed alarm history page with a list of alarm actions
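De-duplication as described on slides 11 and 12 amounts to keeping one alarm per device and alarm type while recording repeats as history. A minimal sketch, with hypothetical class and field names that are not taken from OpManager:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Alarm:
    device: str
    kind: str
    message: str
    count: int = 0
    history: List[str] = field(default_factory=list)


class AlarmTable:
    """Holds one alarm per (device, kind); repeated events only extend its history."""

    def __init__(self) -> None:
        self._alarms: Dict[Tuple[str, str], Alarm] = {}

    def ingest(self, device: str, kind: str, message: str) -> Alarm:
        key = (device, kind)
        alarm = self._alarms.get(key)
        if alarm is None:
            # First event of this kind for this device: create the alarm.
            alarm = Alarm(device, kind, message)
            self._alarms[key] = alarm
        alarm.count += 1
        alarm.message = message          # show the latest text
        alarm.history.append(message)    # keep every occurrence for the history page
        return alarm

    def active(self) -> List[Alarm]:
        """Return the unique alarms in the order they were first seen."""
        return list(self._alarms.values())
```

Three "interface down" events from the same switch then appear as a single row with a count of 3, not as three separate rows.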
    13. Isolate – Present actionable faults: Correlation
         • Relates previous events and interdependencies
         • Projects only the root cause of the problem
    14. Isolate – Present actionable faults: Correlation
         • OpManager has automated and custom network maps that let you identify the root cause much more quickly
         • Lets you configure device dependencies to project only the root of the problem
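The device-dependency idea on slide 14 can be illustrated with a simple parent map: a device that is down behind another down device is treated as a symptom, not a cause. This is a sketch under that assumption only; the function names and the topology in the usage note are hypothetical.

```python
from typing import Dict, Optional, Set


def has_down_ancestor(device: str, down: Set[str],
                      parent_of: Dict[str, Optional[str]]) -> bool:
    """Walk upstream from the device and report whether any ancestor is down."""
    seen = {device}
    parent = parent_of.get(device)
    while parent is not None and parent not in seen:  # 'seen' guards against cycles
        if parent in down:
            return True
        seen.add(parent)
        parent = parent_of.get(parent)
    return False


def root_causes(down: Set[str],
                parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """Keep only the down devices that are not explained by a down ancestor."""
    return {d for d in down if not has_down_ancestor(d, down, parent_of)}
```

If `server-1` and `server-2` both depend on `dist-switch` and all three are down, `root_causes` returns only `dist-switch`, which is the one alarm worth acting on.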
    15. Isolate – Present actionable faults: Automation
         • Ignore incidental events
         • Remove cleared faults
         • Suppress known alarms (automated/manual suppression)
    16. Isolate – Present actionable faults: Automation
         • Threshold configuration: Consecutive Times and Rearm Value
         • Suppress known alarms: Downtime Scheduler
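One common reading of "Consecutive Times" and "Rearm Value" is: raise an alarm only after the threshold is violated on N polls in a row, and clear it only once the value falls back to the rearm level. The sketch below implements that reading. It is an assumption about the semantics, not OpManager's documented behaviour, and the class name is hypothetical.

```python
from typing import Optional


class ThresholdMonitor:
    """Raise after N consecutive violations; clear only at or below the rearm value."""

    def __init__(self, threshold: float, rearm: float, consecutive_times: int) -> None:
        if rearm > threshold:
            raise ValueError("rearm value must not exceed the threshold")
        self.threshold = threshold
        self.rearm = rearm
        self.consecutive_times = consecutive_times
        self.alarm_active = False
        self._violations = 0

    def observe(self, value: float) -> Optional[str]:
        """Feed one polled value; return 'raised', 'cleared' or None."""
        if not self.alarm_active:
            if value > self.threshold:
                self._violations += 1
                if self._violations >= self.consecutive_times:
                    self.alarm_active = True
                    self._violations = 0
                    return "raised"
            else:
                self._violations = 0  # a single good poll resets the streak
        elif value <= self.rearm:
            self.alarm_active = False
            return "cleared"
        return None
```

The gap between threshold and rearm value keeps a metric that hovers around the threshold from raising and clearing the same alarm on every poll.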
    17. Isolate – Present actionable faults: Automation
         • Suppress known alarms: manual suppression for devices and interfaces
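Both kinds of suppression from slides 16 and 17 (scheduled downtime and manual suppression) reduce to a check made before an alert goes out. A minimal sketch, assuming per-device maintenance windows; `MaintenanceWindow` and `should_alert` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Set


@dataclass(frozen=True)
class MaintenanceWindow:
    device: str
    start: datetime
    end: datetime

    def covers(self, device: str, when: datetime) -> bool:
        """True if 'when' falls inside this device's scheduled downtime."""
        return device == self.device and self.start <= when < self.end


def should_alert(device: str, when: datetime,
                 windows: Iterable[MaintenanceWindow],
                 manually_suppressed: Set[str]) -> bool:
    """Return False for devices that are suppressed by hand or by schedule."""
    if device in manually_suppressed:
        return False
    return not any(window.covers(device, when) for window in windows)
```

Events for suppressed devices can still be stored as history; the check only decides whether an admin is notified.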
    18. Inform – Notify admins
         • Visual representation of faults to assist NOC admins
         • Ticketing and alerting remote admins
    19. Inform – Notify admins
         • Alarm color coding
         • Web alarms and dashboards
         • Dynamic network or custom maps showing the network and device status
    20. Inform – Notify admins: Trouble ticketing
         • Through email for other helpdesk products
         • Automatic ticket creation with ManageEngine ServiceDesk Plus, through integration
    21. Inform – Notify admins
         • Alert remote admins: email, SMS, RSS feeds, Twitter alerts (direct messages), iPhone/smartphone GUI
    22. Resolve – Aid faster resolution
         • Needs proprietary knowledge of your IT infrastructure, policies and agreed SLAs
         • The NMS should help
             • Execute such automation logic (and communicate execution faults, if any)
             • Back manual troubleshooting with a set of IT tools
    23. Resolve – Aid faster resolution: Automated fault resolution
         • Run a command or run a program on a remote machine, with options to append error messages
         • Restart the Windows service or the server if the service is found to be down
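The "run a command ... with options to append error messages" action on slide 23 can be sketched with the standard library. The example runs a local command and returns a note that could be appended to the alarm. Remote execution and the actual OpManager action are outside its scope, and `run_remediation` is a hypothetical name.

```python
import subprocess
import sys
from typing import List


def run_remediation(command: List[str], timeout: float = 30.0) -> str:
    """Run a corrective command and return a note suitable for the alarm record."""
    try:
        result = subprocess.run(command, capture_output=True,
                                text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command could not be started or did not finish in time.
        return f"remediation failed to run: {exc}"
    note = f"exit code {result.returncode}"
    if result.returncode != 0 and result.stderr.strip():
        # Append the error output so the admin sees why the fix did not work.
        note += f"; error: {result.stderr.strip()}"
    return note


if __name__ == "__main__":
    # Stand-in for a real restart command: a child Python process that succeeds.
    print(run_remediation([sys.executable, "-c", "print('service restarted')"]))
```

Reporting the failure of the fix itself matches the slide 22 point that the NMS should communicate execution faults.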
    24. Resolve – Aid faster resolution: Server troubleshooting tools
         • Remote process diagnostics
         • Device tools: ping, trace route, tools to remotely connect to the server (web console, Telnet/SSH, MS terminal server)
    25. Resolve – Aid faster resolution: Network troubleshooting tools
         • Switch Port Mapper
         • Network traffic analysis
         • Switch port disabling option
    26. Resolve – Aid faster resolution: Network troubleshooting tools
         • WAN link hop-wise latency count graph
         • Network Change and Configuration Management (NCCM)
    27. Resolve – Aid faster resolution: Other troubleshooting tools
         • Real-time performance graphs
         • MIB Browser and Syslog viewer
    28. Tons of features that we've not talked about
         • Automatic network discovery
         • Device and interface monitoring templates
         • Network maps/custom maps
         • WAN RTT and VoIP monitoring
         • Network traffic analysis
         • Network Change and Configuration Management
         • Server monitoring (Windows/Linux/UNIX-flavor OSes)
         • ESX VMware monitoring
         • MS Exchange, SQL and Active Directory monitoring
         • Service monitoring, website monitoring, process and file/folder monitoring
         • Processing SNMP traps, Syslogs and Event Log
         • Monitors any pingable and SNMP-enabled device
         ManageEngine OpManager is comprehensive, easy-to-use network monitoring & management software. For a free trial, visit www.opmanager.com. For product demos, mail us at [email_address] or call +1 888 720 9500.
    29. About ManageEngine
         ManageEngine is the only IT Management vendor focused on bringing a complete IT Management portfolio to the mid-sized enterprise. Trusted by over 45,000 customers including 3 out of every 5 Fortune 500 companies. More at www.manageengine.com
    30. Summary
         • Fault and event management
         • Proactive and reactive approaches
         • Four processes of fault management:
             • Detect: active and passive monitoring
             • Isolate: de-duplication, correlation, automation
             • Inform: visual fault representation, ticketing and alerting
             • Resolve: automated scripts and tools to aid manual troubleshooting
         • OpManager's role in each process of fault and event management
         • About ManageEngine and its various IT management products
    31. Questions? Thank you. opmanager@manageengine.com
