Lessons learned from an HP Network Automation and Network Node Manager i integrated deployment with TelAlert notification in an MPLS environment

This case study demonstrates how the integration of HP Network Automation, Network Node Manager i (NNMi), and TelAlert Urgent Messaging System reduced costs, improved configuration standards, and helped an energy company through a major acquisition. The implementation team will discuss the benefits of migrating to NNMi, particularly configuration ease. They will also give configuration tips on obtaining full map functionality in an MPLS environment. They’ll report on improved standardization and dramatically reduced MTTR with existing personnel achieved by deploying Network Automation to a network spread across 125 sites including such diverse elements as radio transmission towers and SCADA devices. And they’ll focus particularly on maximizing the shared nodes between NA and NNMi. To close, the team will illustrate the benefits and process of integrating TelAlert Urgent Messaging System to deliver paging notification of essential root cause incidents to both the core network management team and the responsible technical team at affected sites.

Presentation Transcript

  • Lessons learned from an HP Network Automation and Network Node Manager i integrated deployment with TelAlert notification in an MPLS environment. Bill P. Fanelli, Principal Architect, Allen Corporation of America. ©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • Allen Corporation of America, Inc.
      • Headquarters: Fairfax, VA
      • Organization
          – Training Systems Division
          – Integrated Technologies Division
          – CyberSecurity Division
          – Logistics Services Division
      • Regional Offices: Colonial Heights, VA; Ithaca, NY; Myrtle Beach, SC; The Hague, Netherlands
      • Sites in 22 States, with Worldwide Operations
      • 250+ employees
      • Private Corporation - Small business
      • Secret Facilities Clearance
  • Cyber Security Division Cyber Security, Enterprise Management Services Complete Life-Cycle Support Security Management Enterprise Notification Solutions
  • Agenda
      • Integrating NA with NNMi
          – Benefits of integration
          – Implementation Tips
      • Monitoring MPLS with NNMi
          – Issues with virtual networks
          – How to best match the map to your environment
      • Stabilizing staffing using Notification with TelAlert
          – Taming the workload with automation
  • Case Study
      • Large Energy company
          – Diverse network: includes radio transmission towers and SCADA devices
          – Growth by acquisition: reserves grew by a factor of 50 over 15 years
      • Issues in IT
          – Assimilation of acquired infrastructure
              • NNMi & NA
          – MTTR for field outages was 2½ to 3 days
              • NA
          – Network staff could not grow linearly with company
              • Reserves doubled every four months
              • NNMi on MPLS
              • TelAlert
  • NA and NNMi Selection Drivers
      • See what is running
      • Assimilate acquired infrastructure
          – Technology
              • Cisco
          – Process
              • Standardize configurations with NA
              • Centralize monitoring with NNMi
          – People
              • Automated notification from NNMi to TelAlert
  • Let’s Get Started
      • Integrating NA with NNMi
      • Monitoring MPLS with NNMi
      • Stabilizing staffing using Notification with TelAlert
  • Benefits of Integrated NA/NNMi Process
      • High percentage of outages due to changes
          – Coordinate changes
          – Ability to roll back changes, both authorized and unauthorized
      • Standardize and Automate
          – SNMP community string change
              • Add new string
              • Confirm all nodes are configured and working
              • Remove old string
      • Expedite Field Replacements
          – Drop ship replacement devices to field location
          – Push configuration over the wire
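The three-step SNMP community string change on this slide can be sketched as a staged rollover. This is a minimal illustration, not NA's own mechanism: the three callbacks (`add_string`, `responds_to_new`, `remove_old`) are hypothetical stand-ins for whatever pushes configuration and polls devices in a given environment.

```python
from typing import Callable, Iterable, List


def rotate_community_string(
    nodes: Iterable[str],
    add_string: Callable[[str], None],        # hypothetical: push the new string to a node
    responds_to_new: Callable[[str], bool],   # hypothetical: poll the node with the new string
    remove_old: Callable[[str], None],        # hypothetical: delete the old string from a node
) -> List[str]:
    """Run the add / confirm / remove sequence; return nodes that failed the check."""
    node_list = list(nodes)

    # Step 1: add the new string everywhere. The old string still works,
    # so monitoring is not interrupted.
    for node in node_list:
        add_string(node)

    # Step 2: confirm every node answers on the new string.
    failed = [node for node in node_list if not responds_to_new(node)]
    if failed:
        # Stop here: the old string is left in place on all nodes.
        return failed

    # Step 3: only after full confirmation, remove the old string.
    for node in node_list:
        remove_old(node)
    return []
```

Keeping the old string until every node has been confirmed means a partial failure never leaves a device unreachable.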
  • Features of NA/NNMi Integration
      • GUI integration
          – Cross launch with context
          – Telnet or SSH access to devices
          – Bring NA diagnostics to NNMi
      • Data integration
          – Import NNMi devices into NA
          – Secret Ingredient
              • NA must have NNMi Node UUID to make the match
  • Linking NA with NNMi
      • Run the Connector installer on the NA machine
          – Connects to NNMi and installs components there as well
      • The procedure depends on whether NA and NNMi are co-resident
          – Some default ports are the same
              • Install NNMi first, then NA installer will accommodate
          – Separate Connector installers as well
      • Learn from us
          – Initially co-resident and then moved NA
          – Many extra steps involved
              • Not worth a "try and see" approach
          – Think your way through impact of co-residency
              • NNMi has huge memory requirement
  • Import NNMi Devices to NA
      • On NNMi, run nnmimport
      • Queries NA for a list of supported OIDs
      • Dumps nodes from NNMi database matching supported OIDs only
      • Pushes node information, particularly the NNMi Node UUID, to NA
      • We wanted all devices from NNMi in NA
          – Even unsupported ones
  • Adding Devices from NNMi to NA
      • On the NA server, add the OIDs to {NA_DIR}/jre/adjustable_options.rcx
      • Format
            <array name="drivers/custom_sysoids">
              <value> <!-- sys oid --> </value>
              <value> <!-- another sys oid --> </value>
              <value> <!-- etc. --> </value>
            </array>
      • For example
            <array name="drivers/custom_sysoids">
              <value>1.3.6.1.4.1.9.1.479</value>
            </array>
      • Save and restart NAS
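When there are many OIDs to add, the `<array>` block can be generated rather than typed by hand. The sketch below only produces text in the format shown on the slide; the function name `custom_sysoids_block` is made up for illustration, and pasting the result into adjustable_options.rcx remains a manual step.

```python
import re
from typing import Iterable

# A numeric, dotted OID such as 1.3.6.1.4.1.9.1.479
_OID = re.compile(r"\d+(?:\.\d+)+")


def custom_sysoids_block(oids: Iterable[str]) -> str:
    """Render sysObjectIDs as a drivers/custom_sysoids array, one <value> per OID."""
    lines = ['<array name="drivers/custom_sysoids">']
    for oid in oids:
        # Assumption: a leading dot is dropped, matching the slide's example.
        oid = oid.strip().lstrip(".")
        if not _OID.fullmatch(oid):
            raise ValueError(f"not a numeric OID: {oid!r}")
        lines.append(f"  <value>{oid}</value>")
    lines.append("</array>")
    return "\n".join(lines)


print(custom_sysoids_block(["1.3.6.1.4.1.9.1.479"]))
```

Rejecting anything that is not a dotted numeric OID keeps stray text from a copy-and-paste out of the configuration file.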
  • Finding Supported OIDs in NA
      • telnet or ssh to NA box
      • Log in as an NA user
      • Run the command list sys oids all
      • All OIDs supported by NA will be listed
  • Finding OIDs in Use in NNMi
      • On the NNMi server, run nnmtopodump.ovpl -legacy long -type node, filter the output for "SNMP OBJECT ID:" with find (Windows) or grep (UNIX/Linux), and redirect the result to a file such as OIDs_in_use.out
      • For example:
            nnmtopodump.ovpl -legacy long -type node | find "SNMP OBJECT ID:" > OIDs_in_use.out
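The filtered file still holds one line per node, so the same OID appears many times. A small sketch for reducing it to a unique list follows. It assumes each relevant line contains the label "SNMP OBJECT ID:" followed by a dotted numeric OID, possibly with a leading dot; the exact layout of the dump is not shown in the slides, so check it against real output.

```python
import re
from typing import List

# Assumed line shape: "... SNMP OBJECT ID: .1.3.6.1.4.1.9.1.479"
_OID_LINE = re.compile(r"SNMP OBJECT ID:\s*\.?(\d+(?:\.\d+)+)")


def oids_in_use(dump_text: str) -> List[str]:
    """Return the unique sysObjectIDs found in the dump text, sorted as strings."""
    return sorted({match.group(1) for match in _OID_LINE.finditer(dump_text)})


# Usage, with the file name from the slide:
#   with open("OIDs_in_use.out") as handle:
#       print("\n".join(oids_in_use(handle.read())))
```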
  • Determine OIDs to Add to NA
      • Sort, cut and compare these two lists
      • Generate a list of OIDs from the NNMi "OIDs in use" list that are not in the NA "supported OIDs" list
      • Add these to the adjustable_options.rcx file
      • The next time nnmimport is run on the NNMi box
          – NA will respond that the added OIDs are supported
          – Therefore nnmimport will include them in the push to NA
      • Warning
          – nnmimport has a tendency to create duplicate entries in NA
          – This is not due to modifying adjustable_options.rcx
          – Use nnmimport carefully until you understand the impact on NA in your environment
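The "sort, cut and compare" step amounts to a set difference: OIDs in use in NNMi minus OIDs NA already supports. A minimal sketch, assuming both lists have already been reduced to bare dotted numeric OIDs (the function name `oids_to_add` is illustrative only):

```python
from typing import Iterable, List


def _normalize(oid: str) -> str:
    # Treat ".1.3.6..." and "1.3.6..." as the same OID.
    return oid.strip().lstrip(".")


def oids_to_add(in_use: Iterable[str], supported: Iterable[str]) -> List[str]:
    """Return OIDs seen in NNMi that are missing from NA's supported list."""
    supported_set = {_normalize(o) for o in supported if o.strip()}
    missing = {_normalize(o) for o in in_use if o.strip()} - supported_set
    # Sort numerically by arc so that ...9.1.479 comes before ...9.1.1000.
    return sorted(missing, key=lambda o: tuple(int(part) for part in o.split(".")))
```

The result is the list of values to place in the drivers/custom_sysoids array.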
  • Restart NAS You Say…
  • Where Are We
      • Integrating NA with NNMi
      • Monitoring MPLS with NNMi
      • Stabilizing staffing using Notification with TelAlert
  • Monitoring MPLS with NNMi
      • Discovery across virtual boundaries is inherently difficult
          – Contiguous map
          – Downstream suppression
  • Contiguous Map
      • NNMi has Subnet Connection Rules
      • NNMi can create Layer 2 Connections for subnets at the edge of subnetworks that are directly connected via Wide Area Networks (WANs)
      • Define rules to control which subnets and interfaces NNMi uses to create additional Layer 2 connections
  • Small Subnets Rule
      • All rules are on by default
  • Discovery Islands
  • Discovery Islands
      • Good, not perfect
      • Remember that we do not manage large networks by maps
          – Manage by events
      • The topology NNMi knows about, as represented by these maps, is most important
      • Status representation on maps is also important
          – Maintain user confidence
      • Issue with map status display with MPLS connected sites
          – Downstream suppression rule prevents nodes and containers from representing MPLS outage
  • Downstream Suppression: The Situation
      • NNMi analyses the Layer 2 information and determines when a set of nodes is not connected at Layer 2, as far as it can discover
      • This applies to MPLS connected sites
      • NNMi puts these nodes into NNMi-defined node groups named Island nnnn, where nnnn is a unique number for each set of Layer 2 connected nodes that are not connected to the NNMi server
      • When an island is isolated by an MPLS failure, all the nodes in the island are put into a warning or unknown state
  • Downstream Suppression: The Fix
      • If a node is added to the Important Nodes node group and it goes down or becomes isolated, it will be set to critical status. This overrides the island logic, which sets it to warning or unknown.
      • Added filter rules to the Important Nodes node group on NNMi server as follows:
          – Device Filters
              • Device = Gateway or Router
          – Additional Filters
              • Island = not null
      • Automatically populates the Important Nodes node group with the routers in the islands
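The effect of the two filter rules can be modeled in a few lines. This is only a toy model of the behavior the slide describes, not NNMi's status engine: the `Node` fields, the reading of "Device = Gateway or Router" as a device category, and the choice of "unknown" (the slide says warning or unknown) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    device_category: str       # assumed values such as "Router", "Gateway", "Switch"
    island: Optional[str]      # e.g. "Island 12"; None when Layer 2 connected to the NNMi server


def is_important(node: Node) -> bool:
    """Mirror the two filters: Device = Gateway or Router, and Island = not null."""
    return node.device_category in {"Gateway", "Router"} and node.island is not None


def displayed_status(node: Node, isolated: bool) -> str:
    """Status shown on the map when the node's site is (or is not) isolated."""
    if not isolated:
        return "normal"
    # Important Nodes override the island logic and go critical.
    return "critical" if is_important(node) else "unknown"
```

With this model, an isolated island's router shows critical while its switches stay at unknown, which is the outcome the fix is meant to produce.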
  • Downstream Suppression: Outcome
      • When MPLS site is isolated
          – All routers go critical
              • Could be further filtered
      • NNMi does produce a Critical Event
          – Without adding nodes to the Important Nodes node group, the node and containers do not reflect the outage
  • Home Stretch
      • Integrating NA with NNMi
      • Monitoring MPLS with NNMi
      • Stabilizing staffing using Notification with TelAlert
  • The Case for Notification
      • Text or Text-to-Speech messaging has a lower barrier to entry since almost everyone now carries a cell phone
      • Normal Hours
          – Get someone’s attention at their desk or away from it
      • Off Hours
          – Staffing for 7 x 24 monitoring is cost prohibitive for most organizations
              • Rule of 13/8
          – Need for 7 x 24 monitoring is growing as companies become more network dependent
  • Desired Workflow
      • Immediate Notification
          – Core network team only
          – SNMP IFdown Trap
      • Root Cause Event
          – District and Site where event occurred
          – Could be:
              • Node Down
              • Remote site containing node is unreachable
              • Node or Connection Down
              • Interface Down
          – Typically delayed three minutes
      • Reminder on open incidents
          – Core network team after one hour
  • NNMi Actions
      • Trigger on Lifecycle States
          – Registered, In Progress, Completed, and Closed
          – Typically use Registered and Closed
      • Large number of parameters for configuring incident actions plus Custom Incident Attributes
          – By pairing Lifecycle States, Message ID stays the same
          – Node Down Registered is cleared by Node Down Closed
              • Instead of separate Node Up event
      • Effect in TelAlert
          – When Registered
                telalertc -g NetCore -m "Node $sourceNodeName Down" -ticket $id
          – When Closed
                telalertc -ack -ticket $id
  • Implemented Workflow
      • Immediate Notification
          – When SNMP Trap Incident enters Registered State
          – Send message now to core network group
                telalertc -g NetCore -subject "$severity fault on $sourceNodeName" -m "Fault: $name on $sourceObjectName on node $sourceNodeName at $lastOccurrenceTime"
      • Notify Site and District
          – When Root Cause Incident enters Registered State
          – Send final message to core network, site and district groups
                telalertc -g NetAll -ticket $id -delay 3m -subject "$severity fault on $sourceNodeName" -m "Fault: $name on $sourceObjectName on node $sourceNodeName at $lastOccurrenceTime"
  • Implemented Workflow (continued)
      • Reminder on open incidents
          – When Root Cause Incident enters Registered State
                telalertc -g NetCore -delay 60m -ticket $id -subject "Reminder message on $sourceNodeName" -m "Reminder message on $sourceNodeName"
      • Recovery
          – When "Down" Incident enters Closed State
                telalertc -ack -ticket $id
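The four actions above differ only in group, delay, ticket and text, so the command lines can be assembled by one helper. The sketch uses only the telalertc options that appear on the slides (-g, -ticket, -delay, -subject, -m, -ack); their exact semantics and any ordering requirements are taken from the slides, not verified against TelAlert documentation, and the helper names are hypothetical.

```python
from typing import List, Optional


def notify_cmd(
    group: str,
    message: str,
    subject: Optional[str] = None,
    ticket: Optional[str] = None,
    delay: Optional[str] = None,
) -> List[str]:
    """Build a telalertc argument list for a notification (options as on the slides)."""
    argv = ["telalertc", "-g", group]
    if ticket is not None:
        argv += ["-ticket", ticket]
    if delay is not None:
        argv += ["-delay", delay]
    if subject is not None:
        argv += ["-subject", subject]
    argv += ["-m", message]
    return argv


def ack_cmd(ticket: str) -> List[str]:
    """Build the telalertc argument list that acknowledges a ticket."""
    return ["telalertc", "-ack", "-ticket", ticket]


# The NNMi variables are left as literal placeholders; NNMi substitutes them.
site_alert = notify_cmd(
    "NetAll",
    "Fault: $name on $sourceObjectName on node $sourceNodeName at $lastOccurrenceTime",
    subject="$severity fault on $sourceNodeName",
    ticket="$id",
    delay="3m",
)
```

Passing the arguments as a list keeps multi-word subjects and messages intact without depending on shell quoting.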
  • Typical Scenario
      • Router loses power
      • SNMP IFdown Trap from upstream router
          – NNMi sends message to NetCore group for immediate delivery
      • Causal engine posts Interface Down Root Cause Incident
          – NNMi sends message to NetAll group with three minute delay
          – NNMi sends reminder to NetCore group with one hour delay
      • Causal engine posts Node or Connection Down Incident
          – Interface Down Incident is closed
              • NNMi sends -ack to clear Interface Down message and reminder
          – NNMi sends message to NetAll group with three minute delay
          – NNMi sends reminder to NetCore group with one hour delay
  • Typical Scenario (continued)
      • Causal engine posts Node Down Incident
          – Node or Connection Down Incident is closed
              • NNMi sends -ack to clear Node or Connection Down message and reminder
          – NNMi sends message to NetAll group with three minute delay
          – NNMi sends reminder to NetCore group with one hour delay
      • Three minute delay timer expires
          – Node Down message delivered to all groups
      • One hour delay timer expires
          – Reminder message delivered to NetCore group
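The scenario can be replayed with a small simulation to show why only the final Node Down message reaches the site and district groups. This is a toy model, not TelAlert: it assumes that an -ack for a ticket cancels every undelivered delayed message carrying that ticket, as the slides describe, and the incident timings (one minute apart) and ticket numbers are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pending:
    due: int                 # delivery time in minutes
    group: str
    text: str
    ticket: Optional[str]


class Notifier:
    """Toy model of delayed delivery with acknowledge-by-ticket."""

    def __init__(self) -> None:
        self.pending: List[Pending] = []
        self.delivered: List[Tuple[int, str, str]] = []

    def send(self, now: int, group: str, text: str,
             ticket: Optional[str] = None, delay: int = 0) -> None:
        if delay == 0:
            self.delivered.append((now, group, text))
        else:
            self.pending.append(Pending(now + delay, group, text, ticket))

    def ack(self, ticket: str) -> None:
        # Assumed behavior: drop all undelivered messages for this ticket.
        self.pending = [p for p in self.pending if p.ticket != ticket]

    def advance(self, now: int) -> None:
        # Deliver everything whose delay timer has expired, oldest first.
        for p in sorted((p for p in self.pending if p.due <= now), key=lambda p: p.due):
            self.delivered.append((p.due, p.group, p.text))
        self.pending = [p for p in self.pending if p.due > now]


def run_scenario() -> List[Tuple[int, str, str]]:
    n = Notifier()
    n.send(0, "NetCore", "IFdown trap")                       # immediate
    n.send(0, "NetAll", "Interface Down", ticket="1", delay=3)
    n.send(0, "NetCore", "Reminder: Interface Down", ticket="1", delay=60)
    n.advance(1)
    n.ack("1")                                                # Interface Down closed
    n.send(1, "NetAll", "Node or Connection Down", ticket="2", delay=3)
    n.send(1, "NetCore", "Reminder: Node or Connection Down", ticket="2", delay=60)
    n.advance(2)
    n.ack("2")                                                # Node or Connection Down closed
    n.send(2, "NetAll", "Node Down", ticket="3", delay=3)
    n.send(2, "NetCore", "Reminder: Node Down", ticket="3", delay=60)
    n.advance(120)
    return n.delivered
```

The three-minute delay gives the causal engine time to settle on the root cause, so the intermediate incidents are acknowledged before their messages are ever delivered.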
  • Conclusion
      • Integrating NA with NNMi
          – Consistency of configurations
          – Same nodes in both tools
      • Monitoring MPLS with NNMi
          – Monitor by Incidents
          – Map status should reflect real world status
      • Stabilizing staffing using Notification with TelAlert
          – Demands on staff are growing faster than the staff headcount
          – Automation is the key to survival
  • Allen Corporation
      • Allen Corporation of America, Inc., 10400 Eaton Place, Suite 450, Fairfax, VA 22030
      • (866) HQ - ALLEN
      • Bill Fanelli, (866) 472-5536, bfanelli@allencorp.com
      • 571.321.1648 Voice
      • www.allencorp.com
  • Questions or Comments? Thank you for your time
  • To learn more on this topic, and to connect with your peers after the conference, visit the HP Software Solutions Community: www.hp.com/go/swcommunity ©2010 Hewlett-Packard Development Company, L.P.