IP Network Management Richard Mortier
The distributed database <ul><li>Building simulation model </li></ul><ul><ul><li>OSPF data gives topology, event list, rou...
Guest Lecture by Dr Richard Mortier, MSR

  • Routeing protocols come in two types: link-state (OSPF) and distance/path-vector (BGP). IP options: loose source routing, strict source routing, record route. Packets carrying options can be treated separately, ignored, or dropped.
  • 1. Core router
       - generally has interface speeds of OC12 upwards
       - moves packets from A to B with minimal additional latency and minimal jitter
       - should be capable of implementing ACLs with no performance degradation, but its primary role is to push packets
       - just about mandatory these days that it can handle interfaces pushing maximum packets/sec with minimum packet size, so as to be able to withstand DDoS attacks, either at it or through it
    2. Transit or peering-facing router
       - interface speeds of >OC3, probably decent GbE density desirable
       - mandatory implementation of ACLs
       - mandatory full-feature BGP features & widgets
       - mandatory implementation of uRPF or similar
       - ideally capable of traffic 'accounting' mechanisms (e.g. packet-sampling, netflow etc)
    3. Customer-facing router (FR/ATM/..)
       - decent system-density for customer connections
       - GbE uplink interface(s)
       - mandatory implementation of ACLs
       - mandatory full-feature BGP features & widgets
       - mandatory implementation of uRPF or similar
       - ideally capable of traffic 'accounting' mechanisms (e.g. packet-sampling, netflow, anomaly detection etc)
       - ideally able to implement 'better' queueing mechanisms than just standard FIFO, e.g. low-latency queueing for voice traffic, fair-queueing for fairness, deep(er) buffers to attempt to minimize packet drop due to speed-mismatch
    4. Broadband aggregation router (e.g. LNS)
       - handles large numbers of logical sessions from central configuration/policy (e.g. tie into RADIUS server(s))
       - GbE uplink interface(s)
       - mandatory implementation of ACLs
       - mandatory implementation of uRPF or similar
       - ideally capable of traffic 'accounting' mechanisms (e.g. packet-sampling, netflow, anomaly detection etc)
       - ideally able to implement 'better' queueing mechanisms than just standard FIFO, e.g. low-latency queueing for voice traffic, fair-queueing for fairness, deep(er) buffers to attempt to minimize packet drop due to speed-mismatch
       - sufficient control-plane CPU to handle a large number of connection establishments/sec (e.g. connection to LAC being lost)
    5. Customer-premises router (CPE)
       - generally low-speed (<30Mbps)
       - end-users love ones with built-in NAT, DHCP, firewall, wireless, probably VoIP, ...
       - low-cost
       - minimal CPU: no need to handle DoS attack because WAN bandwidth is exhausted before PPS limit of CPU is hit
    Going through these, I'd say "ASIC based" or multiple-distributed-CPU is what you want for (1). Anything less than that means you're likely to have reduced customer satisfaction. (2), (3) & (4) are generally a mix of s/w and h/w-based routers. Architectures vary quite greatly, but with silicon developments in the last few years, most semi-recent products are typically a combination of h/w and s/w with (ideally) the work split 90/10. Or 99/1. Or 100/0 in an ideal world. (5) can stay software.
  • Layer violation: control protocol encapsulated by controlled protocol...
  • Our solution
  • What else does it do, as well as solve The Problem?
  • imperative; Complex, hard to control, independent of (eg) SNMP and other management tools
  • Netflow: sample flows in middle of net (partial info about any flow); enma, sample flows at edges (complete info about some flows)

    1. 1. IP Network Management Richard Mortier
    2. 2. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    3. 3. Overview <ul><li>Introduction </li></ul><ul><ul><li>What’s it all about then? </li></ul></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    4. 4. What is network management? <ul><li>One point-of-view: a large field full of acronyms </li></ul><ul><ul><li>EMS, TMN, NE, CMIP, CMISE, OSS, ASN.1, TL1, EML, FCAPS, ITU, ... </li></ul></ul><ul><ul><li>(Don’t ask me what all of those mean, I don’t care!) </li></ul></ul><ul><li>From question.com: </li></ul><ul><ul><li>In 1989, a random of the journalistic persuasion asked hacker Paul Boutin “What do you think will be the biggest problem in computing in the 90s?” Paul’s straight-faced response: “There are only 17,000 three-letter acronyms.” </li></ul></ul><ul><li>We will ignore most of them  </li></ul>
    5. 5. What is network management ? <ul><li>Computer networks are considered to have three operating timescales </li></ul><ul><ul><li>Data : packet forwarding [μs, ms] </li></ul></ul><ul><ul><li>Control : flows/connections [ secs, mins ] </li></ul></ul><ul><ul><li>Management : aggregates, networks [ hours, days ] </li></ul></ul><ul><li>… so we’re concerned with “the network” rather than particular devices or protocols </li></ul><ul><li>Standardization is key! </li></ul>
    6. 6. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><ul><li>ISO FCAPS, TMN EMS, ATM </li></ul></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    7. 7. ISO FCAPS: functional separation <ul><li>F ault </li></ul><ul><ul><li>Recognize, isolate, correct, log faults </li></ul></ul><ul><li>C onfiguration </li></ul><ul><ul><li>Collect, store, track configurations </li></ul></ul><ul><li>A ccounting </li></ul><ul><ul><li>Collect statistics, bill users, enforce quotas </li></ul></ul><ul><li>P erformance </li></ul><ul><ul><li>Monitor trends, set thresholds, trigger alarms </li></ul></ul><ul><li>S ecurity </li></ul><ul><ul><li>Identify, secure, manage risks </li></ul></ul>
    8. 8. TMN EMS: administrative separation <ul><li>Telecommunications Management Network </li></ul><ul><li>Element Management System </li></ul><ul><li>“ ...simple but elegant...” (!) </li></ul><ul><ul><li>(my emphasis) </li></ul></ul><ul><ul><li>(often the two go together) </li></ul></ul><ul><li>NEL: network elements (switches, transmission systems) </li></ul><ul><li>EML: element management (devices, links) </li></ul><ul><li>NML: network management (capacity, congestion) </li></ul><ul><li>SML: service management (SLAs, time-to-market) </li></ul><ul><li>BML: business management (RoI, market share, blah) </li></ul>
    9. 9. The B-ISDN reference model <ul><li>Asynchronous Transfer Mode “cube” </li></ul><ul><ul><li>See IAP lectures, maybe  </li></ul></ul><ul><li>Plane management… </li></ul><ul><ul><li>The whole network </li></ul></ul><ul><li>… vs layer management </li></ul><ul><ul><li>Specific layers </li></ul></ul><ul><li>Topology </li></ul><ul><li>Configuration </li></ul><ul><li>Fault </li></ul><ul><li>Operations </li></ul><ul><li>Accounting </li></ul><ul><li>Performance </li></ul>[Figure: B-ISDN reference model cube. Planes: management plane, user plane, control plane. Layers: higher layers, ATM adaptation layer, ATM layer, physical layer. Management functions: plane management, layer management.]
    10. 10. Network management <ul><li>Models of general communication networks </li></ul><ul><ul><li>Tend to be quite abstract and exceedingly tedious! </li></ul></ul><ul><ul><li>Many practitioners still seem excited about OO programming, WIMP interfaces, etc </li></ul></ul><ul><ul><li>… probably because implementation is hard due to so many excessively long and complex standards! </li></ul></ul><ul><li>My view: basic “need-to-know” requirements are </li></ul><ul><ul><li>What should be happening? [ c ] </li></ul></ul><ul><ul><li>What is happening? [ f, p, a ] </li></ul></ul><ul><ul><li>What shouldn’t be happening? [ f, s ] </li></ul></ul><ul><ul><li>What will be happening? [ p, a ] </li></ul></ul>
    11. 11. Network management <ul><li>We’ll concentrate on IP networks </li></ul><ul><ul><li>Still acronym city: ICMP, SNMP, MIB, RFC  </li></ul></ul><ul><ul><li>Sample size: 10<sup>2</sup> routers, 10<sup>5</sup> hosts </li></ul></ul><ul><li>We’ll concentrate on the network core </li></ul><ul><ul><li>Routers, not hosts </li></ul></ul><ul><li>We’ll ignore “service management” </li></ul><ul><ul><li>DNS, AD, file stores, etc </li></ul></ul>
    12. 12. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><ul><li>IP, networks, routers </li></ul></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    13. 13. IP primer (you probably know all this) <ul><li>Destination-routed packets – no connections </li></ul><ul><ul><li>Time-to-live field: allow removal of looping packets </li></ul></ul><ul><li>Routers forward packets based on routeing tables </li></ul><ul><ul><li>Tables populated by routeing protocols </li></ul></ul><ul><li>Routers and protocols operate independently </li></ul><ul><ul><li>… although protocols aim to build consistent state </li></ul></ul><ul><li>RFCs ~= standards </li></ul><ul><ul><li>Often much looser semantics than e.g. ISO, ITU standards </li></ul></ul><ul><ul><li>Compare for example OSPF [RFC2328] and IS-IS [RFC1142, RFC1195], two link-state routeing protocols </li></ul></ul>
    14. 14. So, how do you build an IP network? <ul><li>Buy (lease) routers </li></ul><ul><li>Buy (lease) fibre </li></ul><ul><li>Connect them all together </li></ul><ul><li>Configure routers </li></ul><ul><li>Configure end-systems </li></ul>[Slide annotations, in order: “$1m? $2m? for a new, populated, backbone router!”; “Wayleaves = $$$”; “Be a landowner!”; “Correctly. For now. Mwuhahaha.”; “Someone else’s can of worms.”]
    15. 15. Multiple router flavours: a sample taxonomy <ul><li>Core </li></ul><ul><ul><li>OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps) </li></ul></ul><ul><ul><li>Big, fat, fast, expensive </li></ul></ul><ul><ul><li>E.g. Cisco HFR, Juniper T-640 </li></ul></ul><ul><ul><ul><li>HFR: 1.2Tbps each, interconnect up to 72 giving 92Tbps, start at $450k </li></ul></ul></ul><ul><li>Transit/Peering-facing </li></ul><ul><ul><li>OC-3 and up, good GigE density </li></ul></ul><ul><ul><li>ACLs, full-on BGP, uRPF, accounting </li></ul></ul><ul><li>Customer-facing </li></ul><ul><ul><li>FR/ATM/… </li></ul></ul><ul><ul><li>Feature set as above, plus fancy queues, etc </li></ul></ul><ul><li>Broadband aggregator </li></ul><ul><ul><li>High scalability: sessions, ports, reconnections </li></ul></ul><ul><ul><li>Feature set as above </li></ul></ul><ul><li>Customer-premises (CPE) </li></ul><ul><ul><li>100Mbps, maybe </li></ul></ul><ul><ul><li>NAT, DHCP, firewall, wireless, VoIP, … </li></ul></ul><ul><ul><li>Low cost, low-end, perhaps just software on a PC </li></ul></ul>[Figure: Cisco CRS-1 Multi-shelf system]
    16. 16. Network design <ul><li>Whose network? </li></ul><ul><ul><li>ISPs, IXs, enterprise, campus </li></ul></ul><ul><ul><li>POPs, DCs </li></ul></ul><ul><li>Many designs: flat, hierarchical, hybrids, multiple scales </li></ul><ul><li>Many constraints </li></ul><ul><ul><li>Business </li></ul></ul><ul><ul><ul><li>Backwards compatibility. Who to connect. Peering. </li></ul></ul></ul><ul><ul><li>Technology </li></ul></ul><ul><ul><ul><li>Power – directly (24x7 operation) and indirectly (cooling) </li></ul></ul></ul><ul><ul><ul><li>Port density vs. raw bandwidth </li></ul></ul></ul><ul><ul><ul><li>Software reliability </li></ul></ul></ul><ul><ul><ul><li>Hardware/software capability </li></ul></ul></ul><ul><ul><ul><ul><li>Addressing schemes for scalability, summarization </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Can’t run feature X with feature Y on vendor C in network size N </li></ul></ul></ul></ul><ul><ul><li>Connectivity/resiliency </li></ul></ul><ul><ul><ul><li>“ All core routers connect to at least 2 other core routers” </li></ul></ul></ul><ul><ul><ul><li>“ All edge routers connect to at least 2 core routers” </li></ul></ul></ul>
    17. 17. Router configuration <ul><li>Initialization </li></ul><ul><ul><li>Name the router, setup boot options, setup authentication options </li></ul></ul><ul><li>Configure interfaces </li></ul><ul><ul><li>Loopback, ethernet, fibre, ATM </li></ul></ul><ul><ul><li>Subnet/mask, filters, static routes </li></ul></ul><ul><ul><li>Shutdown (or not), queueing options, full/half duplex </li></ul></ul><ul><li>Configure routeing protocols (OSPF, BGP, IS-IS, …) </li></ul><ul><ul><ul><li>Process number, addresses to accept routes from, networks to advertise </li></ul></ul></ul><ul><li>Access lists, filters, ... </li></ul><ul><ul><li>Numeric id, permit/deny, subnet/mask, protocol, port </li></ul></ul><ul><li>Route-maps, matching routes rather than data traffic </li></ul><ul><li>Other configuration aspects: traps, syslog, etc </li></ul><ul><li>(Oh, and switch configuration is about as painful) </li></ul>
    18. 18. Router configuration fragments
        hostname FOOBAR
        !
        boot system flash slot0:a-boot-image.bin
        boot system flash bootflash:
        logging buffered 100000 debugging
        logging console informational
        aaa new-model
        aaa authentication login default tacacs local
        aaa authentication login consoleport none
        aaa authentication ppp default if-needed tacacs
        aaa authorization network tacacs
        !
        ip tftp source-interface Loopback0
        no ip domain-lookup
        ip name-server 10.34.56.78
        !
        ip multicast-routing
        ip dvmrp route-limit 7000
        ip cef distributed
        interface Loopback0
         description router-1.network.corp.com
         ip address 10.65.21.43 255.255.255.255
        !
        interface FastEthernet0/0/0
         description Link to New York
         ip address 10.65.43.21 255.255.255.128
         ip access-group 175 in
         ip helper-address 10.65.12.34
         ip pim sparse-mode
         ip cgmp
         ip dvmrp accept-filter 98 neighbor-list 99
         full-duplex
        !
        interface FastEthernet4/0/0
         no ip address
         ip access-group 183 in
         ip pim sparse-mode
         ip cgmp
         shutdown
         full-duplex
        router ospf 2
         log-adjacency-changes
         passive-interface FastEthernet0/0/0
         passive-interface FastEthernet0/1/0
         passive-interface FastEthernet1/0/0
         passive-interface FastEthernet1/1/0
         passive-interface FastEthernet2/0/0
         passive-interface FastEthernet2/1/0
         passive-interface FastEthernet3/0/0
         network 10.65.23.45 0.0.0.255 area 1.0.0.0
         network 10.65.34.56 0.0.0.255 area 1.0.0.0
         network 10.65.43.0 0.0.0.127 area 1.0.0.0
        access-list 24 remark Mcast ACL
        access-list 24 permit 239.255.255.254
        access-list 24 permit 224.0.1.111
        access-list 24 permit 239.192.0.0 0.3.255.255
        access-list 24 permit 232.192.0.0 0.3.255.255
        access-list 24 permit 224.0.0.0 0.0.0.255
        access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42
        access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.ffff
        tftp-server slot1:some-other-image.bin
        tacacs-server host 10.65.0.2
        tacacs-server key xxxxxxxx
        rmon event 1 trap Trap1 description "CPU Utilization>75%" owner config
        rmon event 2 trap Trap2 description "CPU Utilization>95%" owner config
    19. 19. Router configuration <ul><li>Lots of large, fragile text files </li></ul><ul><ul><li>Hundreds to thousands of routers, hundreds to thousands of lines per config </li></ul></ul><ul><ul><li>Errors are hard to find and have non-obvious results </li></ul></ul><ul><ul><li>Router configuration also editable on-line </li></ul></ul><ul><ul><li>Order matters! </li></ul></ul><ul><li>How to keep track of them all? </li></ul><ul><ul><li>Naming schemes, directory trees, CVS, ssh upload and atomic commit to router </li></ul></ul><ul><ul><li>Perhaps even a proper database </li></ul></ul><ul><li>State of the art is pretty basic </li></ul><ul><ul><li>Few tools to check consistency, design goals </li></ul></ul><ul><ul><li>Generally generate configurations from templates and have human-intensive process to control access to running configs </li></ul></ul><ul><li>Topic of current research [Feamster et al] </li></ul>[Slide annotation: “This counts as advanced!”]
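The "generate configurations from templates" practice mentioned on this slide can be illustrated with a minimal Python sketch. The template text, the field names (`name`, `description`, `address`, `mask`, `acl`) and both helper functions are assumptions made up for illustration; they are not taken from any vendor tool or from the lecture.

```python
from string import Template

# Hypothetical interface template; the field names are illustrative only.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address $mask\n"
    " ip access-group $acl in\n"
    "!"
)


def render_interface(params):
    """Render one interface stanza; raises KeyError if a field is missing."""
    return INTERFACE_TEMPLATE.substitute(params)


def render_config(hostname, interfaces):
    """Build a config from a hostname and a list of interface dicts."""
    parts = [f"hostname {hostname}", "!"]
    parts.extend(render_interface(i) for i in interfaces)
    return "\n".join(parts)


# Example data, loosely following the fragment on the previous slide.
EXAMPLE = render_config(
    "FOOBAR",
    [{
        "name": "FastEthernet0/0/0",
        "description": "Link to New York",
        "address": "10.65.43.21",
        "mask": "255.255.255.128",
        "acl": 175,
    }],
)
```

Because `substitute` fails loudly on a missing field, a half-filled stanza never reaches a router, which is one small consistency check of the kind the slide says is usually missing.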
    20. 20. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><ul><li>ICMP, SNMP, NetFlow </li></ul></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    21. 21. ICMP <ul><li>Internet Control Message Protocol [RFC792] </li></ul><ul><ul><li>IP protocol #1 </li></ul></ul><ul><ul><li>In-band “control” </li></ul></ul><ul><li>Variety of message types </li></ul><ul><ul><li>echo/echo reply [ PING (packet internet groper) ] </li></ul></ul><ul><ul><li>time exceeded [ TRACEROUTE ] </li></ul></ul><ul><ul><li>destination unreachable, redirect </li></ul></ul><ul><ul><li>source quench </li></ul></ul>
    22. 22. Ping (Packet INternet Groper) <ul><li>Test for liveness </li></ul><ul><ul><li>… also used to measure (round-trip) latency </li></ul></ul><ul><li>Send ICMP echo </li></ul><ul><li>Valid IP host [RFC1122, RFC1123] must reply with ICMP echo reply </li></ul><ul><li>Subnet PING? </li></ul><ul><ul><li>Useful but often not available/deprecated </li></ul></ul><ul><ul><li>“ ACK” implosion could be a problem </li></ul></ul><ul><ul><li>RFCs ~= standards </li></ul></ul>
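To make the "send ICMP echo" step concrete, here is a minimal Python sketch that only builds the bytes of an echo request (type 8, code 0) with the standard Internet checksum. It does not send anything: transmitting ICMP normally needs a raw socket and elevated privileges. The function names and the example identifier, sequence number and payload are illustrative assumptions.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type 8, code 0 (RFC 792)


def internet_checksum(data):
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident, seq, payload=b""):
    """Return the bytes of an ICMP echo request with a valid checksum."""
    # First pack with a zero checksum field, then fill in the real value.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, ident, seq)
    return header + payload
```

A receiver can validate the message by running the same checksum over the whole packet: a correct packet yields zero.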
    23. 23. Traceroute <ul><li>Which route do my packets take to their destination? </li></ul><ul><ul><li>Send UDP packets with increasing time-to-live values </li></ul></ul><ul><ul><li>Compliant IP host must respond with ICMP “time exceeded” </li></ul></ul><ul><ul><li>Triggers each host along path to so respond </li></ul></ul><ul><li>Not quite that simple </li></ul><ul><ul><li>One router, many IP addresses: which source address? </li></ul></ul><ul><ul><li>Router control processor, inbound or outbound interface </li></ul></ul><ul><ul><li>Asymmetric routes (return path != outbound path) </li></ul></ul><ul><ul><li>Routes change </li></ul></ul><ul><li>Do we want full-mesh host-host routes anyway?! </li></ul><ul><ul><li>Size of data set, amount of probe traffic </li></ul></ul><ul><ul><li>This is topology, what about load on links? </li></ul></ul>
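The TTL-probing idea can be shown with a small Python simulation. No packets are sent: `path` is a hypothetical list of the addresses that would answer along the route, with the destination last, and both function names are made up for this sketch.

```python
def probe(path, ttl):
    """Simulate one probe; return (responder, reached_destination).

    Each hop decrements the TTL; the hop that takes it to zero answers
    with ICMP "time exceeded". A probe with enough TTL reaches the
    destination, which answers itself.
    """
    if not path:
        raise ValueError("path must contain at least the destination")
    hop = min(ttl, len(path))
    return path[hop - 1], hop == len(path)


def traceroute(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, done = probe(path, ttl)
        hops.append((ttl, responder))
        if done:
            break
    return hops
```

The simulation assumes a single stable path with one address per router, which is exactly what the "not quite that simple" bullets say cannot be relied on in practice.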
    24. 24. SNMP <ul><li>Protocol to manage information tables at devices </li></ul><ul><li>Provides get, set, trap, notify operations </li></ul><ul><ul><li>get, set: read, write values </li></ul></ul><ul><ul><li>trap: signal a condition (e.g. threshold exceeded) </li></ul></ul><ul><ul><li>notify: reliable (acknowledged) trap, the “inform” of SNMPv2 and later </li></ul></ul><ul><li>Complexity mostly in the MIB design </li></ul><ul><ul><li>Some standard tables, but many vendor specific </li></ul></ul><ul><ul><li>Non-critical, so often tables populated incorrectly </li></ul></ul><ul><ul><li>Many tens of MIBs (thousands of lines) per device </li></ul></ul><ul><ul><li>Different versions, different data, different semantics </li></ul></ul><ul><li>Yet another configuration tracking problem </li></ul><ul><ul><li>Inter-relationships between MIBs </li></ul></ul>
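The get/set/trap model can be sketched as an in-memory toy in Python. This is not an SNMP implementation and speaks no wire protocol; the `ToyAgent` class, its `watch` method and the trap callback are assumptions for illustration. `get_next` is included because walking a table in OID order is how managers usually read MIB tables, although the slide does not list it.

```python
import bisect


class ToyAgent:
    """Dictionary-backed stand-in for a device's MIB (illustration only)."""

    def __init__(self, on_trap):
        self._mib = {}         # OID tuple -> value
        self._thresholds = {}  # OID tuple -> limit that triggers a trap
        self._on_trap = on_trap

    def get(self, oid):
        return self._mib[oid]

    def set(self, oid, value):
        self._mib[oid] = value
        limit = self._thresholds.get(oid)
        if limit is not None and value > limit:
            self._on_trap(oid, value)  # signal "threshold exceeded"

    def get_next(self, oid):
        """Return the (oid, value) that follows `oid` in OID order, or None."""
        keys = sorted(self._mib)  # tuples sort lexicographically, like OIDs
        i = bisect.bisect_right(keys, oid)
        return None if i == len(keys) else (keys[i], self._mib[keys[i]])

    def watch(self, oid, limit):
        self._thresholds[oid] = limit


SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)               # sysName.0
IF_IN_OCTETS_1 = (1, 3, 6, 1, 2, 1, 2, 2, 1, 10, 1)  # ifInOctets.1
```

Even in this toy the slide's point is visible: the operations are trivial, and all the meaning lives in which OIDs exist and what their values are supposed to represent.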
    25. 25. IPFIX <ul><li>IETF working group </li></ul><ul><ul><li>Export of flow based data out of IP network devices </li></ul></ul><ul><ul><li>Developing suitable protocol from Cisco NetFlow™ v9 </li></ul></ul><ul><ul><li>[RFC3954, RFC3955] </li></ul></ul><ul><li>Statistics reporting </li></ul><ul><ul><li>Setup template </li></ul></ul><ul><ul><li>Send data records matching template </li></ul></ul><ul><li>Many variables </li></ul><ul><ul><li>Packet/flow counters, rule matches, quite flexible </li></ul></ul>
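The "set up a template, then send data records matching it" pattern can be sketched in Python as below. This is loosely modelled on the idea only and is not the IPFIX or NetFlow v9 wire format; the field catalogue, the `FlowTemplate` class and the template id are hypothetical.

```python
import ipaddress
import struct

# Hypothetical field catalogue: field name -> struct format code.
FIELD_FORMATS = {
    "src_addr": "4s",  # IPv4 address, 4 raw bytes
    "dst_addr": "4s",
    "packets": "I",    # 32-bit packet counter
    "octets": "Q",     # 64-bit byte counter
}


class FlowTemplate:
    """Describes the layout of the data records that follow it."""

    def __init__(self, template_id, fields):
        self.template_id = template_id
        self.fields = list(fields)
        # "!" selects network byte order with no padding between fields.
        fmt = "!" + "".join(FIELD_FORMATS[f] for f in self.fields)
        self._struct = struct.Struct(fmt)

    def encode(self, record):
        """Pack a dict into bytes, in the order the template declares."""
        return self._struct.pack(*(record[f] for f in self.fields))

    def decode(self, data):
        """Unpack bytes back into a dict keyed by field name."""
        return dict(zip(self.fields, self._struct.unpack(data)))
```

The receiver needs the template before it can interpret any record, which is why the exporter sends the template first and why the set of exported variables can be changed without changing the protocol.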
    26. 26. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><ul><li>Network mapping, statistics gathering, control </li></ul></ul><ul><li>An alternative approach </li></ul>
    27. 27. An hypothetical NMS <ul><li>GUI around ICMP (ping, traceroute), SNMP, etc </li></ul><ul><ul><li>Recursive host discovery </li></ul></ul><ul><ul><li>Broadcast ping, ARP, default gateway: start somewhere </li></ul></ul><ul><ul><li>Recursively SNMP query for known hosts/connected networks </li></ul></ul><ul><ul><li>Ping known hosts to test liveness </li></ul></ul><ul><ul><li>Iterate </li></ul></ul><ul><li>Display topology: allow “drill-down” to particular devices </li></ul><ul><li>Configure and monitor known devices </li></ul><ul><ul><li>Trap, Netflow™, syslog message destinations </li></ul></ul><ul><ul><li>Counter thresholds, CPU utilization threshold, fault reporting </li></ul></ul><ul><ul><li>Particular faults or fault patterns </li></ul></ul><ul><li>Interface statistics and graphs (MRTG) </li></ul>
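The recursive discovery loop on this slide is essentially a breadth-first search. The Python sketch below assumes two hypothetical callbacks: `neighbours_of` stands in for the SNMP queries that list a device's known hosts and connected networks, and `is_alive` stands in for ping.

```python
from collections import deque


def discover(seed, neighbours_of, is_alive):
    """Breadth-first discovery from a seed such as the default gateway.

    Returns {node: sorted list of neighbours} for every live node reached.
    """
    seen = {seed}
    queue = deque([seed])
    topology = {}
    while queue:
        node = queue.popleft()
        if not is_alive(node):  # ping known hosts to test liveness
            continue
        topology[node] = sorted(neighbours_of(node))
        for neighbour in topology[node]:
            if neighbour not in seen:  # iterate over newly learned hosts
                seen.add(neighbour)
                queue.append(neighbour)
    return topology
```

A real system would also pace its probes and cope with devices that answer ping but not SNMP, or the reverse.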
    28. 28. NOC, NOC. Calling AT&T…
    29. 29. What are they all looking at? http://www.stat.ee.ethz.ch/mrtg/
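MRTG-style graphs are derived from successive readings of interface octet counters. A minimal Python sketch of that arithmetic follows, assuming 32-bit counters that wrap at 2^32 and at most one wrap between polls; the function names and the example numbers are illustrative.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero here


def rate_bps(prev_octets, curr_octets, interval_s):
    """Average bit rate between two counter readings taken interval_s apart."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # Modular subtraction gives the right delta across a single wrap.
    delta = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta * 8 / interval_s


def utilisation(rate, link_bps):
    """Fraction of link capacity in use."""
    return rate / link_bps
```

On fast links a 32-bit octet counter can wrap more than once within a polling interval, in which case this calculation silently under-reports.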
    30. 30. An hypothetical NMS <ul><li>All very straightforward? No, not really </li></ul><ul><ul><li>A lot of software engineering: corner cases, traceroute interpretation, NATs, etc </li></ul></ul><ul><li>Correctness </li></ul><ul><ul><li>MIBs may contain rubbish </li></ul></ul><ul><ul><li>Can only view inside your network anyway </li></ul></ul><ul><ul><li>Tunnelled, encrypted protocols becoming prevalent </li></ul></ul><ul><li>Efficiency </li></ul><ul><ul><li>Rate pacing discovery traffic: ping implosion/explosion </li></ul></ul><ul><ul><li>SNMP overloading router CPUs </li></ul></ul><ul><li>Using NMSs also not straightforward </li></ul><ul><ul><li>How to setup “correct” thresholds? </li></ul></ul><ul><ul><li>How to decide when something “bad” has happened? </li></ul></ul><ul><ul><li>How to present (or even interpret) reams and reams of data? </li></ul></ul>
    31. 31. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul><ul><ul><li>From the edges… </li></ul></ul>
    32. 32. Anemone <ul><li>An endsystem network management platform </li></ul><ul><ul><li>Collect flow information from endsystems, and </li></ul></ul><ul><ul><li>Combine with topology information from routeing protocols </li></ul></ul><ul><li>Endsystems have more information about their traffic than network devices </li></ul><ul><ul><li>No router support required </li></ul></ul><ul><li>A platform to support many applications </li></ul><ul><li>Currently concentrating on “managed networks” </li></ul><ul><ul><li>E.g. governments, enterprises, etc </li></ul></ul><ul><ul><li>High complexity, high value </li></ul></ul><ul><ul><li>High degree of endsystem control </li></ul></ul>
    33. 33. Applications <ul><li>Real-time and historical analysis </li></ul><ul><ul><li>Current topology + ingress, egress flows gives global picture of network behaviour </li></ul></ul><ul><ul><li>Capacity planning, anomaly detection </li></ul></ul><ul><li>Modelling “what if” scenarios </li></ul><ul><ul><li>Plug into a simulator back-end </li></ul></ul><ul><ul><li>“ What happens to the network if we move all our Exchange servers to a single data centre?” </li></ul></ul><ul><li>Automatic configuration </li></ul><ul><ul><li>Close the loop: enable network to meet dynamic SLAs </li></ul></ul><ul><ul><li>Reconfigure network to track e.g. time of day load changes </li></ul></ul>
    34. 34. Challenges <ul><li>How do you instrument an entire network? Do you need to instrument all endsystems? </li></ul><ul><ul><li>1% coverage gives 99.999% of bytes and flows </li></ul></ul><ul><li>What network information should be captured and stored? </li></ul><ul><ul><li>Flow data augmented with application and user context </li></ul></ul><ul><li>How do you access data stores distributed across a large network (ca. 300,000 nodes)? </li></ul><ul><ul><li>Use distributed query system </li></ul></ul>
    35. 35. Summary <ul><li>Introduction </li></ul><ul><ul><li>What is network management? </li></ul></ul><ul><li>Abstractions </li></ul><ul><ul><li>ISO FCAPS, TMN EMS, ATM </li></ul></ul><ul><li>IP network components </li></ul><ul><ul><li>IP, networks, routers </li></ul></ul><ul><li>IP network management protocols </li></ul><ul><ul><li>ICMP, SNMP, etc </li></ul></ul><ul><li>Pulling it all together </li></ul><ul><ul><li>Outline of a network management system </li></ul></ul><ul><li>An alternative approach: from the edges </li></ul>
    36. 36. The end <ul><li>Questions </li></ul><ul><li>Answers? </li></ul><ul><li>http://www.cisco.com/ </li></ul><ul><li>http://www.routergod.com/ </li></ul><ul><li>http://www.ietf.org/ </li></ul><ul><li>http://ipmon.sprintlabs.com/pyrt/ </li></ul><ul><li>http://www.nanog.org/ </li></ul>
    37. 37. Backup slides <ul><li>Internet routeing </li></ul><ul><li>OSPF </li></ul><ul><li>BGP </li></ul>
    38. 38. Internet routeing <ul><li>Q: how to get a packet from node to destination? </li></ul><ul><li>A1: advertise all reachable destinations and apply a consistent cost function ( distance vector) </li></ul><ul><li>A2: learn network topology and compute consistent shortest paths ( link state ) </li></ul><ul><ul><li>Each node (1) discovers and advertises adjacencies ; (2) builds link state database ; (3) computes shortest paths </li></ul></ul><ul><li>A1, A2: Forward to next-hop using longest-prefix-match </li></ul>
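Both answers end with the same forwarding step, longest-prefix match. A minimal Python sketch using the standard `ipaddress` module is shown below; the table contents and next-hop names are made up, and the linear scan stands in for the tries or hardware lookups a real router would use.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the next hop of the most specific prefix covering destination.

    table maps prefix strings such as "10.65.0.0/16" to next-hop names.
    Returns None when no prefix matches.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Hypothetical forwarding table with a default route.
EXAMPLE_TABLE = {
    "0.0.0.0/0": "gw",
    "10.65.0.0/16": "core1",
    "10.65.43.0/25": "edge7",
}
```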
    39. 39. OSPF (~link state routeing) <ul><li>Q: how to route given packet from any node to destination? </li></ul><ul><li>A: learn network topology; compute shortest paths </li></ul><ul><li>For each node </li></ul><ul><ul><li>Discover adjacencies (~ immediate neighbours ) ; advertise </li></ul></ul><ul><ul><li>Build link state database (~ network topology ) </li></ul></ul><ul><ul><li>Compute shortest paths to all destination prefixes </li></ul></ul><ul><ul><li>Forward to next-hop using longest-prefix-match (~ most specific route ) </li></ul></ul>
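The "compute shortest paths" step is a shortest-path-first calculation over the link state database. Below is a minimal Python sketch using Dijkstra's algorithm; the dictionary representation of the database and the node names are assumptions, and real OSPF details such as areas and LSA types are ignored.

```python
import heapq


def shortest_paths(lsdb, source):
    """Dijkstra over a link state database.

    lsdb maps node -> {neighbour: link cost}. Returns
    {destination: (total cost, first hop from source)}.
    """
    best = {source: (0, None)}
    heap = [(0, source, None)]
    done = set()
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in done:
            continue  # stale entry: a cheaper path was already settled
        done.add(node)
        for neighbour, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            # Leaving the source, the neighbour itself is the first hop.
            hop = neighbour if node == source else first_hop
            if neighbour not in best or new_cost < best[neighbour][0]:
                best[neighbour] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbour, hop))
    return best


# Hypothetical four-router topology with symmetric link costs.
EXAMPLE_LSDB = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}
```

Because every node runs the same computation over the same database, the resulting next hops are consistent, which is the property link-state routeing relies on.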
    40. 40. BGP (~path vector routeing) <ul><li>Q: how to route given packet from any node to destination? </li></ul><ul><li>A: neighbours tell you destinations they can reach; pick cheapest option </li></ul><ul><li>For each node </li></ul><ul><ul><li>Receive (destination, cost, next-hop) for all destinations known to neighbour </li></ul></ul><ul><ul><li>Longest-prefix-match among next-hops for given destination </li></ul></ul><ul><ul><li>Advertise selected (destination, cost+δ, next-hop′) for all known destinations </li></ul></ul><ul><li>Selection process is complicated </li></ul><ul><li>Routes can be modified/hidden at all three stages </li></ul><ul><ul><li>General mechanism for application of policy </li></ul></ul>
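The receive, select, advertise cycle described here can be sketched in Python as follows. This follows the slide's simplified (destination, cost, next-hop) model and is not BGP: real route selection compares attributes such as local preference and AS-path length, and uses the path itself to avoid loops. The function names and the policy callbacks are assumptions.

```python
def select_routes(adverts, import_policy=lambda route: True):
    """Keep the cheapest accepted (cost, next_hop) per destination.

    adverts is an iterable of (destination, cost, next_hop) tuples as
    received from neighbours; import_policy may hide any of them.
    """
    best = {}
    for destination, cost, next_hop in adverts:
        if not import_policy((destination, cost, next_hop)):
            continue
        if destination not in best or cost < best[destination][0]:
            best[destination] = (cost, next_hop)
    return best


def advertise(best, self_id, link_cost, export_policy=lambda route: True):
    """Re-advertise selected routes with added cost and ourselves as next hop."""
    out = []
    for destination in sorted(best):
        cost, _ = best[destination]
        route = (destination, cost + link_cost, self_id)
        if export_policy(route):
            out.append(route)
    return out
```

The two policy hooks, together with the comparison inside `select_routes`, correspond to the three stages at which the slide says routes can be modified or hidden.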
    41. 42. Anemone: where are we now? <ul><li>Three major components </li></ul><ul><ul><li>Flow collection </li></ul></ul><ul><ul><li>Route collection </li></ul></ul><ul><ul><li>Distributed database </li></ul></ul><ul><li>Building prototypes, simulating system </li></ul>
    42. 43. Data collection <ul><li>Flow collection </li></ul><ul><ul><li>Hosts track active flows </li></ul></ul><ul><ul><ul><li>Using low overhead event posting infrastructure, ETW </li></ul></ul></ul><ul><ul><ul><li>Built prototype device driver provider & user-space consumer </li></ul></ul></ul><ul><ul><li>Used packet traces for feasibility study on (client, server) </li></ul></ul><ul><ul><ul><li>Peaks at (165, 5667) live and (39, 567) active flows per sec </li></ul></ul></ul><ul><li>Route collection </li></ul><ul><ul><li>OSPF is link-state: passively collect link state adverts </li></ul></ul><ul><ul><li>Extension of my work at Sprint (for IS-IS and BGP); also been done at AT&T (NSDI’04 paper) </li></ul></ul>
    43. 44. The distributed database <ul><li>Logically contains </li></ul><ul><ul><li>Traffic flow matrix (bandwidths), {srcs} × {dsts} </li></ul></ul><ul><ul><li>… each entry annotated with current route from src to dst </li></ul></ul><ul><ul><ul><li>N.B. src/dst might be e.g. (IP end-point, application) </li></ul></ul></ul><ul><li>Large dynamic data set suggests aggregation </li></ul><ul><li>Related work </li></ul><ul><ul><li>{ distributed, continuous query, temporal } databases </li></ul></ul><ul><ul><li>Sensor networks </li></ul></ul><ul><li>Potential starting points: Astrolabe or SDIMS (SIGCOMM’04) </li></ul><ul><ul><li>Where/what/how much to aggregate? </li></ul></ul><ul><ul><ul><li>Is data read- or write-dominated? </li></ul></ul></ul><ul><ul><ul><li>Which is more dynamic, flow or topology data? </li></ul></ul></ul><ul><ul><ul><li>Can the system successfully self-tune? </li></ul></ul></ul>
    44. 45. The distributed database <ul><li>Construct traffic matrix from flow monitoring </li></ul><ul><ul><li>Hosts can supply flows they source and sink </li></ul></ul><ul><ul><li>Only need a subset of this data to get complete traffic matrix </li></ul></ul><ul><li>Construct topology from route collection </li></ul><ul><ul><li>OSPF supplies topology -> routes </li></ul></ul><ul><li>Wish to be able to answer queries like </li></ul><ul><ul><li>“Who are the top-10 traffic generators?” </li></ul></ul><ul><ul><ul><li>Easy to aggregate, don’t care about topology </li></ul></ul></ul><ul><ul><li>“What is the load on link l?” </li></ul></ul><ul><ul><ul><li>Can aggregate from hosts, but need to know routes </li></ul></ul></ul><ul><ul><li>“What happens if we remove links {l…m}?” </li></ul></ul><ul><ul><ul><li>Interaction between traffic matrix, topology, even flow control </li></ul></ul></ul>
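The first two example queries can be sketched against an in-memory traffic matrix. This is an illustration under stated assumptions, not the Anemone design: the matrix is a dictionary keyed by (src, dst), routes are given as explicit node lists, and all host names, router names, and bandwidths are invented.

```python
from collections import defaultdict

def top_generators(traffic, n=10):
    """traffic: {(src, dst): bandwidth}. Return the n sources with the highest total."""
    totals = defaultdict(float)
    for (src, _dst), bandwidth in traffic.items():
        totals[src] += bandwidth
    # Sort by descending total, then by name for a stable order
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def link_load(traffic, routes, link):
    """Sum the bandwidth of every flow whose route crosses the directed link (a, b).

    routes: {(src, dst): [node, node, ...]} giving the current path for each entry.
    """
    total = 0.0
    for pair, bandwidth in traffic.items():
        path = routes.get(pair, [])
        if link in zip(path, path[1:]):
            total += bandwidth
    return total

# Hypothetical traffic matrix and routes
TRAFFIC = {("h1", "h3"): 10.0, ("h2", "h3"): 5.0, ("h1", "h2"): 2.0}
ROUTES = {
    ("h1", "h3"): ["h1", "r1", "r2", "h3"],
    ("h2", "h3"): ["h2", "r2", "h3"],
    ("h1", "h2"): ["h1", "r1", "r2", "h2"],
}

print(top_generators(TRAFFIC, 1))
print(link_load(TRAFFIC, ROUTES, ("r1", "r2")))
```

The top-generators query needs only the traffic matrix, whereas the link-load query also needs the routes, which matches the distinction drawn on the slide.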
    45. 46. The distributed database <ul><li>Building simulation model </li></ul><ul><ul><li>OSPF data gives topology, event list, routes </li></ul></ul><ul><ul><li>Simple load model to start with (load ~ # subnets) </li></ul></ul><ul><ul><li>Precedence matrix (from SPF) reduces flow-data query set </li></ul></ul><ul><li>Can we do as well/better than e.g. NetFlow? </li></ul><ul><ul><li>Accuracy/coverage trade-off </li></ul></ul><ul><li>How should we distribute the DB? </li></ul><ul><ul><li>Just OSPF data? Just flow data? A mixture? </li></ul></ul><ul><li>How many levels of aggregation? </li></ul><ul><ul><li>How many nodes do queries touch? </li></ul></ul><ul><li>What sort of API is suitable? </li></ul><ul><ul><li>Example queries for sample applications </li></ul></ul>
