Network Management Richard Mortier Microsoft Research, Cambridge (Guest lecture, Digital Communications II)
  • Two types: link-state (OSPF), distance/path-vector (BGP) IP options Loose source routing, strict source routing, record route Can be treated separately, ignored, dropped
  • Layer violation Control protocol encapsulated by controlled protocol...
  • Note advantages: unlike routers and netflow
  • Annotations state what’s happening; italics are outcomes; boxes are hosts (which generate packets); blue arrowed circles are routers (which generate routeing protocol info)
  • Netflow: sample flows in middle of net (partial info about any flow); enma, sample flows at edges (complete info about some flows)
  • imperative; Complex, hard to control, independent of (eg) SNMP and other management tools
    1. 1. Network Management Richard Mortier Microsoft Research, Cambridge (Guest lecture, Digital Communications II)
    2. 2. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    3. 3. Overview <ul><li>Introduction </li></ul><ul><ul><li>What’s it all about then? </li></ul></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    4. 4. What is network management? <ul><li>One point-of-view: a large field full of acronyms </li></ul><ul><ul><li>EMS, TMN, NE, CMIP, CMISE, OSS, ASN.1, TL1, EML, FCAPS, ITU, ... </li></ul></ul><ul><ul><li>(Don’t ask me what all of those mean, I don’t care!) </li></ul></ul><ul><li>From question.com: </li></ul><ul><ul><li>In 1989, a random of the journalistic persuasion asked hacker Paul Boutin “What do you think will be the biggest problem in computing in the 90s?” Paul's straight-faced response: “There are only 17,000 three-letter acronyms.” (To be exact, there are 26^3 = 17,576.) </li></ul></ul><ul><li>Will ignore most of them </li></ul>
    5. 5. What is network management? <ul><li>Computer networks are considered to have three operating timescales </li></ul><ul><ul><li>Data: packet forwarding [ μs, ms ] </li></ul></ul><ul><ul><li>Control: flows/connections [ secs, mins ] </li></ul></ul><ul><ul><li>Management: aggregates, networks [ hours, days ] </li></ul></ul><ul><li>… so we’re concerned with “the network” rather than particular devices </li></ul><ul><li>Standardization is key! </li></ul>
    6. 6. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><ul><li>ISO FCAPS, TMN EMS, ATM </li></ul></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    7. 7. ISO FCAPS: functional separation <ul><li>F ault </li></ul><ul><ul><li>Recognize, isolate, correct, log faults </li></ul></ul><ul><li>C onfiguration </li></ul><ul><ul><li>Collect, store, track configurations </li></ul></ul><ul><li>A ccounting </li></ul><ul><ul><li>Collect statistics, bill users, enforce quotas </li></ul></ul><ul><li>P erformance </li></ul><ul><ul><li>Monitor trends, set thresholds, trigger alarms </li></ul></ul><ul><li>S ecurity </li></ul><ul><ul><li>Identify, secure, manage risks </li></ul></ul>
    8. 8. TMN EMS: administrative separation <ul><li>Telecommunications Management Network </li></ul><ul><li>Element Management System </li></ul><ul><li>“ ...simple but elegant...” (!) </li></ul><ul><ul><li>(my emphasis) </li></ul></ul><ul><li>NEL: network elements (switches, transmission systems) </li></ul><ul><li>EML: element management (devices, links) </li></ul><ul><li>NML: network management (capacity, congestion) </li></ul><ul><li>SML: service management (SLAs, time-to-market) </li></ul><ul><li>BML: business management (RoI, market share, blah) </li></ul>
    9. 9. The B-ISDN reference model <ul><li>Asynchronous Transfer Mode “cube” </li></ul><ul><ul><li>See IAP lectures, maybe </li></ul></ul><ul><li>Plane management… </li></ul><ul><ul><li>The whole network </li></ul></ul><ul><li>… vs layer management </li></ul><ul><ul><li>Specific layers </li></ul></ul><ul><li>Topology </li></ul><ul><li>Configuration </li></ul><ul><li>Fault </li></ul><ul><li>Operations </li></ul><ul><li>Accounting </li></ul><ul><li>Performance </li></ul> [Figure: B-ISDN reference model cube. Planes: user plane, control plane, management plane (plane management, layer management). Layers: physical layer, ATM layer, ATM adaptation layer, higher layers.]
    10. 10. Network management <ul><li>Models of general communication networks </li></ul><ul><ul><li>Tend to be quite abstract and exceedingly tedious! </li></ul></ul><ul><ul><li>Many practitioners still seem excited about OO programming, WIMP interfaces, etc </li></ul></ul><ul><ul><li>… probably because implementation is hard due to so many excessively long and complex standards! </li></ul></ul><ul><li>My view: basic “need-to-know” requirements are </li></ul><ul><ul><li>What should be happening? [ c ] </li></ul></ul><ul><ul><li>What is happening? [ f, p, a ] </li></ul></ul><ul><ul><li>What shouldn’t be happening? [ f, s ] </li></ul></ul><ul><ul><li>What will be happening? [ p, a ] </li></ul></ul>
    11. 11. Network management <ul><li>We’ll concentrate on IP networks </li></ul><ul><ul><li>Still acronym city: ICMP, SNMP, MIB, RFC </li></ul></ul><ul><ul><li>Sample size: 10^2 routers, 10^5 hosts </li></ul></ul><ul><li>We’ll concentrate on the network core </li></ul><ul><ul><li>Routers, not hosts </li></ul></ul><ul><li>We’ll ignore “service management” </li></ul><ul><ul><li>DNS, AD, file stores, etc </li></ul></ul>
    12. 12. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><ul><li>IP primer, router configuration </li></ul></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    13. 13. IP primer (you probably know all this) <ul><li>Destination-routed packets – no connections </li></ul><ul><ul><li>Time-to-live field: allow removal of looping packets </li></ul></ul><ul><li>Routers forward packets based on routeing tables </li></ul><ul><ul><li>Tables populated by routeing protocols </li></ul></ul><ul><li>Routers and protocols operate independently </li></ul><ul><ul><li>… although protocols aim to build consistent state </li></ul></ul><ul><li>RFCs ~= standards </li></ul><ul><ul><li>Often much looser semantics than e.g. ISO, ITU standards </li></ul></ul><ul><ul><li>Compare for example OSPF [RFC2328] and IS-IS [RFC1142, RFC1195], two link-state routeing protocols </li></ul></ul>
    14. 14. So, how do you build an IP network? <ul><li>Buy (lease) routers </li></ul><ul><li>Buy (lease) fibre </li></ul><ul><li>Connect them all together </li></ul><ul><li>Configure routers appropriately </li></ul><ul><li>Configure end-systems appropriately </li></ul><ul><li>Assume you’ve done 1–3 and someone else is doing 5… </li></ul>
    15. 15. Router configuration <ul><li>Initialization </li></ul><ul><ul><li>Name the router, setup boot options, setup authentication options </li></ul></ul><ul><li>Configure interfaces </li></ul><ul><ul><li>Loopback, ethernet, fibre, ATM </li></ul></ul><ul><ul><li>Subnet/mask, filters, static routes </li></ul></ul><ul><ul><li>Shutdown (or not), queueing options, full/half duplex </li></ul></ul><ul><li>Configure routeing protocols (OSPF, BGP, IS-IS, …) </li></ul><ul><ul><ul><li>Process number, addresses to accept routes from, networks to advertise </li></ul></ul></ul><ul><li>Access lists, filters, ... </li></ul><ul><ul><li>Numeric id, permit/deny, subnet/mask, protocol, port </li></ul></ul><ul><li>Route-maps, matching routes rather than data traffic </li></ul><ul><li>Other configuration aspects: traps, syslog, etc </li></ul>
    16. 16. Router configuration fragments
        hostname FOOBAR
        !
        boot system flash slot0:a-boot-image.bin
        boot system flash bootflash:
        logging buffered 100000 debugging
        logging console informational
        aaa new-model
        aaa authentication login default tacacs local
        aaa authentication login consoleport none
        aaa authentication ppp default if-needed tacacs
        aaa authorization network tacacs
        !
        ip tftp source-interface Loopback0
        no ip domain-lookup
        ip name-server 10.34.56.78
        !
        ip multicast-routing
        ip dvmrp route-limit 7000
        ip cef distributed
        interface Loopback0
         description router-1.network.corp.com
         ip address 10.65.21.43 255.255.255.255
        !
        interface FastEthernet0/0/0
         description Link to New York
         ip address 10.65.43.21 255.255.255.128
         ip access-group 175 in
         ip helper-address 10.65.12.34
         ip pim sparse-mode
         ip cgmp
         ip dvmrp accept-filter 98 neighbor-list 99
         full-duplex
        !
        interface FastEthernet4/0/0
         no ip address
         ip access-group 183 in
         ip pim sparse-mode
         ip cgmp
         shutdown
         full-duplex
        router ospf 2
         log-adjacency-changes
         passive-interface FastEthernet0/0/0
         passive-interface FastEthernet0/1/0
         passive-interface FastEthernet1/0/0
         passive-interface FastEthernet1/1/0
         passive-interface FastEthernet2/0/0
         passive-interface FastEthernet2/1/0
         passive-interface FastEthernet3/0/0
         network 10.65.23.45 0.0.0.255 area 1.0.0.0
         network 10.65.34.56 0.0.0.255 area 1.0.0.0
         network 10.65.43.0 0.0.0.127 area 1.0.0.0
        access-list 24 remark Mcast ACL
        access-list 24 permit 239.255.255.254
        access-list 24 permit 224.0.1.111
        access-list 24 permit 239.192.0.0 0.3.255.255
        access-list 24 permit 232.192.0.0 0.3.255.255
        access-list 24 permit 224.0.0.0 0.0.0.255
        access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42
        access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.ffff
        tftp-server slot1:some-other-image.bin
        tacacs-server host 10.65.0.2
        tacacs-server key xxxxxxxx
        rmon event 1 trap Trap1 description "CPU Utilization>75%" owner config
        rmon event 2 trap Trap2 description "CPU Utilization>95%" owner config
    17. 17. Router configuration <ul><li>Lots of quite large and fragile text files </li></ul><ul><ul><li>100s/1000s of routers, 100s/1000s of lines per config </li></ul></ul><ul><ul><li>Errors are hard to find and have non-obvious results </li></ul></ul><ul><ul><li>Router configuration also editable on-line </li></ul></ul><ul><li>How to keep track of them all? </li></ul><ul><ul><li>Naming schemes, directory hierarchies, CVS </li></ul></ul><ul><ul><li>ssh upload and atomic commit to router </li></ul></ul><ul><ul><li>Perhaps even a database </li></ul></ul><ul><li>State of the art is pretty basic </li></ul><ul><ul><li>Few tools to check consistency </li></ul></ul><ul><ul><li>Generally generate configurations from templates and have human-intensive process to control access to running configs </li></ul></ul><ul><li>Topic of current research [Feamster et al] </li></ul> [Slide callout: this counts as quite advanced!]
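To make "few tools to check consistency" concrete, here is a minimal sketch of one such check: finding `ip access-group` references to access lists that the same config never defines. The function name `undefined_acl_refs` and the trimmed `SAMPLE` text are illustrative assumptions, and the regular expressions cover only the numbered-ACL syntax seen in the fragment on slide 16, not the full router configuration grammar.

```python
import re

def undefined_acl_refs(config_text):
    """Return numbered ACLs applied to interfaces but never defined."""
    defined = set()
    referenced = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # A definition line looks like: access-list 24 permit ...
        match = re.match(r"access-list\s+(\d+)\b", line)
        if match:
            defined.add(int(match.group(1)))
            continue
        # A reference line looks like: ip access-group 175 in
        match = re.match(r"ip access-group\s+(\d+)\s+(?:in|out)\b", line)
        if match:
            referenced.add(int(match.group(1)))
    return sorted(referenced - defined)

# Hypothetical excerpt in the style of the slide's fragment.
SAMPLE = """\
interface FastEthernet0/0/0
 ip access-group 175 in
interface FastEthernet4/0/0
 ip access-group 183 in
access-list 24 remark Mcast ACL
access-list 24 permit 239.255.255.254
"""

print(undefined_acl_refs(SAMPLE))
```

On this excerpt the check reports ACLs 175 and 183. In a real deployment they might be defined elsewhere in the file, so a report like this is a prompt for a human to look rather than proof of an error.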
    18. 18. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><ul><li>ICMP, SNMP, Netflow </li></ul></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul>
    19. 19. ICMP <ul><li>Internet Control Message Protocol [RFC792] </li></ul><ul><ul><li>IP protocol #1 </li></ul></ul><ul><ul><li>In-band “control” </li></ul></ul><ul><li>Variety of message types </li></ul><ul><ul><li>echo/echo reply [ PING (packet internet groper) ] </li></ul></ul><ul><ul><li>time exceeded [ TRACEROUTE ] </li></ul></ul><ul><ul><li>destination unreachable, redirect </li></ul></ul><ul><ul><li>source quench </li></ul></ul>
    20. 20. Ping (Packet INternet Groper) <ul><li>Test for liveness </li></ul><ul><ul><li>… also used to measure (round-trip) latency </li></ul></ul><ul><li>Send ICMP echo </li></ul><ul><li>Valid IP host [RFC1122, RFC1123] must reply with ICMP echo reply </li></ul><ul><li>Subnet PING? </li></ul><ul><ul><li>Useful but often not available/deprecated </li></ul></ul><ul><ul><li>“ACK” implosion could be a problem </li></ul></ul><ul><ul><li>RFCs ~= standards </li></ul></ul>
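As a rough illustration of what "send ICMP echo" involves, the sketch below builds an echo request (type 8, code 0) and fills in the Internet checksum. It only constructs the bytes; sending them needs a raw socket and usually elevated privileges, which is left out. The function names and the identifier, sequence and payload values are assumptions made for the example.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1, payload=b"hello")
# A receiver verifies the packet by checksumming it whole; a valid one yields 0.
print(internet_checksum(packet) == 0)
```

An echo reply carries the same identifier, sequence number and payload with type 0, which is how a sender matches replies to requests and measures round-trip latency.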
    21. 21. Traceroute <ul><li>Which route do my packets take to their destination? </li></ul><ul><ul><li>Send UDP packets with increasing time-to-live values </li></ul></ul><ul><ul><li>Compliant IP host must respond with ICMP “time exceeded” </li></ul></ul><ul><ul><li>Triggers each host along path to so respond </li></ul></ul><ul><li>Not quite that simple </li></ul><ul><ul><li>One router, many IP addresses: which source address? </li></ul></ul><ul><ul><ul><li>Router control processor, inbound or outbound interface </li></ul></ul></ul><ul><ul><li>Routes often asymmetric, so return path != outbound path </li></ul></ul><ul><ul><li>Routes change </li></ul></ul><ul><li>Do we want full-mesh host-host routes anyway?! </li></ul><ul><ul><li>Size of data set, amount of probe traffic </li></ul></ul><ul><ul><li>This is topology, what about load on links? </li></ul></ul>
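The TTL mechanism can be sketched as a small simulation, with no real packets involved. It assumes a fixed path given as a list of hop names, that every router answers, and that the destination answers the UDP probe with an ICMP "port unreachable" (the usual way classic traceroute detects arrival). The names `probe` and `simulate_traceroute` are made up for the example.

```python
def probe(path, ttl):
    """Forward one probe along `path`; each router decrements the TTL by one."""
    for index, hop in enumerate(path):
        if index == len(path) - 1:
            # The probe reached the destination host itself.
            return hop, "port unreachable"
        ttl -= 1
        if ttl == 0:
            # This router discards the probe and reports back to the sender.
            return hop, "time exceeded"

def simulate_traceroute(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    results = []
    for ttl in range(1, max_ttl + 1):
        responder, kind = probe(path, ttl)
        results.append((ttl, responder, kind))
        if kind == "port unreachable":
            break
    return results

for row in simulate_traceroute(["r1", "r2", "host"]):
    print(row)
```

The simulation hides exactly the complications the slide lists: a real router may reply from any of its addresses, replies may take a different return path, and the route can change between probes.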
    22. 22. SNMP <ul><li>Protocol to manage information tables at devices </li></ul><ul><li>Provides get, set, trap, notify operations </li></ul><ul><ul><li>get , set : read, write values </li></ul></ul><ul><ul><li>trap : signal a condition (e.g. threshold exceeded) </li></ul></ul><ul><ul><li>notify : reliable trap </li></ul></ul><ul><li>Complexity mostly in the MIB design </li></ul><ul><ul><li>Some standard tables, but many vendor specific </li></ul></ul><ul><ul><li>Non-critical, so often tables populated incorrectly </li></ul></ul><ul><ul><li>Many tens of MIBs (thousands of lines) per device </li></ul></ul><ul><ul><li>Different versions, different data, different semantics </li></ul></ul><ul><ul><ul><li>Yet another configuration tracking problem </li></ul></ul></ul><ul><ul><li>Inter-relationships between MIBs </li></ul></ul>
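The get/set/trap operations can be pictured with a toy in-memory agent. This is only a sketch of the idea, not the SNMP protocol: real agents address values by OIDs defined in MIBs and encode messages on the wire, whereas here the class `ToyAgent`, the key `"cpuUtilization"` and the callback-style trap sink are all assumptions made for illustration.

```python
class ToyAgent:
    """In-memory stand-in for a managed device's information table."""

    def __init__(self, trap_sink):
        self.table = {}             # variable name -> current value
        self.thresholds = {}        # variable name -> limit that triggers a trap
        self.trap_sink = trap_sink  # callable that receives trap tuples

    def get(self, name):
        return self.table[name]

    def set(self, name, value):
        self.table[name] = value
        limit = self.thresholds.get(name)
        # Signal a condition, as a trap would, when a threshold is exceeded.
        if limit is not None and value > limit:
            self.trap_sink((name, value, limit))

traps = []
agent = ToyAgent(traps.append)
agent.thresholds["cpuUtilization"] = 75  # echoes the ">75%" event in the config
agent.set("cpuUtilization", 50)
agent.set("cpuUtilization", 80)
print(agent.get("cpuUtilization"), traps)
```

The trap here is fire-and-forget, like the unreliable variant on the slide; a reliable notification would additionally wait for an acknowledgement from the manager.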
    23. 23. IPFIX <ul><li>IETF working group </li></ul><ul><ul><li>Export of flow based data out of IP network devices </li></ul></ul><ul><ul><li>Developing suitable protocol based on Cisco NetFlow™ v9 </li></ul></ul><ul><ul><li>[RFC3954, RFC3955] </li></ul></ul><ul><li>Statistics reporting </li></ul><ul><ul><li>Setup template </li></ul></ul><ul><ul><li>Send data records matching template </li></ul></ul><ul><li>Many variables </li></ul><ul><ul><li>Packet/flow counters, rule matches, quite flexible </li></ul></ul>
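The "setup template, then send data records matching template" pattern can be sketched as follows. This is deliberately not the IPFIX or NetFlow v9 wire format: the field names, the `struct` codes and the addresses are assumptions chosen to show why a receiver needs the template before it can interpret a data record.

```python
import socket
import struct

# Hypothetical template: ordered (field name, struct format code) pairs.
TEMPLATE = [
    ("src_addr", "4s"),  # IPv4 address as 4 raw bytes
    ("dst_addr", "4s"),
    ("packets", "I"),    # 32-bit unsigned counters
    ("octets", "I"),
]

def _format(template):
    # "!" selects network byte order with no padding.
    return "!" + "".join(code for _, code in template)

def encode_record(template, record):
    """Pack one data record in the field order the template defines."""
    return struct.pack(_format(template), *(record[name] for name, _ in template))

def decode_record(template, data):
    """Unpack a data record; meaningless without the matching template."""
    values = struct.unpack(_format(template), data)
    return {name: value for (name, _), value in zip(template, values)}

flow = {
    "src_addr": socket.inet_aton("10.65.43.21"),
    "dst_addr": socket.inet_aton("10.65.12.34"),
    "packets": 12,
    "octets": 3456,
}
wire = encode_record(TEMPLATE, flow)
print(len(wire), decode_record(TEMPLATE, wire) == flow)
```

Because the record itself carries no field names or lengths, exporters can add or drop variables by sending a new template, which is where the flexibility mentioned on the slide comes from.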
    24. 24. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><ul><li>Network mapping, statistics gathering, control </li></ul></ul><ul><li>An alternative approach </li></ul>
    25. 25. A hypothetical NMS <ul><li>GUI around ICMP (ping, traceroute), SNMP, etc </li></ul><ul><li>Recursive host discovery </li></ul><ul><ul><li>Broadcast ping, ARP, default gateway: start somewhere </li></ul></ul><ul><ul><li>Recursively SNMP query for known hosts/connected networks </li></ul></ul><ul><ul><li>Ping known hosts to test liveness </li></ul></ul><ul><ul><li>Iterate </li></ul></ul><ul><li>Display topology: allow “drill-down” to particular devices </li></ul><ul><li>Configure and monitor known devices </li></ul><ul><ul><li>Trap, Netflow™, syslog message destinations </li></ul></ul><ul><ul><li>Counter thresholds, CPU utilization threshold, fault reporting </li></ul></ul><ul><ul><li>Particular faults or fault patterns </li></ul></ul><ul><ul><li>Interface statistics and graphs </li></ul></ul>
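The recursive discovery loop is essentially a graph traversal. The sketch below assumes two caller-supplied functions standing in for the real probes: `neighbours_of` (what an SNMP query of a device's known hosts and connected networks would return) and `is_alive` (a ping). Both names, and the tiny topology used to exercise them, are hypothetical.

```python
from collections import deque

def discover(seed, neighbours_of, is_alive):
    """Breadth-first discovery starting from one known device."""
    seen = {seed}
    queue = deque([seed])
    alive = []
    while queue:
        host = queue.popleft()
        if not is_alive(host):      # ping known hosts to test liveness
            continue
        alive.append(host)
        for neighbour in neighbours_of(host):  # query for what this host knows
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return alive

# Hypothetical network: r2 is down, so h2 behind it is never found.
topology = {"gw": ["r1", "r2"], "r1": ["gw", "h1"], "r2": ["gw", "h2"], "h1": [], "h2": []}
down = {"r2"}
found = discover("gw", lambda h: topology.get(h, []), lambda h: h not in down)
print(found)
```

The example shows one limit of this approach: anything reachable only through an unresponsive device stays invisible, so a discovered map is a lower bound on what is actually there.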
    26. 26. A real NOC (Network Operations Centre) [ from AT&T ]
    27. 27. A hypothetical NMS <ul><li>All very straightforward? No, not really </li></ul><ul><ul><li>A lot of software engineering: corner cases, traceroute interpretation, NATs, etc </li></ul></ul><ul><ul><li>MIBs may contain rubbish </li></ul></ul><ul><ul><li>Can only view inside your network anyway </li></ul></ul><ul><li>Efficiency </li></ul><ul><ul><li>Rate pacing discovery traffic: ping implosion/explosion </li></ul></ul><ul><ul><li>SNMP overloading router CPUs </li></ul></ul><ul><li>Tunnelled, encrypted protocols becoming prevalent </li></ul><ul><li>Using NMSs also not straightforward </li></ul><ul><ul><li>How to setup “correct” thresholds? </li></ul></ul><ul><ul><li>How to decide when something “bad” has happened? </li></ul></ul><ul><ul><li>How to present (or even interpret) reams and reams of data? </li></ul></ul>
    28. 28. Overview <ul><li>Introduction </li></ul><ul><li>Abstractions </li></ul><ul><li>IP network components </li></ul><ul><li>IP network management protocols </li></ul><ul><li>Pulling it all together </li></ul><ul><li>An alternative approach </li></ul><ul><ul><li>From the edges… </li></ul></ul>
    29. 29. ENMA <ul><li>Edge-based network management platform </li></ul><ul><ul><li>Collect flow information from hosts, and </li></ul></ul><ul><ul><li>Combine with topology information from routeing protocols </li></ul></ul><ul><li>Enable visualization, analysis, simulation, control </li></ul><ul><li>Avoid problems of not-quite-standard interfaces </li></ul><ul><ul><li>Management support is typically ‘non-critical’ (i.e. buggy) and not extensively tested for inter-operability </li></ul></ul><ul><li>Do the work where resources are plentiful </li></ul><ul><ul><li>Hosts have lots of cycles and little traffic (relatively) </li></ul></ul><ul><li>Protocol visibility: see into tunnels, IPSec, etc </li></ul>
    30. 30. System outline [Figure: system outline diagram. Labels: Packets, Flows, Routeing protocol, Topology, Control, Visualize, Simulate, Simulator, Distributed database, Traffic matrix (srcs, dsts), Set of routes (routes).]
    31. 31. Where is my traffic going today? <ul><li>Pictures of current topology and traffic </li></ul><ul><ul><li>Routes+flows+forwarding rules -> BIG PICTURE </li></ul></ul><ul><li>In fact, where did my traffic go yesterday? </li></ul><ul><ul><li>Keep historical data for capacity planning, etc </li></ul></ul><ul><li>A platform for anomaly detection </li></ul><ul><ul><li>Historical data suggests “normality,” live monitoring allows anomalies to be detected </li></ul></ul>
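One simple reading of "historical data suggests normality" is a threshold on distance from the historical mean. The sketch below flags a live sample that lies more than `k` sample standard deviations away. The choice of a mean-and-deviation model, the default `k=3.0` and the sample numbers are all assumptions for illustration; the slides do not say which detection method is intended.

```python
from statistics import mean, stdev

def is_anomalous(history, sample, k=3.0):
    """Flag `sample` if it is more than k standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        return sample != mu  # a perfectly flat history: any change stands out
    return abs(sample - mu) > k * sigma

# Hypothetical link load measurements (e.g. Mbit/s) from previous days.
history = [100, 102, 98, 101, 99]
print(is_anomalous(history, 103), is_anomalous(history, 150))
```

Picking `k` is the same problem the earlier NMS slide raises about setting "correct" thresholds: too small and operators drown in alarms, too large and real faults go unnoticed.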
    32. 32. Where might my traffic go tomorrow? <ul><li>Plug into a simulator back-end </li></ul><ul><ul><li>Discrete event simulator, flow allocation solver </li></ul></ul><ul><li>Run multiple ‘what-if’ scenarios </li></ul><ul><ul><li>… failures </li></ul></ul><ul><ul><li>… reconfigurations </li></ul></ul><ul><ul><li>… technology deployments </li></ul></ul><ul><li>E.g. “What happens if we coalesce all the Exchange servers in one data-centre?” </li></ul>
    33. 33. Where should my traffic be going? <ul><li>Close the loop: compute link weights to implement policy goals </li></ul><ul><ul><li>Recompute on order of hours/days </li></ul></ul><ul><li>Allows more dynamic policies </li></ul><ul><ul><li>Modify network configuration to track e.g. time of day load changes </li></ul></ul><ul><li>Make network more efficient (~cheaper)? </li></ul>
    34. 34. Where are we now? <ul><li>Three major components </li></ul><ul><ul><li>Flow collection </li></ul></ul><ul><ul><li>Route collection </li></ul></ul><ul><ul><li>Distributed database </li></ul></ul><ul><li>Building prototypes, simulating system </li></ul>
    35. 35. Data collection <ul><li>Flow collection </li></ul><ul><ul><li>Hosts track active flows </li></ul></ul><ul><ul><ul><li>Using low overhead event posting infrastructure, ETW </li></ul></ul></ul><ul><ul><ul><li>Built prototype device driver provider & user-space consumer </li></ul></ul></ul><ul><ul><li>Used packet traces for feasibility study on (client, server) </li></ul></ul><ul><ul><ul><li>Peaks at (165, 5667) live and (39, 567) active flows per sec </li></ul></ul></ul><ul><li>Route collection </li></ul><ul><ul><li>OSPF is link-state: passively collect link state adverts </li></ul></ul><ul><ul><li>Extension of my work at Sprint (for IS-IS and BGP); also been done at AT&T (NSDI’04 paper) </li></ul></ul>
    36. 36. The distributed database <ul><li>Logically contains </li></ul><ul><ul><li>Traffic flow matrix (bandwidths), {srcs} × {dsts} </li></ul></ul><ul><ul><li>… each entry annotated with current route from src to dst </li></ul></ul><ul><ul><ul><li>N.B. src/dst might be e.g. (IP end-point, application) </li></ul></ul></ul><ul><li>Large dynamic data set suggests aggregation </li></ul><ul><li>Related work </li></ul><ul><ul><li>{ distributed, continuous query, temporal } databases </li></ul></ul><ul><ul><li>Sensor networks </li></ul></ul><ul><li>Potential starting points: Astrolabe or SDIMS (SIGCOMM’04) </li></ul><ul><ul><li>Where/what/how much to aggregate? </li></ul></ul><ul><ul><ul><li>Is data read- or write-dominated? </li></ul></ul></ul><ul><ul><ul><li>Which is more dynamic, flow or topology data? </li></ul></ul></ul><ul><ul><ul><li>Can the system successfully self-tune? </li></ul></ul></ul>
    37. 37. The distributed database <ul><li>Construct traffic matrix from flow monitoring </li></ul><ul><ul><li>Hosts can supply flows they source and sink </li></ul></ul><ul><ul><li>Only need a subset of this data to get complete traffic matrix </li></ul></ul><ul><li>Construct topology from route collection </li></ul><ul><ul><li>OSPF supplies topology -> routes </li></ul></ul><ul><li>Wish to be able to answer queries like </li></ul><ul><ul><li>“Who are the top-10 traffic generators?” </li></ul></ul><ul><ul><ul><li>Easy to aggregate, don’t care about topology </li></ul></ul></ul><ul><ul><li>“What is the load on link l?” </li></ul></ul><ul><ul><ul><li>Can aggregate from hosts, but need to know routes </li></ul></ul></ul><ul><ul><li>“What happens if we remove links {l…m}?” </li></ul></ul><ul><ul><ul><li>Interaction between traffic matrix, topology, even flow control </li></ul></ul></ul>
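The first two queries can be sketched against a small in-memory stand-in for the database. The representation is an assumption: the traffic matrix is a dict from (src, dst) to bandwidth, and each entry's route is a list of nodes from src to dst. The function names and the sample data are hypothetical, and nothing here is distributed or aggregated.

```python
from collections import defaultdict

def top_sources(traffic, n=10):
    """'Who are the top-n traffic generators?' Needs no topology at all."""
    totals = defaultdict(float)
    for (src, _dst), bandwidth in traffic.items():
        totals[src] += bandwidth
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

def link_load(traffic, routes, link):
    """'What is the load on link l?' Needs the route of every flow."""
    load = 0.0
    for pair, bandwidth in traffic.items():
        path = routes[pair]
        # A path crosses `link` if the link is one of its consecutive node pairs.
        if link in zip(path, path[1:]):
            load += bandwidth
    return load

traffic = {("a", "c"): 5.0, ("b", "c"): 3.0, ("a", "b"): 1.0}
routes = {
    ("a", "c"): ["a", "r1", "r2", "c"],
    ("b", "c"): ["b", "r2", "c"],
    ("a", "b"): ["a", "r1", "b"],
}
print(top_sources(traffic, n=1), link_load(traffic, routes, ("r2", "c")))
```

The third query, removing links, cannot be answered from this data alone: routes would have to be recomputed over the changed topology and the traffic re-placed, which is why the slide points to a simulator.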
    38. 38. The distributed database <ul><li>Building simulation model </li></ul><ul><ul><li>OSPF data gives topology, event list, routes </li></ul></ul><ul><ul><li>Simple load model to start with (load ~ # subnets) </li></ul></ul><ul><ul><li>Precedence matrix (from SPF) reduces flow-data query set </li></ul></ul><ul><li>Can we do as well/better than e.g. NetFlow? </li></ul><ul><ul><li>Accuracy/coverage trade-off </li></ul></ul><ul><li>How should we distribute the DB? </li></ul><ul><ul><li>Just OSPF data? Just flow data? A mixture? </li></ul></ul><ul><li>How many levels of aggregation? </li></ul><ul><ul><li>How many nodes do queries touch? </li></ul></ul><ul><li>What sort of API is suitable? </li></ul><ul><ul><li>Example queries for sample applications </li></ul></ul>
    39. 39. Summary <ul><li>Introduction </li></ul><ul><ul><li>What is network management? </li></ul></ul><ul><li>Abstractions </li></ul><ul><ul><li>ISO FCAPS, TMN EMS, ATM </li></ul></ul><ul><li>IP network components </li></ul><ul><ul><li>IP, routers, configurations </li></ul></ul><ul><li>IP network management protocols </li></ul><ul><ul><li>ICMP, SNMP, etc </li></ul></ul><ul><li>Pulling it all together </li></ul><ul><ul><li>Outline of a network management system </li></ul></ul><ul><li>An alternative approach: from the edges </li></ul>
    40. 40. The end <ul><li>Questions </li></ul><ul><li>Answers? </li></ul><ul><li>http://www.cisco.com/ </li></ul><ul><li>http://www.routergod.com/ </li></ul><ul><li>http://www.ietf.org/ </li></ul><ul><li>http://ipmon.sprintlabs.com/pyrt/ </li></ul><ul><li>http://www.nanog.org/ </li></ul>
    41. 41. Backup slides <ul><li>Internet routeing </li></ul><ul><li>OSPF </li></ul><ul><li>BGP </li></ul>
    42. 42. Internet routeing <ul><li>Q: how to get a packet from node to destination? </li></ul><ul><li>A1: advertise all reachable destinations and apply a consistent cost function (distance vector) </li></ul><ul><li>A2: learn network topology and compute consistent shortest paths (link state) </li></ul><ul><ul><li>Each node (1) discovers and advertises adjacencies; (2) builds link state database; (3) computes shortest paths </li></ul></ul><ul><li>A1, A2: Forward to next-hop using longest-prefix-match </li></ul>
    43. 43. OSPF (~link state routeing) <ul><li>Q: how to route given packet from any node to destination? </li></ul><ul><li>A: learn network topology; compute shortest paths </li></ul><ul><li>For each node </li></ul><ul><ul><li>Discover adjacencies (~immediate neighbours); advertise </li></ul></ul><ul><ul><li>Build link state database (~network topology) </li></ul></ul><ul><ul><li>Compute shortest paths to all destination prefixes </li></ul></ul><ul><ul><li>Forward to next-hop using longest-prefix-match (~most specific route) </li></ul></ul>
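Longest-prefix-match forwarding can be sketched with Python's standard `ipaddress` module. The table contents and next-hop labels are made up for the example, and a linear scan stands in for the trie or hardware lookup a real router would use.

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return the next hop of the most specific prefix containing `destination`."""
    address = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) of the longest match so far
    for prefix, next_hop in table.items():
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return None if best is None else best[1]

# Hypothetical forwarding table: prefix -> next hop.
table = {
    "0.0.0.0/0": "upstream",
    "10.65.0.0/16": "core",
    "10.65.43.0/25": "FastEthernet0/0/0",
}
for destination in ("10.65.43.21", "10.65.99.1", "192.0.2.1"):
    print(destination, longest_prefix_match(table, destination))
```

All three prefixes contain 10.65.43.21, and the /25 wins because it is the most specific; the default route only catches addresses that nothing else matches.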
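The "compute shortest paths" step is Dijkstra's algorithm run over the link state database. The sketch below assumes the database is already assembled as a dict of per-node neighbour costs, and it routes to nodes rather than to destination prefixes; the node names, the costs and the function name `shortest_paths` are illustrative.

```python
import heapq

def shortest_paths(lsdb, source):
    """Dijkstra over a link state database {node: {neighbour: cost}}.

    Returns (distance to each node, first hop from `source` towards each node).
    """
    dist = {source: 0}
    next_hop = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper path to `node` was already found
        for neighbour, weight in lsdb.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                # Directly attached neighbours are their own first hop;
                # everything further away inherits the first hop of `node`.
                next_hop[neighbour] = neighbour if node == source else next_hop[node]
                heapq.heappush(heap, (new_cost, neighbour))
    return dist, next_hop

# Hypothetical four-router topology with symmetric link costs.
lsdb = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1, "D": 2},
    "D": {"C": 2},
}
dist, next_hop = shortest_paths(lsdb, "A")
print(dist, next_hop)
```

Because every router runs the same computation over the same database, their next-hop choices are consistent with each other, which is what keeps hop-by-hop forwarding loop-free once the database has converged.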
    44. 44. BGP (~path vector routeing) <ul><li>Q: how to route given packet from any node to destination? </li></ul><ul><li>A: neighbours tell you destinations they can reach; pick cheapest option </li></ul><ul><li>For each node </li></ul><ul><ul><li>Receive (destination, cost, next-hop) for all destinations known to neighbour </li></ul></ul><ul><ul><li>Longest-prefix-match among next-hops for given destination </li></ul></ul><ul><ul><li>Advertise selected (destination, cost+  , next-hop ' ) for all known destinations </li></ul></ul><ul><li>Selection process is complicated </li></ul><ul><li>Routes can be modified/hidden at all three stages </li></ul><ul><ul><li>General mechanism for application of policy </li></ul></ul>
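The receive, select, advertise cycle can be sketched in a heavily simplified form. As the slide says, the real BGP selection process is complicated; this sketch only picks the cheapest advertised cost per destination and re-advertises it with a fixed increment. The data layout, the `increment` parameter and all names are assumptions, and policy filtering at each stage is omitted.

```python
def select_routes(adverts):
    """Pick, per destination, the neighbour advertising the lowest cost.

    `adverts` maps neighbour -> {destination: cost}.
    """
    best = {}
    for neighbour, routes in adverts.items():
        for destination, cost in routes.items():
            if destination not in best or cost < best[destination][0]:
                best[destination] = (cost, neighbour)
    return best

def readvertise(best, self_name, increment=1):
    """Advertise selected routes onward with this node as the next hop."""
    return {
        destination: (cost + increment, self_name)
        for destination, (cost, _neighbour) in best.items()
    }

# Hypothetical adverts received from two neighbours.
adverts = {
    "n1": {"10.0.0.0/8": 3, "192.0.2.0/24": 2},
    "n2": {"10.0.0.0/8": 1},
}
best = select_routes(adverts)
print(best)
print(readvertise(best, "me"))
```

Policy would enter by dropping or rewriting entries before selection, during selection, or before re-advertising, which corresponds to the three stages at which the slide says routes can be modified or hidden.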
