
  1. 1. Traffic Engineering for ISP Networks Jennifer Rexford IP Network Management and Performance AT&T Labs - Research; Florham Park, NJ http://www.research.att.com/~jrex
  2. 2. Outline <ul><li>Internet routing protocols </li></ul><ul><li>Traffic engineering using traditional protocols </li></ul><ul><ul><li>Optimizing network configuration to prevailing traffic </li></ul></ul><ul><ul><li>Requirements for topology, routing, and traffic info </li></ul></ul><ul><li>Traffic demands </li></ul><ul><ul><li>Volume of load between edges of the network </li></ul></ul><ul><ul><li>Technique for measuring the traffic demands </li></ul></ul><ul><li>Route optimization </li></ul><ul><ul><li>Tuning the link weights to the offered traffic </li></ul></ul><ul><ul><li>Incorporating various operational constraints </li></ul></ul><ul><li>Other ways of measuring the traffic matrix </li></ul><ul><li>Conclusions </li></ul>
  3. 3. Autonomous Systems (ASes) <ul><li>Internet divided into ASes </li></ul><ul><ul><li>Distinct regions of administrative control (~14,000) </li></ul></ul><ul><ul><li>Routers and links managed by a single institution </li></ul></ul><ul><ul><li>Service provider, company, university, … </li></ul></ul><ul><li>Hierarchy of ASes </li></ul><ul><ul><li>Large, tier-1 provider with a nationwide backbone </li></ul></ul><ul><ul><li>Medium-sized regional provider with smaller backbone </li></ul></ul><ul><ul><li>Small network run by a single company or university </li></ul></ul><ul><li>Interaction between ASes </li></ul><ul><ul><li>Internal topology is not shared between ASes </li></ul></ul><ul><ul><li>… but, neighbors interact to coordinate routing </li></ul></ul>
  4. 4. Path Traversing Multiple ASes [Figure: ASes 1 through 7, a client, and a web server] Path: 6, 5, 4, 3, 2, 1
  5. 5. Interdomain Routing: Border Gateway Protocol <ul><li>ASes exchange info about who they can reach </li></ul><ul><ul><li>IP prefix: block of destination IP addresses </li></ul></ul><ul><ul><li>AS path: sequence of ASes along the path </li></ul></ul><ul><li>Policies configured by the AS’s network operator </li></ul><ul><ul><li>Path selection: which of the paths to use? </li></ul></ul><ul><ul><li>Path export: which neighbors to tell? </li></ul></ul>[Figure: client and ASes 1, 2, 3, with announcements “I can reach …” and “I can reach … via AS 1”]
  6. 6. Intradomain Routing: OSPF or IS-IS <ul><li>Shortest path routing based on link weights </li></ul><ul><ul><li>Routers flood the link-state information to each other </li></ul></ul><ul><ul><li>Routers compute the “next hop” to reach other routers </li></ul></ul><ul><li>Weights configured by the AS’s network operator </li></ul><ul><ul><li>Simple heuristics: link capacity or physical distance </li></ul></ul><ul><ul><li>Traffic engineering: tuning the link weights to the traffic </li></ul></ul>[Figure: example topology with link weights 3, 2, 2, 1, 1, 3, 1, 4, 5, 3]
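
The shortest-path computation described on this slide can be sketched as follows. This is a minimal illustration using Dijkstra's algorithm, not the OSPF or IS-IS implementation itself; the four-router topology, its weights, and the function name `shortest_paths` are assumptions made up for the example. It shows how each router derives a next hop from the link weights, and how changing one weight redirects traffic.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra: return (distance, first hop) per destination reachable from source."""
    dist = {source: 0}
    first_hop = {}
    done = set()
    heap = [(0, source, source)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node != source:
            first_hop[node] = hop
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if neighbor not in dist or nd < dist[neighbor]:
                dist[neighbor] = nd
                # Leaving the source, the first hop is the neighbor itself.
                heapq.heappush(heap, (nd, neighbor, neighbor if node == source else hop))
    return dist, first_hop

# Hypothetical four-router topology; the weights are illustrative only.
graph = {
    "A": {"B": 1, "C": 3},
    "B": {"A": 1, "C": 1, "D": 4},
    "C": {"A": 3, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}
dist, hops = shortest_paths(graph, "A")
print(hops)

# Raising the A-B weight (both directions) moves A's traffic onto the A-C link.
graph["A"]["B"] = graph["B"]["A"] = 5
dist_after, hops_after = shortest_paths(graph, "A")
print(hops_after)
```

With the original weights, router A forwards everything through B; after the weight change, everything leaves through C. That sensitivity is what makes link weights usable as a traffic engineering knob.
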
  7. 7. Motivating Problem: Congested Link <ul><li>Detecting that a link is congested </li></ul><ul><ul><li>Utilization statistics reported every five minutes </li></ul></ul><ul><ul><li>Sample probe traffic suffers degraded performance </li></ul></ul><ul><ul><li>Customers complain (via the telephone network?) </li></ul></ul><ul><li>Reasons why the link might be congested </li></ul><ul><ul><li>Increase in demand between some set of src-dest pairs </li></ul></ul><ul><ul><li>Failed router/link within the AS causes routing change </li></ul></ul><ul><ul><li>Failure/reconfiguration in another AS changes routes </li></ul></ul><ul><li>Challenges </li></ul><ul><ul><li>Know the cause, not just the manifestations </li></ul></ul><ul><ul><li>Predict the effects of possible changes to link weights </li></ul></ul>
  8. 8. Traffic Engineering in an ISP Backbone <ul><li>Topology of the ISP backbone </li></ul><ul><ul><li>Connectivity and capacity of routers and links </li></ul></ul><ul><li>Traffic demands </li></ul><ul><ul><li>Offered load between points in the network </li></ul></ul><ul><li>Routing configuration </li></ul><ul><ul><li>Link weights for selecting paths </li></ul></ul><ul><li>Performance objective </li></ul><ul><ul><li>Balanced load, low latency, … </li></ul></ul><ul><li>Question: Given the topology and traffic demands in an IP network, what link weights should be used? </li></ul>
  9. 9. Modeling Traffic Demands <ul><li>Volume of traffic V(s,d,t) </li></ul><ul><ul><li>From a particular source s </li></ul></ul><ul><ul><li>To a particular destination d </li></ul></ul><ul><ul><li>Over a particular time period t </li></ul></ul><ul><li>Time period </li></ul><ul><ul><li>Performance debugging -- minutes or tens of minutes </li></ul></ul><ul><ul><li>Time-of-day traffic engineering -- hours or days </li></ul></ul><ul><ul><li>Network design -- days to weeks </li></ul></ul><ul><li>Sources and destinations </li></ul><ul><ul><li>Individual hosts -- interesting, but huge! </li></ul></ul><ul><ul><li>Individual prefixes -- still big; not seen by any one AS! </li></ul></ul><ul><ul><li>Individual edge links in an ISP backbone -- hmmm…. </li></ul></ul>
  10. 10. Traffic Matrix [Figure labels: in, out] Traffic matrix: V(in,out,t) for all pairs (in,out)
  11. 11. Problem: Hot Potato Routing <ul><li>ISP is in the middle of the Internet </li></ul><ul><ul><li>Multiple connections to multiple other ASes </li></ul></ul><ul><ul><li>Egress point depends on intradomain routing </li></ul></ul><ul><li>Problem with point-to-point models </li></ul><ul><ul><li>Want to predict impact of changing intradomain routing </li></ul></ul><ul><ul><li>But, a change in weights may change the egress point! </li></ul></ul>[Figure labels: 1, 2, 3, 4]
  12. 12. Demand Matrix: Motivating Example [Figure labels: Big Internet Web Site, User Site]
  13. 13. Coupling of Inter and Intradomain Routing [Figure: Web Site, User Site U, and AS 1 through AS 4; AS paths shown toward U are “AS 3, U” and “AS 4, AS 3, U”]
  14. 14. Intradomain Routing: Hot Potato [Figure: zoom in on AS 1, showing weighted links, one ingress (IN), and three egress points (OUT 1, OUT 2, OUT 3)] Hot-potato routing: a change in the internal routing configuration (link weights) changes a flow's exit point!
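
Hot-potato egress selection can be sketched as picking, among the egress points that can reach a destination, the one with the smallest intradomain distance from the ingress router. The distances below are hypothetical (the figure's actual values are not recoverable from this transcript), and `closest_egress` is an illustrative name, not part of any router API.

```python
def closest_egress(igp_distance, egress_set):
    """Return the egress point with the smallest IGP distance from the ingress.

    Ties are broken by name so the result is deterministic.
    """
    return min(sorted(egress_set), key=lambda egress: igp_distance[egress])

# Hypothetical IGP distances from the ingress router to each egress point.
before = {"OUT1": 60, "OUT2": 35, "OUT3": 85}
# The same distances after one internal link weight is raised (assumed values).
after = {"OUT1": 60, "OUT2": 120, "OUT3": 85}

egress_set = {"OUT1", "OUT2", "OUT3"}
print(closest_egress(before, egress_set))
print(closest_egress(after, egress_set))
```

The flow leaves through OUT2 before the change and through OUT1 afterwards, even though nothing changed in interdomain routing. This is why a point-to-point traffic matrix cannot predict the effect of a weight change.
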
  15. 15. Traffic Demand: Multiple Egress Points <ul><li>Definition: V(in, {out}, t) </li></ul><ul><ul><li>Entry link (in) </li></ul></ul><ul><ul><li>Set of possible egress links ({out}) </li></ul></ul><ul><ul><li>Time period (t) </li></ul></ul><ul><ul><li>Volume of traffic (V(in,{out},t)) </li></ul></ul><ul><li>Computing the traffic demands </li></ul><ul><ul><li>Measure the traffic where it enters the ISP backbone </li></ul></ul><ul><ul><li>Identify the set of egress links where traffic could leave </li></ul></ul><ul><ul><li>Sum over all traffic with same in, {out}, and t </li></ul></ul>
  16. 16. Traffic Mapping: Ingress Measurement <ul><li>Packet measurement (e.g., Netflow, sampling) </li></ul><ul><ul><li>Ingress point i </li></ul></ul><ul><ul><li>Destination prefix d </li></ul></ul><ul><ul><li>Traffic volume V<sub>id</sub> </li></ul></ul>[Figure: ingress i, destination d]
  17. 17. Traffic Mapping: Egress Point(s) <ul><li>Routing data (e.g., forwarding tables) </li></ul><ul><ul><li>Destination prefix d </li></ul></ul><ul><ul><li>Set of egress points e<sub>d</sub> </li></ul></ul>[Figure: destination d]
  18. 18. Traffic Mapping: Combining the Data <ul><li>Combining multiple types of data </li></ul><ul><ul><li>Traffic: V<sub>id</sub> (ingress i, destination prefix d) </li></ul></ul><ul><ul><li>Routing: e<sub>d</sub> (set e<sub>d</sub> of egress links toward d) </li></ul></ul><ul><ul><li>Combining: sum over V<sub>id</sub> with same e<sub>d</sub> </li></ul></ul>[Figure: ingress i, egress set]
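
The combining step can be sketched as a simple aggregation: join each ingress measurement with the egress set for its destination prefix, then sum volumes that share the same (ingress, egress set) key. The record layout, link names, and prefixes below are assumptions for illustration, and all records are taken to fall in a single time period t.

```python
from collections import defaultdict

def traffic_demands(flow_records, egress_sets):
    """Aggregate per-prefix ingress volumes into demands keyed by (ingress, egress set).

    flow_records: iterable of (ingress, destination_prefix, byte_count)
    egress_sets:  {destination_prefix: set of egress links toward that prefix}
    """
    demands = defaultdict(int)
    for ingress, prefix, byte_count in flow_records:
        # frozenset makes the egress set hashable so it can be part of the key.
        key = (ingress, frozenset(egress_sets[prefix]))
        demands[key] += byte_count
    return dict(demands)

# Hypothetical measurements (documentation prefixes, made-up byte counts).
flow_records = [
    ("in1", "192.0.2.0/24", 100),
    ("in1", "198.51.100.0/24", 50),
    ("in2", "192.0.2.0/24", 70),
    ("in1", "203.0.113.0/24", 30),
]
egress_sets = {
    "192.0.2.0/24": {"out1", "out2"},
    "198.51.100.0/24": {"out1", "out2"},
    "203.0.113.0/24": {"out3"},
}
demands = traffic_demands(flow_records, egress_sets)
for (ingress, outs), volume in demands.items():
    print(ingress, sorted(outs), volume)
```

Two prefixes that share an egress set collapse into one demand, which keeps the demand matrix far smaller than a per-prefix matrix.
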
  19. 19. Application on the AT&T Backbone <ul><li>Measurement data </li></ul><ul><ul><li>Netflow data (ingress traffic) </li></ul></ul><ul><ul><li>Forwarding tables (sets of egress points) </li></ul></ul><ul><ul><li>Configuration files (topology and link weights) </li></ul></ul><ul><li>Effectiveness </li></ul><ul><ul><li>Ingress traffic could be matched with egress sets </li></ul></ul><ul><ul><li>Simulated flow of traffic consistent with link loads </li></ul></ul><ul><li>Challenges </li></ul><ul><ul><li>Loss of Netflow records during delivery (can correct for it!) </li></ul></ul><ul><ul><li>Egress set changes between table dumps (not very many) </li></ul></ul><ul><ul><li>Topology changes between configuration dumps (just one!) </li></ul></ul>
  20. 20. Traffic Analysis Results <ul><li>Small number of demands contribute most traffic </li></ul><ul><ul><li>Optimize routes for just the heavy hitters </li></ul></ul><ul><ul><li>Measure a small fraction of the traffic </li></ul></ul><ul><ul><li>Must watch out for changes in load and set of exit links </li></ul></ul><ul><li>Time-of-day fluctuations </li></ul><ul><ul><li>Reoptimize routes a few times a day (three?) </li></ul></ul><ul><li>Traffic (in)stability </li></ul><ul><ul><li>Select routes that are good for different demand sets </li></ul></ul><ul><ul><li>Reoptimize routes after sudden changes in load </li></ul></ul>
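
One way to act on the "heavy hitters" observation is to pick the smallest set of demands that covers a target share of total traffic. The sketch below assumes a plain dictionary of demand volumes; the demand names, volumes, and the 75% threshold are invented for illustration and are not results from the talk.

```python
def heavy_hitters(demands, fraction):
    """Return the largest demands, in decreasing order, covering `fraction` of total volume."""
    total = sum(demands.values())
    chosen, covered = [], 0
    for key, volume in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= fraction * total:
            break
        chosen.append(key)
        covered += volume
    return chosen

# Hypothetical demand volumes (arbitrary units).
volumes = {"d1": 500, "d2": 300, "d3": 100, "d4": 50, "d5": 30, "d6": 20}
top = heavy_hitters(volumes, 0.75)
print(top)
```

Here two of the six demands already cover three quarters of the load, so an optimizer could restrict its attention to them, provided it keeps watching for shifts in load and exit links.
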
  21. 21. Three Traffic Demands in San Francisco
  22. 22. Underpinnings of the Optimization <ul><li>Route prediction engine (“what-if” tool) </li></ul><ul><ul><li>Model the influence of link weights on traffic flow </li></ul></ul><ul><ul><ul><li>Select a closest exit point based on link weights </li></ul></ul></ul><ul><ul><ul><li>Compute shortest path(s) based on link weights </li></ul></ul></ul><ul><ul><ul><li>Capture splitting over multiple shortest paths </li></ul></ul></ul><ul><ul><li>Sum the traffic volume traversing each link </li></ul></ul><ul><li>Objective function </li></ul><ul><ul><li>Rate the “goodness” of a setting of the link weights </li></ul></ul><ul><ul><li>E.g., “max link utilization” or “sum of exp(utilization)” </li></ul></ul>
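
A route prediction engine of this kind can be sketched in a few dozen lines. The code below is an assumption-laden illustration, not the tool described in the talk: it computes shortest distances to each destination, splits traffic evenly over all equal-cost next hops, sums the volume per link, and scores the result with the "max link utilization" objective. Egress selection is left out, the topology and capacities are hypothetical, and every source is assumed to reach its destination.

```python
import heapq
from collections import defaultdict

def distances_to(graph, dest):
    """Dijkstra on reversed edges: shortest distance from every node to dest."""
    reverse = defaultdict(dict)
    for u, neighbors in graph.items():
        for v, weight in neighbors.items():
            reverse[v][u] = weight
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for prev, weight in reverse[node].items():
            if prev not in dist or d + weight < dist[prev]:
                dist[prev] = d + weight
                heapq.heappush(heap, (d + weight, prev))
    return dist

def link_loads(graph, demands):
    """demands: {(src, dst): volume}. Returns {(u, v): load} with even splitting."""
    loads = defaultdict(float)
    for (src, dst), volume in demands.items():
        dist = distances_to(graph, dst)
        inflow = defaultdict(float)
        inflow[src] = volume
        # Visit nodes from farthest to nearest so inflow is complete before it is split.
        for node in sorted(dist, key=dist.get, reverse=True):
            if node == dst or inflow[node] == 0:
                continue
            hops = [v for v, w in graph[node].items()
                    if v in dist and dist[node] == w + dist[v]]
            for v in hops:
                loads[(node, v)] += inflow[node] / len(hops)
                inflow[v] += inflow[node] / len(hops)
    return dict(loads)

def max_utilization(loads, capacity):
    """Objective function: the highest load/capacity ratio over all loaded links."""
    return max(load / capacity[link] for link, load in loads.items())

# Hypothetical diamond topology with integer weights and uniform capacity.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 1},
    "D": {"B": 1, "C": 1},
}
capacity = {(u, v): 10.0 for u in graph for v in graph[u]}
loads = link_loads(graph, {("A", "D"): 10.0})
print(loads, max_utilization(loads, capacity))
```

With equal weights the demand from A to D splits evenly over the two paths, so each link carries half the volume. Integer weights keep the equal-cost test exact; floating-point weights would need a tolerance.
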
  23. 23. Weight Optimization <ul><li>Local search </li></ul><ul><ul><li>Generate a candidate setting of the weights </li></ul></ul><ul><ul><li>Predict the resulting load on the network links </li></ul></ul><ul><ul><li>Compute the value of the objective function </li></ul></ul><ul><ul><li>Repeat, and select solution with lowest objective function </li></ul></ul><ul><li>Efficient computation </li></ul><ul><ul><li>Explore the “neighborhood” around good solutions </li></ul></ul><ul><ul><li>Exploit efficient incremental graph algorithms </li></ul></ul><ul><li>Performance results on AT&T’s network </li></ul><ul><ul><li>Much better than simple heuristics (link capacity, distance) </li></ul></ul><ul><ul><li>Quite competitive with multi-commodity flow solution </li></ul></ul>
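
The local search loop can be sketched as simple hill climbing over single-weight changes. This is a deliberately naive stand-in for the search used in the cited work: it tries every allowed value for every link and keeps any strict improvement. The `evaluate` callback stands for "predict the link loads, then compute the objective"; the toy evaluator below (two parallel links sharing one demand) and all names are assumptions for illustration.

```python
def local_search(weights, evaluate, allowed=range(1, 6), max_rounds=20):
    """Hill climbing: change one link weight at a time, keep strict improvements."""
    best = dict(weights)
    best_cost = evaluate(best)
    for _ in range(max_rounds):
        improved = False
        for link in sorted(best):
            for value in allowed:
                candidate = dict(best)
                candidate[link] = value
                cost = evaluate(candidate)
                if cost < best_cost:
                    best, best_cost, improved = candidate, cost, True
        if not improved:
            break  # no single-weight change helps: a local optimum
    return best, best_cost

def toy_max_utilization(weights, demand=12.0, capacity=10.0):
    """Toy predictor: one demand over two parallel links, even split on a tie."""
    top, bottom = weights["top"], weights["bottom"]
    if top < bottom:
        loads = {"top": demand, "bottom": 0.0}
    elif top > bottom:
        loads = {"top": 0.0, "bottom": demand}
    else:
        loads = {"top": demand / 2, "bottom": demand / 2}
    return max(loads.values()) / capacity

start = {"top": 1, "bottom": 3}
best, best_cost = local_search(start, toy_max_utilization)
print(best, best_cost)
```

Starting from unequal weights, where one link is overloaded at 120% utilization, the search lands on equal weights and 60% utilization after changing a single weight. A practical search would sample the neighborhood rather than enumerate it, and would reuse shortest-path computations incrementally.
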
  24. 24. Incorporating Operational Realities <ul><li>Minimize changes to the network </li></ul><ul><ul><li>Changing just one or two link weights is often enough </li></ul></ul><ul><li>Tolerate failure of network equipment </li></ul><ul><ul><li>Weights settings usually remain good after failure </li></ul></ul><ul><ul><li>… or can be fixed by changing one or two weights </li></ul></ul><ul><li>Limit the number of distinct weight values </li></ul><ul><ul><li>Small number of integer values is sufficient </li></ul></ul><ul><li>Limit dependence on accuracy of traffic demands </li></ul><ul><ul><li>Good weights remain good after introducing random noise </li></ul></ul><ul><li>Limit frequency of changes to the weights </li></ul><ul><ul><li>Joint optimization for day and night traffic matrices </li></ul></ul>
  25. 25. Ways to Populate the Domain-Wide Models <ul><li>Mapping: assumptions about routing </li></ul><ul><ul><li>Traffic data: packet/flow statistics at network edge </li></ul></ul><ul><ul><li>Routing data: egress point(s) per destination prefix </li></ul></ul><ul><li>Inference: assumptions about traffic and routing </li></ul><ul><ul><li>Traffic data: byte counts per link (over time) </li></ul></ul><ul><ul><li>Routing data: path(s) between each pair of nodes </li></ul></ul><ul><li>Direct observation: no assumptions </li></ul><ul><ul><li>Traffic data: packet samples at every link </li></ul></ul><ul><ul><li>Routing data: none </li></ul></ul>
  26. 26. Inference: Network Tomography [Figure: sources and destinations joined by links carrying 4 Mbps, 4 Mbps, 3 Mbps, and 5 Mbps] From link counts to the traffic matrix
  27. 27. Tomography: Formalizing the Problem <ul><li>Source-destination pairs </li></ul><ul><ul><li>p is a source-destination pair of nodes </li></ul></ul><ul><ul><li>x<sub>p</sub> is the (unknown) traffic volume for this pair </li></ul></ul><ul><li>Routing </li></ul><ul><ul><li>R<sub>lp</sub> = 1 if link l is on the path for src-dest pair p </li></ul></ul><ul><ul><li>Or, R<sub>lp</sub> is the proportion of p’s traffic that traverses l </li></ul></ul><ul><li>Links in the network </li></ul><ul><ul><li>l is a unidirectional edge </li></ul></ul><ul><ul><li>y<sub>l</sub> is the observed traffic volume on this link </li></ul></ul><ul><li>Relationship: y = Rx (now work back to get x) </li></ul>
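
The relationship y = Rx can be made concrete with a tiny example. The sketch below builds a 0/1 routing matrix from per-pair paths and computes the link counts for a given traffic vector. The three-node line topology, the volumes, and the helper names are all hypothetical.

```python
def routing_matrix(links, paths):
    """Return (pairs, R) where R[l][p] = 1 if link l is on the path of pair p."""
    pairs = sorted(paths)
    matrix = [[1 if link in paths[pair] else 0 for pair in pairs] for link in links]
    return pairs, matrix

def link_counts(matrix, x):
    """Compute y = Rx: the traffic volume observed on each link."""
    return [sum(r * volume for r, volume in zip(row, x)) for row in matrix]

# Hypothetical line topology A -> B -> C with two unidirectional links.
links = [("A", "B"), ("B", "C")]
paths = {
    ("A", "B"): [("A", "B")],
    ("A", "C"): [("A", "B"), ("B", "C")],
    ("B", "C"): [("B", "C")],
}
pairs, R = routing_matrix(links, paths)

x_true = [3, 1, 4]    # volumes for pairs (A,B), (A,C), (B,C)
x_other = [4, 0, 5]   # a different traffic matrix
print(link_counts(R, x_true), link_counts(R, x_other))
```

Both traffic vectors produce the same link counts, so the link measurements alone cannot tell them apart. Working back from y to x needs extra assumptions or extra observations.
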
  28. 28. Tomography: Single Observation is Insufficient <ul><li>Linear system is underdetermined </li></ul><ul><ul><li>Number of nodes n </li></ul></ul><ul><ul><li>Number of links e is around O(n) </li></ul></ul><ul><ul><li>Number of src-dest pairs c is O(n<sup>2</sup>) </li></ul></ul><ul><ul><li>Dimension of solution sub-space at least c - e </li></ul></ul><ul><li>Multiple observations are needed </li></ul><ul><ul><li>k independent observations (over time) </li></ul></ul><ul><ul><li>Stochastic model with src-dest counts Poisson & i.i.d. </li></ul></ul><ul><ul><li>Maximum likelihood estimation to infer traffic matrix </li></ul></ul><ul><ul><li>Vardi, “Network Tomography,” JASA, March 1996 </li></ul></ul>
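
To see the size of the gap, here is a worked count under two assumptions that are not stated on the slide: every ordered pair of distinct nodes is a source-destination pair, so c = n(n-1), and the example network has n = 20 nodes and e = 60 unidirectional links.

```latex
\[
c = n(n-1), \qquad
\dim\{\,x : Rx = y\,\} \;=\; c - \mathrm{rank}(R) \;\ge\; c - e .
\]
\[
n = 20,\; e = 60: \qquad c = 20 \cdot 19 = 380, \qquad c - e = 320 .
\]
```

Since R has only e rows, its rank is at most e. In this example at least 320 of the 380 unknowns stay undetermined by a single set of link counts.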
  29. 29. Tomography: Limitations <ul><li>Cannot handle packet loss or multicast traffic </li></ul><ul><ul><li>Assumes traffic flows from an entry to an egress </li></ul></ul><ul><li>Statistical assumptions don’t match IP traffic </li></ul><ul><ul><li>Poisson, Gaussian </li></ul></ul><ul><li>Significant error even with large # of samples </li></ul><ul><ul><li>Faulty assumptions and traffic changes over time </li></ul></ul><ul><li>High computation overhead for large networks </li></ul><ul><ul><li>Large matrix inversion problem </li></ul></ul>
  30. 30. Promising Extension: Gravity Models [Zhang, Roughan, Duffield, Greenberg] <ul><li>Gravitational assumption </li></ul><ul><ul><li>Ingress point a has traffic v<sub>a</sub><sup>i</sup> </li></ul></ul><ul><ul><li>Egress point b has traffic v<sub>b</sub><sup>e</sup> </li></ul></ul><ul><ul><li>Pair (a,b) has traffic proportional to v<sub>a</sub><sup>i</sup> * v<sub>b</sub><sup>e</sup> </li></ul></ul><ul><li>Incorporating hot-potato routing </li></ul><ul><ul><li>Combine traffic across egress points to the same peer </li></ul></ul><ul><ul><li>Gravity divides a’s traffic proportional to peer loads </li></ul></ul><ul><ul><li>“Hot potato” identifies single egress point for a’s traffic </li></ul></ul><ul><li>Experimental results on AT&T network </li></ul><ul><ul><li>Reasonable accuracy, especially for large (a,b) pairs </li></ul></ul><ul><ul><li>Sufficient accuracy for traffic engineering applications </li></ul></ul>
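
The basic gravitational assumption can be sketched as below. The slide states only proportionality; normalizing by the total egress volume (so that each ingress point's estimates sum to its measured volume) is an assumption made here, as are the point names and volumes. The hot-potato refinement described on the slide is not modeled.

```python
def gravity_estimate(ingress_volume, egress_volume):
    """Estimate traffic for each (a, b) pair as v_in[a] * v_out[b] / total egress volume."""
    total = sum(egress_volume.values())
    return {
        (a, b): v_in * v_out / total
        for a, v_in in ingress_volume.items()
        for b, v_out in egress_volume.items()
    }

# Hypothetical per-point totals (arbitrary units); total in equals total out.
ingress_volume = {"a1": 60, "a2": 40}
egress_volume = {"b1": 75, "b2": 25}
estimate = gravity_estimate(ingress_volume, egress_volume)
for pair, volume in sorted(estimate.items()):
    print(pair, volume)
```

Each ingress point's traffic is divided among egress points in proportion to how much traffic each egress carries overall, so only per-point totals are needed as input.
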
  31. 31. Conclusions <ul><li>Our approach </li></ul><ul><ul><li>Measure: network-wide view of traffic and routing </li></ul></ul><ul><ul><li>Model: data representations and “what-if” tools </li></ul></ul><ul><ul><li>Control: intelligent changes to operational network </li></ul></ul><ul><li>Application in AT&T’s network </li></ul><ul><ul><li>Capacity planning </li></ul></ul><ul><ul><li>Customer acquisition </li></ul></ul><ul><ul><li>Preparing for maintenance activities </li></ul></ul><ul><ul><li>Comparing different routing protocols </li></ul></ul><ul><li>Ongoing work: interdomain traffic engineering </li></ul>
  32. 32. To Learn More… <ul><li>Overview papers </li></ul><ul><ul><li>“Traffic engineering for IP networks” (http://www.research.att.com/~jrex/papers/ieeenet00.ps) </li></ul></ul><ul><ul><li>“Traffic engineering with traditional IP routing protocols” (http://www.research.att.com/~jrex/papers/ieeecomm02.ps) </li></ul></ul><ul><li>Traffic measurement </li></ul><ul><ul><li>“Measurement and analysis of IP network usage and behavior” (http://www.research.att.com/~jrex/papers/ieeecomm00.ps) </li></ul></ul><ul><ul><li>“Deriving traffic demands for operational IP networks” (http://www.research.att.com/~jrex/papers/ton01.ps) </li></ul></ul><ul><li>Topology and configuration </li></ul><ul><ul><li>“IP network configuration for intradomain traffic engineering” (http://www.research.att.com/~jrex/papers/ieeenet01.ps) </li></ul></ul><ul><li>Intradomain route optimization </li></ul><ul><ul><li>“Internet traffic engineering by optimizing OSPF weights” (http://www.ieee-infocom.org/2000/papers/165.ps) </li></ul></ul><ul><ul><li>“Optimizing OSPF/IS-IS weights in a changing world” (http://www.research.att.com/~mthorup/PAPERS/change_ospf.ps) </li></ul></ul>