Network Border Patrol: Preventing Congestion Collapse and Promoting Fairness in the Internet


Transcript

  • 1. Network Border Patrol: Preventing Congestion Collapse and Promoting Fairness in the Internet

Célio Albuquerque†, Brett J. Vickers‡ and Tatsuya Suda†
† Dept. of Information and Computer Science, University of California, Irvine, {celio,suda}@ics.uci.edu
‡ Dept. of Computer Science, Rutgers University, bvickers@cs.rutgers.edu

(This research is supported by the National Science Foundation through grant NCR-9628109. It has also been supported by grants from the University of California MICRO program, Hitachi America, Hitachi, Standard Microsystem Corp., Canon Information Systems Inc., Nippon Telegraph and Telephone Corp. (NTT), Nippon Steel Information and Communication Systems Inc. (ENICOM), Tokyo Electric Power Co., Fujitsu, Novell, Matsushita Electric Industrial Co. and Fundação CAPES/Brazil.)

Abstract: The end-to-end nature of Internet congestion control is an important factor in its scalability and robustness. However, end-to-end congestion control algorithms alone are incapable of preventing the congestion collapse and unfair bandwidth allocations created by applications that are unresponsive to network congestion. To address this flaw, we propose and investigate a novel congestion avoidance mechanism called Network Border Patrol (NBP). NBP relies on the exchange of feedback between routers at the borders of a network in order to detect and restrict unresponsive traffic flows before they enter the network. An enhanced core-stateless fair queueing mechanism is proposed in order to provide fair bandwidth allocations among competing flows. NBP is compliant with the Internet philosophy of pushing complexity toward the edges of the network whenever possible. Simulation results show that NBP effectively eliminates congestion collapse and that, when combined with fair queueing, NBP achieves approximately max-min fair bandwidth allocations for competing network flows.

Keywords: Internet, congestion control, congestion collapse, max-min fairness, end-to-end argument, core-stateless mechanisms, border control

I. INTRODUCTION

The essential philosophy behind the Internet is expressed by the scalability argument: no protocol, algorithm or service should be introduced into the Internet if it does not scale well. A key corollary to the scalability argument is the end-to-end argument: to maintain scalability, algorithmic complexity should be pushed to the edges of the network whenever possible.
  • 2. Perhaps the best example of the Internet philosophy is TCP congestion control, which is achieved primarily through algorithms implemented at end systems. Unfortunately, TCP congestion control also illustrates some of the shortcomings of the end-to-end argument. As a result of its strict adherence to end-to-end congestion control, the current Internet suffers from two maladies: congestion collapse from undelivered packets, and unfair allocations of bandwidth between competing traffic flows.

The first malady, congestion collapse from undelivered packets, arises when bandwidth is continually consumed by packets that are dropped before reaching their ultimate destinations [1]. John Nagle coined the term "congestion collapse" in 1984 to describe a network that remains in a stable congested state [2]. At that time, the primary cause of congestion collapse was the poor calculation of retransmission timers by TCP sources, which led to the unnecessary retransmission of delayed packets. This problem was corrected with more recent implementations of TCP [3]. Recently, however, a potentially more serious cause of congestion collapse has become increasingly common. Network applications are now frequently written to use transport protocols, such as UDP, which are oblivious to congestion and make no attempt to reduce packet transmission rates when packets are discarded by the network [4]. In fact, during periods of congestion some applications actually increase their transmission rates, just so that they will be less sensitive to packet losses [5]. Unfortunately, the Internet currently has no effective way to regulate such applications.

The second malady, unfair bandwidth allocation, arises in the Internet for a variety of reasons, one of which is the presence of applications that do not adapt to congestion. Adaptive applications (e.g., TCP-based applications) that respond to congestion by rapidly reducing their transmission rates are likely to receive unfairly small bandwidth allocations when competing with unresponsive or malicious applications. The Internet protocols themselves also introduce unfairness. The TCP algorithm, for instance, inherently causes each TCP flow to receive a bandwidth that is inversely proportional to its round trip time [6]. Hence, TCP connections with short round trip times may receive unfairly large allocations of network bandwidth when compared to connections with longer round trip times.
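As a rough, hedged illustration of that inverse dependence on round trip time, the following snippet is not from the paper; it uses the standard simplified steady-state TCP throughput approximation, throughput on the order of (MSS/RTT) * sqrt(3/2) / sqrt(p), rather than the full model of [6].

    # Illustrative only: simplified steady-state TCP throughput model.
    from math import sqrt

    def tcp_throughput_bps(mss_bytes=1500, rtt_s=0.1, loss_rate=0.01):
        # throughput ~ (MSS / RTT) * sqrt(3/2) / sqrt(p)
        return (mss_bytes * 8 / rtt_s) * sqrt(1.5) / sqrt(loss_rate)

    for rtt in (0.02, 0.2):  # 20 ms vs. 200 ms connections, same loss rate
        print(f"RTT {rtt * 1000:.0f} ms -> {tcp_throughput_bps(rtt_s=rtt) / 1e6:.2f} Mbps")
    # The 20 ms connection obtains roughly ten times the bandwidth of the 200 ms one.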
  • 3. To address these maladies, we introduce and investigate a new Internet traffic control protocol called Network Border Patrol. The basic principle of Network Border Patrol (NBP) is to compare, at the borders of a network, the rates at which packets from each application flow are entering and leaving the network. If a flow's packets are entering the network faster than they are leaving it, then the network is likely buffering or, worse yet, discarding the flow's packets. In other words, the network is receiving more packets than it can handle. NBP prevents this scenario by "patrolling" the network's borders, ensuring that each flow's packets do not enter the network at a rate greater than they are able to leave it. This patrolling prevents congestion collapse from undelivered packets, because an unresponsive flow's otherwise undeliverable packets never enter the network in the first place.

As an option, Network Border Patrol can also realize approximately max-min fair bandwidth allocations for competing packet flows if it is used in conjunction with an appropriate fair queueing mechanism. Weighted fair queueing (WFQ) [7], [8] is an example of one such mechanism. Unfortunately, WFQ imposes significant complexity on routers by requiring them to maintain per-flow state and perform per-flow scheduling of packets. In this paper we introduce an enhanced core-stateless fair queueing mechanism in order to achieve some of the advantages of WFQ without all of its complexity.

NBP's prevention of congestion collapse comes at the expense of some additional network complexity, since routers at the border of the network are expected to monitor and control the rates of individual flows. NBP also introduces an added communication overhead, since in order for an edge router to know the rate at which its packets are leaving the network, it must exchange feedback with other edge routers. However, unlike other proposed approaches to the problem of congestion collapse, NBP's added complexity is isolated to edge routers; routers within the core do not participate in the prevention of congestion collapse. Moreover, end systems operate in total ignorance of the fact that NBP is implemented in the network, so no changes to transport protocols are necessary.

The remainder of this paper is organized as follows. Section II describes why existing mechanisms are not effective in preventing congestion collapse or in providing fairness in the presence of unresponsive flows. In section III, we describe the architectural components of Network Border Patrol in further detail and present the feedback and rate control algorithms used by NBP edge routers to prevent congestion collapse. Section IV explains the enhanced core-stateless fair queueing mechanism, and illustrates the advantages of providing lower queueing delays to flows transmitting at lower rates.
  • 4. In section V, we present the results of several simulations, which illustrate the ability of NBP to avoid congestion collapse and provide fairness to competing network flows. The scalability of NBP is further examined. In section VI, we discuss several implementation issues that must be addressed in order to make deployment of NBP feasible in the Internet. Finally, in section VII we provide some concluding remarks.

II. RELATED WORK

The maladies of congestion collapse from undelivered packets and of unfair bandwidth allocations have not gone unrecognized. Some have argued that there are social incentives for multimedia applications to be friendly to the network, since an application would not want to be held responsible for throughput degradation in the Internet. However, malicious denial-of-service attacks using unresponsive UDP flows are becoming disturbingly frequent in the Internet, and they demonstrate that the Internet cannot rely solely on social incentives to control congestion or to operate fairly.

Some have argued that these maladies may be mitigated through the use of improved packet scheduling [12] or queue management [13] mechanisms in network routers. For instance, per-flow packet scheduling mechanisms like Weighted Fair Queueing (WFQ) [7], [8] attempt to offer fair allocations of bandwidth to flows contending for the same link. So do Core-Stateless Fair Queueing (CSFQ) [9], Rainbow Fair Queueing [10] and CHOKe [11], which are approximations of WFQ that do not require core routers to maintain per-flow state. Active queue management mechanisms like Fair Random Early Detection (FRED) [14] also attempt to limit malicious or unresponsive flows by preferentially discarding packets from flows that are using more than their fair share of a link's bandwidth. All of these mechanisms are more complex and expensive to implement than simple FIFO queueing, but they reduce the causes of unfairness and congestion collapse in the Internet. Nevertheless, they do not eradicate them. To illustrate this fact, consider the example shown in Figure 1. Two unresponsive flows compete for bandwidth in a network containing two bottleneck links arbitrated by a fair queueing mechanism. At the first bottleneck link (R1-R2), fair queueing ensures that each flow receives half of the link's available bandwidth (750 kbps). On the second bottleneck link (R2-S4), much of the traffic from flow B is discarded due to the link's limited capacity (128 kbps).
  • 5. Fig. 1. Example of a network which experiences congestion collapse. (Flow A runs from S1 to S3 and flow B from S2 to S4; access links are 10 Mbps, the R1-R2 link is 1.5 Mbps, and the R2-S4 link is 128 kbps.)

Hence, flow A achieves a throughput of 750 kbps and flow B achieves a throughput of 128 kbps. Clearly, congestion collapse has occurred, because flow B packets, which are ultimately discarded on the second bottleneck link, unnecessarily limit the throughput of flow A across the first bottleneck link. Furthermore, while both flows receive equal bandwidth allocations on the first bottleneck link, their allocations are not globally max-min fair. An allocation of bandwidth is said to be globally max-min fair if, at every link, all active flows not bottlenecked at another link are allocated a maximum, equal share of the link's remaining bandwidth [15]. A globally max-min fair allocation of bandwidth would have been 1.372 Mbps for flow A and 128 kbps for flow B.

This example, which is a variant of an example presented by Floyd and Fall [1], illustrates the inability of local scheduling mechanisms, such as WFQ, to eliminate congestion collapse and achieve global max-min fairness without the assistance of additional network mechanisms.

Jain et al. have proposed several rate control algorithms that are able to prevent congestion collapse and provide global max-min fairness to competing flows [16]. These algorithms (e.g., ERICA, ERICA+) are designed for the ATM Available Bit Rate (ABR) service and require all network switches to compute fair allocations of bandwidth among competing connections. However, these algorithms are not easily tailorable to the current Internet, because they violate the Internet design philosophy of keeping router implementations simple and pushing complexity to the edges of the network.

Rangarajan and Acharya proposed a network border-based approach, which aims to prevent congestion collapse through early regulation of unresponsive flows (ERUF) [17]. ERUF border routers rate control the input traffic, while core routers generate source quenches on packet drops to advise sources and border routers to reduce their sending rates. While this approach may prevent congestion collapse, it does so after packets have been dropped and the network is congested. It also lacks mechanisms to provide fair bandwidth allocations to flows that are responsive and unresponsive to congestion.
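To make the global max-min numbers quoted above concrete, here is a small water-filling sketch (not part of the paper; the function name and data layout are purely illustrative) that reproduces the 1.372 Mbps and 128 kbps allocation for the Figure 1 topology.

    def max_min_shares(links, flows):
        """links: {link: capacity in bps}; flows: {flow: list of links it traverses}."""
        share = {}
        remaining = dict(links)
        active = {f: set(path) for f, path in flows.items()}
        while active:
            # Next bottleneck: the link offering the smallest equal share per remaining flow.
            link, per_flow = min(
                ((l, remaining[l] / sum(1 for p in active.values() if l in p))
                 for l in remaining if any(l in p for p in active.values())),
                key=lambda x: x[1])
            for f in [f for f, p in active.items() if link in p]:
                share[f] = per_flow              # freeze this flow at the bottleneck share
                for l in active[f]:
                    remaining[l] -= per_flow     # subtract its share from every link it crosses
                del active[f]
        return share

    # Figure 1: flow A crosses only R1-R2 (1.5 Mbps); flow B also crosses R2-S4 (128 kbps).
    print(max_min_shares({"R1-R2": 1.5e6, "R2-S4": 128e3},
                         {"A": ["R1-R2"], "B": ["R1-R2", "R2-S4"]}))
    # -> {'B': 128000.0, 'A': 1372000.0}, i.e., 128 kbps for flow B and 1.372 Mbps for flow A

At each step the link whose remaining capacity divided by its number of unfrozen flows is smallest is the next bottleneck; the flows crossing it are frozen at that share, and the procedure repeats on the remaining capacity.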
  • 6. Fig. 2. The core-stateless Internet architecture assumed by NBP. (End systems attach to edge routers, which surround the core routers within each of two domains.)

Floyd and Fall have approached the problem of congestion collapse by proposing low-complexity router mechanisms that promote the use of adaptive or "TCP-friendly" end-to-end congestion control [1]. Their suggested approach requires selected gateway routers to monitor high-bandwidth flows in order to determine whether they are responsive to congestion. Flows determined to be unresponsive to congestion are penalized by a higher packet discarding rate at the gateway router. A limitation of this approach is that the procedures currently available to identify unresponsive flows are not always successful [9].

III. NETWORK BORDER PATROL

Network Border Patrol is a network layer congestion avoidance protocol that is aligned with the core-stateless approach. The core-stateless approach, which has recently received a great deal of research attention [18], [9], allows routers on the borders (or edges) of a network to perform flow classification and maintain per-flow state but does not allow routers at the core of the network to do so. Figure 2 illustrates this architecture. As in other work on core-stateless approaches, we draw a further distinction between two types of edge routers. Depending on which flow it is operating on, an edge router may be viewed as an ingress or an egress router. An edge router operating on a flow passing into a network is called an ingress router, whereas an edge router operating on a flow passing out of a network is called an egress router. Note that a flow may pass through more than one egress (or ingress) router if the end-to-end path crosses multiple networks.

NBP prevents congestion collapse through a combination of per-flow rate monitoring at egress routers and per-flow rate control at ingress routers.
  • 7. Fig. 3. An input port of an NBP egress router. (Arriving packets pass through a flow classifier into per-flow rate monitors; the measured rates feed a feedback controller, which handles the forward and backward feedback flows and passes packets on to the forwarding function and output ports.)

Rate monitoring allows an egress router to determine how rapidly each flow's packets are leaving the network, whereas rate control allows an ingress router to police the rate at which each flow's packets enter the network. Linking these two functions together are the feedback packets exchanged between ingress and egress routers; ingress routers send egress routers forward feedback packets to inform them about the flows that are being rate controlled, and egress routers send ingress routers backward feedback packets to inform them about the rates at which each flow's packets are leaving the network.

This section describes three important aspects of the NBP mechanism: (1) the architectural components, namely the modified edge routers, which must be present in the network, (2) the feedback control algorithm, which determines how and when information is exchanged between edge routers, and (3) the rate control algorithm, which uses the information carried in feedback packets to regulate flow transmission rates and thereby prevent congestion collapse in the network.

A. Architectural Components

The only components of the network that require modification by NBP are edge routers; the input ports of egress routers must be modified to perform per-flow monitoring of bit rates, and the output ports of ingress routers must be modified to perform per-flow rate control. In addition, both the ingress and the egress routers must be modified to exchange and handle feedback.

Figure 3 illustrates the architecture of an egress router's input port. Data packets sent by ingress routers arrive at the input port of the egress router and are first classified by flow. In the case of IPv6, this is done by examining the packet header's flow label, whereas in the case of IPv4, it is done by examining the packet's source and destination addresses and port numbers.
  • 8. Fig. 4. An output port of an NBP ingress router. (Outgoing packets pass through a flow classifier into per-flow traffic shapers; a rate controller, driven by a feedback controller that exchanges forward and backward feedback, adjusts the shaper rates before packets enter the network.)

Each flow's bit rate is then rate monitored using a rate estimation algorithm such as the Time Sliding Window (TSW) [19]. These rates are collected by a feedback controller, which returns them in backward feedback packets to an ingress router whenever a forward feedback packet arrives from that ingress router.

The output ports of ingress routers are also enhanced. Each contains a flow classifier, per-flow traffic shapers (e.g., leaky buckets), a feedback controller, and a rate controller. See Figure 4. The flow classifier classifies packets into flows, and the traffic shapers limit the rates at which packets from individual flows enter the network. The feedback controller receives backward feedback packets returning from egress routers and passes their contents to the rate controller. It also generates forward feedback packets, which it occasionally transmits to the network's egress routers. The rate controller adjusts traffic shaper parameters according to a TCP-like rate control algorithm, which is described later in this section.

B. The Feedback Control Algorithm

The feedback control algorithm determines how and when feedback packets are exchanged between edge routers. Feedback packets take the form of ICMP packets and are necessary in NBP for three reasons. First, they allow egress routers to discover which ingress routers are acting as sources for each of the flows they are monitoring. Second, they allow egress routers to communicate per-flow bit rates to ingress routers. Third, they allow ingress routers to detect incipient network congestion by monitoring edge-to-edge round trip times.

The contents of feedback packets are shown in Figure 5.
  • 9. Fig. 5. Forward and backward feedback packets exchanged by edge routers. (A forward feedback (FF) packet carries IP/ICMP headers, a timestamp and a list of flow specs; a backward feedback (BF) packet carries IP/ICMP headers, a hop count, the echoed timestamp and a list of flow spec and egress rate pairs.)

Contained within the forward feedback packet is a time stamp and a list of flow specifications for flows originating at the ingress router. The time stamp is used to calculate the round trip time between two edge routers, and the list of flow specifications indicates to an egress router the identities of active flows originating at the ingress router. A flow specification is a value uniquely identifying a flow. In IPv6 it is the flow's flow label; in IPv4, it is the combination of source address, destination address, source port number, and destination port number. An edge router adds a flow to its list of active flows whenever a packet from a new flow arrives; it removes a flow when the flow becomes inactive. In the event that the network's maximum transmission unit size is not sufficient to hold an entire list of flow specifications, multiple forward feedback packets are used.

When an egress router receives a forward feedback packet, it immediately generates a backward feedback packet and returns it to the ingress router. Contained within the backward feedback packet are the forward feedback packet's original time stamp, a router hop count, and a list of observed bit rates, called egress rates, collected by the egress router for each flow listed in the forward feedback packet. The router hop count, which is used by the ingress router's rate control algorithm, indicates how many routers are in the path between the ingress and the egress router. The egress router determines the hop count by examining the time to live (TTL) field of arriving forward feedback packets. When the backward feedback packet arrives at the ingress router, its contents are passed to the ingress router's rate controller, which uses them to adjust the parameters of each flow's traffic shaper.
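The egress-side behavior described in this subsection could be sketched roughly as follows. This is an assumption-laden illustration, not the authors' implementation: the class names, the default TSW window and the assumed initial TTL of 255 are all invented for the sketch.

    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    @dataclass
    class ForwardFeedback:
        timestamp: float                  # set by the ingress router
        ttl: int                          # remaining time-to-live on arrival
        flow_specs: List[Tuple]           # flows originating at that ingress router

    @dataclass
    class BackwardFeedback:
        timestamp: float                  # echoed from the forward feedback packet
        hop_count: int
        egress_rates: Dict[Tuple, float]  # observed egress rate per flow spec

    class TimeSlidingWindow:
        """Time Sliding Window (TSW) rate estimator in the spirit of [19]."""
        def __init__(self, window_s: float = 0.01):   # 10 msec window, as in Table 1
            self.window = window_s
            self.avg_rate = 0.0                        # bytes per second
            self.t_front = 0.0

        def update(self, pkt_bytes: int, now: float) -> float:
            bytes_in_win = self.avg_rate * self.window
            self.avg_rate = (bytes_in_win + pkt_bytes) / (now - self.t_front + self.window)
            self.t_front = now
            return self.avg_rate

    INITIAL_TTL = 255                     # assumed initial TTL of feedback packets

    def on_forward_feedback(ff: ForwardFeedback,
                            monitors: Dict[Tuple, TimeSlidingWindow]) -> BackwardFeedback:
        """Egress router: echo the timestamp, derive the hop count from the TTL, and
        report the monitored egress rate of every flow listed in the FF packet."""
        rates = {spec: monitors[spec].avg_rate for spec in ff.flow_specs if spec in monitors}
        return BackwardFeedback(timestamp=ff.timestamp,
                                hop_count=INITIAL_TTL - ff.ttl,
                                egress_rates=rates)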
  • 10. on arrival of Backward Feedback packet p from egress router e:
        currentRTT = currentTime - p.timestamp;
        if (currentRTT < e.baseRTT)
            e.baseRTT = currentRTT;
        deltaRTT = currentRTT - e.baseRTT;
        RTTsElapsed = (currentTime - e.lastFeedbackTime) / currentRTT;
        e.lastFeedbackTime = currentTime;
        for each flow f listed in p:
            rateQuantum = min(MSS / currentRTT, f.egressRate / QF);
            if (f.phase == SLOW_START)
                if (deltaRTT × f.ingressRate < MSS × e.hopcount)
                    f.ingressRate = f.ingressRate × 2 ^ RTTsElapsed;
                else
                    f.phase = CONGESTION_AVOIDANCE;
            if (f.phase == CONGESTION_AVOIDANCE)
                if (deltaRTT × f.ingressRate < MSS × e.hopcount)
                    f.ingressRate = f.ingressRate + rateQuantum × RTTsElapsed;
                else
                    f.ingressRate = f.egressRate - rateQuantum;

Fig. 6. Pseudocode for ingress router rate control algorithm

In order to determine how often to generate forward feedback packets, an ingress router keeps a byte transmission counter for each flow it processes. Whenever a flow's byte counter exceeds a threshold, denoted Tx, the ingress router generates and transmits a forward feedback packet to the flow's egress router. The forward feedback packet includes a list of flow specifications for all flows going to the same egress router, and the counters for all flows described in the feedback packet are reset. Using a byte counter for each flow ensures that feedback packets are generated more frequently when flows transmit at high rates, thereby allowing ingress routers to respond more quickly to impending congestion collapse. To maintain a frequent flow of feedback between edge routers even when data transmission rates are low, ingress routers also generate forward feedback packets whenever a time-out interval, denoted f, is exceeded.
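A minimal sketch of the forward feedback triggering just described, combining the per-flow byte counter with threshold Tx and the time-out f. The class layout and the send_ff callback are assumptions made for the sketch; the Tx and f values are the defaults from Table 1.

    import time
    from collections import defaultdict

    TX_THRESHOLD = 40_000   # bytes ("NBP Tx" in Table 1)
    TIMEOUT_F = 0.1         # seconds ("NBP f" in Table 1)

    class IngressFeedbackTrigger:
        def __init__(self, send_ff):
            self.send_ff = send_ff                      # callback that emits an FF packet
            self.byte_count = defaultdict(int)          # per-flow byte counters
            self.last_ff_time = defaultdict(float)      # per-egress-router timers
            self.flows_by_egress = defaultdict(set)

        def on_data_packet(self, flow_spec, egress_id, pkt_bytes, now=None):
            now = time.time() if now is None else now
            self.flows_by_egress[egress_id].add(flow_spec)
            self.byte_count[flow_spec] += pkt_bytes
            # Trigger on the byte threshold, or on the timeout when rates are low.
            if (self.byte_count[flow_spec] > TX_THRESHOLD or
                    now - self.last_ff_time[egress_id] > TIMEOUT_F):
                self._emit(egress_id, now)

        def _emit(self, egress_id, now):
            flows = list(self.flows_by_egress[egress_id])
            # The FF packet lists all flows headed to this egress router;
            # their byte counters are then reset.
            self.send_ff(egress_id, timestamp=now, flow_specs=flows)
            for f in flows:
                self.byte_count[f] = 0
            self.last_ff_time[egress_id] = now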
  • 11. C. The Rate Control Algorithm

The NBP rate control algorithm regulates the rate at which each flow enters the network. Its primary goal is to converge on a set of per-flow transmission rates (hereinafter called ingress rates) that prevents congestion collapse from undelivered packets. It also attempts to lead the network to a state of maximum link utilization and low router buffer occupancies, and it does this in a manner that is similar to TCP.

In the NBP rate control algorithm, shown in Figure 6, a flow may be in one of two phases, slow start or congestion avoidance, which are similar to the phases of TCP congestion control. New flows enter the network in the slow start phase and proceed to the congestion avoidance phase only after the flow has experienced congestion. The rate control algorithm is invoked whenever a backward feedback packet arrives at an ingress router. Recall that BF packets contain a list of flows arriving at the egress router from the ingress router as well as the monitored egress rates for each flow.

Upon the arrival of a backward feedback packet, the algorithm calculates the current round trip time between the edge routers and updates the base round trip time, if necessary. The base round trip time reflects the best observed round trip time between the two edge routers. The algorithm then calculates deltaRTT, which is the difference between the current round trip time (currentRTT) and the base round trip time (e.baseRTT). A deltaRTT value greater than zero indicates that packets are requiring a longer time to traverse the network than they once did, and this can only be due to the buffering of packets within the network.

NBP's rate control algorithm decides that a flow is experiencing congestion whenever it estimates that the network has buffered the equivalent of more than one of the flow's packets at each router hop. To do this, the algorithm first computes the product of the flow's ingress rate and deltaRTT. This value provides an estimate of the amount of the flow's data that is buffered somewhere in the network. If the amount is greater than the number of router hops between the ingress and the egress router multiplied by the size of the largest possible packet, then the flow is considered to be experiencing congestion. The rationale for determining congestion in this manner is to maintain both high link utilization and low queueing delay. Ensuring there is always at least one packet buffered for transmission on a network link is the simplest way to achieve full utilization of the link, and deciding that congestion exists when more than one packet is buffered at the link keeps queueing delays low. A similar approach is used in the DECbit congestion avoidance mechanism [20].

When the rate control algorithm determines that a flow is not experiencing congestion, it increases the flow's ingress rate. If the flow is in the slow start phase, its ingress rate is doubled for each round trip time that has elapsed since the last backward feedback packet arrived. The estimated number of round trip times since the last feedback packet arrived is denoted as RTTsElapsed. Doubling the ingress rate during slow start allows a new flow to rapidly capture available bandwidth when the network is underutilized. If, on the other hand, the flow is in the congestion avoidance phase, then its ingress rate is conservatively incremented by one rateQuantum value for each round trip that has elapsed since the last backward feedback packet arrived.
  • 12. This is done to avoid the creation of congestion. The rate quantum is computed as the maximum segment size divided by the current round trip time between the edge routers. This results in rate growth behavior that is similar to TCP in its congestion avoidance phase. Furthermore, the rate quantum is not allowed to exceed the flow's current egress rate divided by a constant quantum factor (QF). This guarantees that rate increments are not excessively large when the round trip time is small.

When the rate control algorithm determines that a flow is experiencing congestion, it reduces the flow's ingress rate. If a flow is in the slow start phase, it enters the congestion avoidance phase. If a flow is already in the congestion avoidance phase, its ingress rate is reduced to the flow's egress rate decremented by MRC. In other words, an observation of congestion forces the ingress router to send the flow's packets into the network at a rate slightly lower than the rate at which they are leaving the network.

IV. ADDING FAIRNESS TO NETWORK BORDER PATROL

Although Network Border Patrol prevents congestion collapse, it does not guarantee that all flows are treated fairly when they compete for bottleneck links. To address this concern, we consider the interoperation of Network Border Patrol and various fair queueing mechanisms.

Fair bandwidth allocations can be achieved by using per-flow packet scheduling mechanisms such as Fair Queuing (FQ) [7], [8]. Fair Queuing attempts to emulate the behavior of a fluid flow system; each packet stream is treated as a fluid flow entering a pipe, and all flows receive an equal proportion of the pipe's capacity. Fair Queuing is effective; it fairly allocates bandwidth to packet flows competing for a single link. However, in order to provide this benefit, Fair Queuing requires each link to maintain queues and state for each flow. This complexity overhead impedes the scalability of Fair Queuing, making it impractical for wide area networks in which thousands of flows may be active at any one time.

Recognizing the scalability difficulties of Fair Queuing, several researchers have proposed more scalable "core-stateless" approximations of Fair Queuing, such as Core-Stateless Fair Queuing [9], Rainbow Fair Queuing [10] and CHOKe [11]. The general idea behind these mechanisms is that edge routers label packets entering the network with the state of their associated flows, and core routers use the state recorded in the packets to decide whether to drop them or when to schedule them for transmission.
  • 13. This makes the core-stateless mechanisms more scalable than Fair Queuing, since they limit per-flow operations and state maintenance to routers on the edges of a network.

Although these core-stateless fair queuing mechanisms work well with most congestion control algorithms that rely on packet losses to indicate congestion, they do not work as well with congestion avoidance algorithms that prevent congestion before packet loss occurs. Examples of such congestion avoidance algorithms include TCP Vegas [22], [23], TCP with Explicit Congestion Notification [24] and Network Border Patrol. Two simulation experiments illustrate this phenomenon. In both experiments, two TCP flows and a 1 Mbps constant bit rate UDP flow share a single 1.5 Mbps bottleneck link. In the first experiment, the TCP sources use the TCP Reno implementation, which relies on observations of packet loss to indicate congestion. As Figure 7(a) shows, CSFQ works well with this kind of TCP source, providing approximately fair allocations of bottleneck link bandwidth to all three flows. In the second experiment, the TCP Reno sources are replaced by TCP Vegas sources, which rely on round trip time measurements to predict incipient congestion and keep buffer occupancies small. Here, as Figure 7(b) shows, CSFQ fails to provide fair allocations of bandwidth to the TCP flows.

CSFQ fails for congestion avoidance algorithms that prevent packet loss, because it does not accurately approximate the delay characteristics of Fair Queuing. In Fair Queuing, flows transmitting at rates less than or equal to their fair share are guaranteed timely delivery of their packets since they do not share the same buffer as packets from other flows. In the core-stateless approximations of Fair Queuing, this is not the case, since they aggregate packets from all flows into a single buffer and rely on packet discarding to balance the service of each flow. Hence, the existing core-stateless mechanisms are incompatible with congestion avoidance mechanisms that keep buffer occupancies small or rely on round trip time measurements to indicate incipient congestion.

We propose a slightly modified version of CSFQ, hereinafter referred to as Enhanced CSFQ (ECSFQ). ECSFQ not only achieves the scalability of CSFQ, but at the same time it is designed to work with preventive congestion avoidance mechanisms like TCP Vegas and Network Border Patrol. The basic idea of ECSFQ is to introduce an additional high priority buffer at each core router. The high priority buffer is used to hold packets from flows transmitting at rates less than their fair share, while the original buffer holds all other packets. Packets in the high priority buffer are served first and therefore experience short delays.
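A minimal sketch of the two-buffer idea just described, assuming that CSFQ-style flow rate labels and a fair-share estimate are already available at the core router (the class and parameter names are illustrative, and the CSFQ fair-share estimation itself is not reproduced here):

    import random
    from collections import deque

    class ECSFQQueue:
        def __init__(self, capacity_pkts: int):
            self.high = deque()   # high priority buffer: flows below their fair share
            self.low = deque()    # original CSFQ buffer: all other packets
            self.capacity = capacity_pkts

        def enqueue(self, pkt, labeled_rate: float, fair_share: float):
            # CSFQ-style probabilistic drop for traffic exceeding its fair share.
            if labeled_rate > fair_share and random.random() < 1.0 - fair_share / labeled_rate:
                return                              # dropped
            if len(self.high) + len(self.low) >= self.capacity:
                return                              # buffer overflow drop
            # ECSFQ addition: packets of flows below their fair share go to the high
            # priority buffer, so they see short queueing delays.
            (self.high if labeled_rate < fair_share else self.low).append(pkt)

        def dequeue(self):
            # Serve the high priority buffer first; otherwise behave like CSFQ.
            if self.high:
                return self.high.popleft()
            return self.low.popleft() if self.low else None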
  • 14. Fig. 7. CSFQ does not achieve fair bandwidth allocations when used with some congestion avoidance mechanisms. (Panels: (a) CSFQ achieves approximately fair bandwidth allocations when TCP Reno sources are used; (b) CSFQ fails to achieve fair bandwidth allocations when TCP Vegas sources are used.)

Once a flow's rate meets or exceeds its fair share, the flow's packets enter the low priority buffer and experience the same delays as packets from other flows transmitting at or above the fair share. Apart from the addition of a high priority buffer, ECSFQ behaves identically to the original CSFQ algorithm.

The results in Figure 8 were obtained by repeating the previous experiment with ECSFQ and TCP Vegas. Due to the use of high priority buffers, TCP Vegas packets experience lower queueing delays than the UDP packets. Because of this, all three flows achieve approximately fair shares of the bottleneck link bandwidth.

One potential drawback of ECSFQ is that it makes possible the reordering of a flow's packets. However, we believe packet reordering will be rare, since it happens only when a flow's packets are queued in the high priority buffer after previously being queued in the low priority buffer.
  • 15. Fig. 8. Enhanced CSFQ restores fairness when used with TCP Vegas.

  Simulation parameter                     Value
  Packet size                              1000 bytes
  Router queue size                        100 packets
  Maximum segment size (MSS)               1500 bytes
  TCP implementation                       Reno [26]
  TCP window size                          100 kbytes
  NBP MRC factor (MF)                      10
  NBP Tx                                   40000 bytes
  NBP f                                    100 msec
  TSW window size                          10 msec
  End-system-to-edge propagation delay     100 µsec
  End-system-to-edge link bandwidth        10 Mbps

Table 1. Default simulation parameters

This can occur in two cases: (1) when a flow originally transmits at or above its fair share allocation but later decreases its transmission rate below the fair share, or (2) when bandwidth becomes available and the flow's fair share suddenly increases. Packet reordering in the first case is possible but unlikely because, by reducing its rate, the flow reduces the load on the bottleneck, thereby allowing the packets in the low priority buffer to be processed faster and resulting in a low probability that packets from this flow are found in the low priority buffer. Packet reordering in the second case is also possible but unlikely, since the low priority buffer empties rapidly when new bandwidth becomes available.

V. SIMULATION EXPERIMENTS

In this section, we present the results of several simulation experiments, each of which is designed to test a different aspect of Network Border Patrol. The first set of experiments examines the ability of NBP to prevent congestion collapse, while the second set of experiments examines the ability of ECSFQ to provide fair bandwidth allocations to competing network flows. Together, NBP and ECSFQ prevent congestion collapse and provide fair allocations of bandwidth to competing network flows.
  • 16. Fig. 9. A network with a single shared link. (A TCP flow S1-I1-R1-R2-E1-S3 and an unresponsive UDP flow S2-I2-R1-R2-E2-S4 share the 1.5 Mbps R1-R2 link; access links are 10 Mbps and the R2-E2 link is 128 kbps. I = ingress router, E = egress router, R = core router, S = end system.)

The third set of experiments assesses the scalability constraints of NBP. All simulations were run for 100 seconds using the UC Berkeley/LBNL/VINT ns-2 simulator [27]. The ns-2 code implementing NBP and the scripts to run these simulations are available at the UCI Network Research Group web site [28]. Default simulation parameters are shown in Table 1. They are set to values commonly used in the Internet and are used in all simulation experiments unless otherwise noted.

A. Preventing Congestion Collapse

The first set of simulation experiments explores NBP's ability to prevent congestion collapse from undelivered packets. In the first experiment, we study the scenario depicted in Figure 9. One flow is a TCP flow generated by an application which always has data to send, and the other flow is an unresponsive constant bit rate UDP flow. Both flows compete for access to a shared 1.5 Mbps bottleneck link (R1-R2), and only the UDP flow traverses a second bottleneck link (R2-E2), which has a limited capacity of 128 kbps.

Figure 10 shows the throughput achieved by the two flows as the UDP source's transmission rate is increased from 32 kbps to 2 Mbps. The combined throughput delivered by the network (i.e., the sum of both flow throughputs) is also shown. Four different cases are examined under this scenario. The first is the benchmark case used for comparison: NBP is not used between edge routers, and all routers schedule the delivery of packets on a FIFO basis. As Figure 10(a) shows, the network experiences severe congestion collapse as the UDP flow's transmission rate increases, because the UDP flow fails to respond adaptively to the discarding of its packets on the second bottleneck link. When the UDP load increases to 1.5 Mbps, the TCP flow's throughput drops nearly to zero. In the second case we show that fair queueing mechanisms alone cannot prevent congestion collapse.
  • 17. Fig. 10. Congestion collapse observed as unresponsive traffic load increases. The solid line shows the combined throughput delivered by the network. (Panels: (a) severe congestion collapse using FIFO only; (b) moderate congestion collapse using ECSFQ only; (c) no congestion collapse using NBP with FIFO.)

As shown in Figure 10(b), better throughput is achieved for the TCP flow when compared to the FIFO only case. However, as indicated by the combined throughput of both flows, congestion collapse still occurs as the UDP load increases. Although ECSFQ allocates about 750 kbps to both flows at the first bottleneck link, only 128 kbps of this bandwidth is successfully exploited by the UDP flow, which is even more seriously bottlenecked by a second link. The remaining 622 kbps is wasted on undelivered packets.

In the third case, as Figure 10(c) shows, NBP effectively eliminates congestion collapse; the TCP flow achieves a nearly optimal throughput of 1.37 Mbps, and the combined throughput remains very close to 1.5 Mbps.
  • 18. Fig. 11. Unfairness as the unresponsive traffic load increases. (Panels: (a) severe unfairness using FIFO only; (b) moderate unfairness using NBP with FIFO; (c) approximate fairness using ECSFQ.)

B. Achieving Fairness

Besides preventing congestion collapse, this paper also aims to improve the fairness of bandwidth allocations to competing network flows.

In the first fairness experiment, we consider the scenario depicted in Figure 9 but replace the second bottleneck link (R2-E2) with a higher capacity 10 Mbps link. The TCP flow is generated by an application which always has data to send, and the UDP flow is generated by an unresponsive source which transmits packets at a constant bit rate. Since there is only one 1.5 Mbps bottleneck link (R1-R2) in this scenario, the max-min fair allocation of bandwidth between the flows is 750 kbps (if the UDP source exceeds a transmission rate of 750 kbps). However, as Figure 11(a) shows, fairness is clearly not achieved when only FIFO scheduling is used in routers.
  • 19. Fig. 12. The GFC-2 network. (Seven core routers connected in a chain; the inter-router links have capacities of 50, 100, 50, 150, 150 and 50 Mbps and propagation delays of 20, 10, 5, 5, 5 and 10 msec, and the flows of groups A through H enter and leave at different points along the chain.)

  Flow    Ideal global          Throughput     Throughput     Throughput     Throughput
  group   max-min fair share    using WFQ      using NBP      using NBP      using NBP
                                only           with FIFO      with WFQ       with ECSFQ
  A       10                    8.32           10.96          10.00          10.40
  B       5                     5.04           1.84           5.04           4.48
  C       35                    27.12          31.28          34.23          31.52
  D       35                    16.64          33.84          34.95          32.88
  E       35                    16.64          37.76          34.87          33.36
  F       10                    8.32           7.60           10.08          8.08
  G       5                     4.96           1.04           4.96           5.28
  H       52.5                  36.15          46.87          50.47          47.76

Table 2. Per-flow throughput in the GFC-2 network (in Mbps)

As the unresponsive UDP traffic load increases, the TCP flow experiences congestion and reduces its transmission rate, thereby granting an unfairly large amount of bandwidth to the unresponsive UDP flow. Thus, although there is no congestion collapse from undelivered packets, there is clearly unfairness. When NBP is deployed with FIFO scheduling, Figure 11(b) shows that the unfair allocation of bandwidth is only slightly reduced. Figure 11(c) shows the throughput of each flow when ECSFQ is introduced. Notice that ECSFQ is able to approximate fair bandwidth allocations.

In the second fairness experiment, we study whether NBP combined with ECSFQ can provide max-min fairness in a complex network. We consider the network model shown in Figure 12. This model is adapted from the second General Fairness Configuration (GFC-2), which is specifically designed to test the max-min fairness of traffic control algorithms [29]. It consists of 22 unresponsive UDP flows, each generated by a source transmitting at a constant bit rate of 100 Mbps. Flows belong to flow groups which are labeled from A to H, and the network is designed in such a way that members of each flow group receive the same max-min bandwidth allocations. Links connecting core routers serve as bottlenecks for at least one of the 22 flows, and all links have propagation delays of 5 msec and bandwidths of 150 Mbps unless otherwise shown in the figure.
  • 20. The first column of Table 2 lists the global max-min fair share allocations for all flows shown in Figure 12. These values represent the ideal bandwidth allocations for any traffic control mechanism that attempts to provide global max-min fairness. The remaining columns list the equilibrium-state throughputs actually observed after 4.5 seconds of simulation for several scenarios. (Only the results for a single member of each flow group are shown.) In the first scenario, NBP is not used and all routers perform WFQ. As indicated by comparing the values in the first and second columns, WFQ by itself is not able to achieve global max-min fairness for all flows. This is due to the fact that WFQ does not prevent congestion collapse. In the second scenario, NBP is introduced at edge routers and FIFO scheduling is assumed at all routers. Results listed in the third column show that NBP with FIFO also fails to achieve global max-min fairness in the GFC-2 network, largely because NBP has no mechanism to explicitly enforce fairness. In the third and fourth simulation scenarios, NBP is combined with WFQ and ECSFQ, respectively, and in both cases bandwidth allocations that are approximately max-min fair for all flows are achieved.

NBP with WFQ achieves slightly better fairness than NBP with ECSFQ, since ECSFQ is only an approximation of WFQ, and its performance depends on the accuracy of its estimation of a flow's input rate and fair share.

Figures 13(a) and 13(b) show how rapidly the throughput of each flow converges to its max-min fair bandwidth allocation for the NBP with WFQ and the NBP with ECSFQ cases, respectively. Even in a complex network like the one simulated here, all flows converge to an approximately max-min fair bandwidth allocation within one second.

C. Scalability

Scalability is perhaps the most important performance measure of any traffic control mechanism. As we saw in the previous section, Network Border Patrol is a core-stateless traffic control mechanism that effectively prevents congestion collapse and provides approximate max-min fairness. However, NBP's scalability is highly dependent upon per-flow management performed by edge routers. In a large scale network, the overheads of maintaining per-flow state, communicating per-flow feedback, and performing per-flow rate control and rate monitoring may become inordinately expensive. The number of border routers, the number of flows, and the load of the traffic are the determinant factors in NBP's scalability.
  • 21. Fig. 13. Per-flow throughput in the GFC-2 network. (Panels: (a) using NBP with WFQ; (b) using NBP with ECSFQ.)

The amount of feedback packets an ingress router generates depends on the number of egress routers it communicates with and the load of the traffic. The length of the feedback packets, as well as the processing and the amount of state border routers need to maintain, is mainly determined by the number of flows.

In this section we assess NBP's scalability through the network shown in Figure 14. This is a very flexible topology, in which we can vary the number of border routers connected to each of the core routers, as well as the number of end systems connected to each border router.¹ The network model shown in Figure 14 consists of four core routers, 4×B border routers and 4×B×S end systems, where B is the number of border routers per core router and S is the number of end systems per border router. Propagation delays between core routers are set to 5 msec, between border and core routers to 1 msec, and between end systems and core routers to 0.1 msec. Connections are set up in such a way that data packets travel in both forward and backward directions on the links connecting core routers.

¹ Since NBP is a core-stateless mechanism, the configuration of core routers does not seem to be very relevant for the study of NBP's scalability. A tandem configuration is used in order to enable scenarios with multiple congested links.
  • 22. Fig. 14. Simulation model for evaluating scalability. (Four core routers in tandem; ingress and egress routers attach the TCP and UDP sources and sinks; links between core routers and the egress routers traversed by UDP flows are 5 Mbps, all other links are 10 Mbps.)

TCP flows traverse all core routers while UDP flows traverse only the interior core routers. The link speeds between core and egress routers traversed by UDP flows are set to 5 Mbps. All remaining link speeds are set to 10 Mbps. Therefore, UDP flows are bottlenecked downstream at 5 Mbps. TCP flows traverse multiple congested links, and compete for bandwidth with UDP flows and also among themselves. UDP flows are unresponsive and transmit at a constant rate of 5 Mbps.

In the first experiment we consider a moderately large network with 10 border routers, and we varied the number of end systems from 8 to 48 by varying the number of end systems per border router from 1 to 6.

The amount of feedback information generated by NBP is shown in Figure 15. Interestingly, the amount of feedback generated by NBP is not very dependent on the number of end systems. This is because feedback packets are generated according to the amount of traffic admitted into the network. Since in nearly all scenarios tested the capacity of the network was fully utilized, the amount of feedback information generated by NBP does not increase with the number of end systems. The fraction of the bandwidth of links L2 and L3 consumed by feedback packets varied from 1.04% to 1.59% in this experiment.

One of the advantages of NBP is to reduce the losses within the network. Figure 16 shows the amount of data packets lost in the network with and without NBP. As expected, in both scenarios the total number of packet losses increases linearly with the number of end systems, according to the load of unresponsive traffic. The amount of packets lost inside the network provides a good measure of the degree of congestion collapse.
  • 23. Fig. 15. Amount of feedback packets as the number of end systems increase. (Total and dropped feedback packets, and total feedback information in bytes.)

Without NBP (a), it can be seen that once the load of the input traffic becomes larger than the capacity of the link between border routers and core routers, data packets are discarded by the border routers, and the amount of data discarded in core routers remains constant, clearly demonstrating congestion collapse. With NBP (b), nearly zero losses are observed in the core of the network, showing that even in a large scale complex network, congestion collapse can be effectively prevented.

Fig. 16. Number of dropped packets as the number of end systems increase. (Total data dropped and data dropped in the core, without and with NBP.)

TCP flows traverse the entire network and compete for bandwidth with the unresponsive flows. Figure 17(a) shows that without NBP, the combined throughput of TCP flows drops to nearly zero as the load of the unresponsive traffic increases. However, as seen in Figure 16, since NBP was able to prevent congestion collapse, the performance of TCP can be greatly improved. Figure 17(b) shows that TCP's throughput remains high as the number of end systems increases.

In the second experiment we vary the size of the network by varying the number of border routers from 10 to 48 routers. We consider that only one end system is connected to each border router, so that links between ingress and core routers are never congested. Only links connecting core routers and links connecting core to egress routers become congested in this experiment.

Figure 18 shows the amount of data packets lost in the network with and without NBP.
  • 24. Fig. 17. Combined throughput of TCP flows as the number of end systems increase.

As in the previous experiment, in both scenarios the total number of packet losses increases linearly with the number of end systems, according to the load of unresponsive traffic. However, notice that with NBP a slightly higher amount of packets is discarded when compared to the scenario without NBP. Notice also that without NBP, all packet losses occur in the core of the network, but when NBP is used, nearly no losses are observed in the core. In this scenario, ingress routers proved to be somewhat conservative and discarded a slightly larger amount of packets.

Fig. 18. Number of dropped packets as the number of borders increase. (Total data dropped and data dropped in the core, without and with NBP.)

The amount of feedback information generated by NBP is shown in Figure 19, and it increases slightly as the number of borders increases. The fraction of the bandwidth of links L2 and L3 consumed by feedback packets varied from 0.92% to 1.41% in this experiment.

Fig. 19. Amount of feedback information as the number of borders increase.

The throughput achieved by TCP flows is shown in Figure 20. Although the results are not optimal, TCP's throughput with NBP is still much larger than in the scenario without NBP.
  • 25. However, a large drop in TCP's performance can be seen as the number of border routers increases. Several reasons could have contributed to this result. First, notice that in this experiment each TCP source is connected to a different ingress router, while in the previous experiment all TCP sources share the same ingress router. So in order to respond to congestion, in the previous experiment a single feedback packet per ingress router is able to rate control all TCP and UDP sources, while in this experiment one feedback packet per flow at each ingress router is necessary to rate control each flow. Second, feedback packets contain congestion information regarding all flows sharing the same pair of ingress and egress routers. Therefore the information carried by feedback packets in the first experiment is relevant to all flows, while the information carried by feedback packets in the second experiment is exclusive to one single flow.

Fig. 20. Combined throughput of TCP flows as the number of borders increase.

This last result suggests that flow aggregation may be beneficial to the throughput of small microflows. Coarse-grained flow aggregation has the effect of significantly reducing the number of flows seen by NBP edge routers, therefore reducing the required amount of state and processing. Flows can be classified very coarsely at edge routers. Instead of classifying a flow using the packet's addresses and port numbers, the network's edge routers may aggregate many flows together by, for instance, classifying flows using only the packet's address fields. Alternatively, they might choose to classify flows even more coarsely using only the packet's destination network address. However, the drawback of flow aggregation is that adaptive flows aggregated with unresponsive flows may be indiscriminately punished by an ingress router. Ingress routers can attempt to prevent this unfair situation by performing per-flow scheduling. Still, flow aggregation may create a trade-off between scalability and per-flow fairness.
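As a small illustration of the classification granularities mentioned above (the dictionary keys and the /24 prefix length are assumptions made for the sketch, not values from the paper):

    import ipaddress

    def flow_key(pkt: dict, granularity: str = "micro"):
        if granularity == "micro":        # per-microflow: addresses and port numbers
            return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        if granularity == "host-pair":    # aggregate using only the address fields
            return (pkt["src"], pkt["dst"])
        # Coarsest: aggregate by destination network address (here an assumed /24).
        return ipaddress.ip_network(pkt["dst"] + "/24", strict=False)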
  • 26. VI. IMPLEMENTATION ISSUES

As we saw in the previous section, Network Border Patrol is a congestion avoidance mechanism that effectively prevents congestion collapse and provides approximate max-min fairness when used with a fair queueing mechanism. However, a number of important implementation issues must be addressed before NBP can be feasibly deployed in the Internet. Among these issues are the following:

1. Scalable inter-domain deployment. Another approach to improving the scalability of NBP, inspired by a suggestion in [9], is to develop trust relationships between domains that deploy NBP. The inter-domain router connecting two or more mutually trusting domains may then become a simple NBP core router with no need to perform per-flow tasks or keep per-flow state. If a trust relationship cannot be established, border routers between the two domains may exchange congestion information, so that congestion collapse can be prevented not only within a domain, but throughout multiple domains.

2. Incremental deployment. It is crucial that NBP be implemented in all edge routers of an NBP-capable network. If one ingress router fails to police arriving traffic or one egress router fails to monitor departing traffic, NBP will not operate correctly and congestion collapse will be possible. Nevertheless, it is not necessary for all networks in the Internet to deploy NBP in order for it to be effective. Any network that deploys NBP will enjoy the benefits of eliminated congestion collapse within the network. Hence, it is possible to incrementally deploy NBP into the Internet on a network-by-network basis.

3. Multicast. Multicast routing makes it possible for copies of a flow's packets to leave the network through more than one egress router. When this occurs, an NBP ingress router must examine backward feedback packets returning from each of the multicast flow's egress routers. To determine whether the multicast flow is experiencing congestion, the ingress router should execute its rate control algorithm using backward feedback packets from the most congested ingress-to-egress path (i.e., the one with the lowest flow egress rate). This has the effect of limiting the ingress rate of a multicast flow according to the most congested link in the flow's multicast tree.

4. Multi-path routing. Multi-path routing makes it possible for packets from a single flow to leave the network through different egress routers. In order to support this possibility, an NBP ingress router may need to examine backward feedback packets from more than one egress router in order to determine the combined egress rate for a single flow.
4. Multi-path routing. Multi-path routing makes it possible for packets from a single flow to leave the network through different egress routers. To support this possibility, an NBP ingress router may need to examine backward feedback packets from more than one egress router in order to determine the combined egress rate of a single flow. For a flow passing through more than one egress router, the combined egress rate is the sum of the flow's egress rates reported in the backward feedback packets from each egress router; the sketch following this list also covers this case.

5. Integrated or differentiated services. NBP treats all flows identically, but integrated and differentiated services networks allow flows to receive different qualities of service. In such networks, NBP should be used to regulate best-effort flows only. Flows using network services other than best effort are likely to be policed by separate traffic control mechanisms.
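To make the multicast and multi-path cases concrete, the following sketch shows how an ingress router might combine the egress rates reported for one flow in backward feedback packets. It is only an illustration under an assumed data structure (a mapping from egress router to the flow's reported egress rate); it does not reproduce the actual NBP feedback packet format.

    # Illustrative sketch: combining per-egress rates reported for one flow.
    # The egress router names and rate values are assumptions for this example.

    def multicast_egress_rate(egress_rates):
        # Multicast: rate control against the most congested ingress-to-egress
        # path, i.e., the egress router reporting the lowest egress rate.
        return min(egress_rates.values())

    def multipath_egress_rate(egress_rates):
        # Multi-path routing: the flow's combined egress rate is the sum of the
        # rates reported by every egress router the flow leaves through.
        return sum(egress_rates.values())

    # Egress rates (Mbps) reported by three egress routers for the same flow.
    reported = {"egress_A": 2.0, "egress_B": 0.5, "egress_C": 1.5}
    print(multicast_egress_rate(reported))  # 0.5 -> limited by the most congested path
    print(multipath_egress_rate(reported))  # 4.0 -> combined egress rate of the flow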
VII. CONCLUSION

In this paper, we have presented a novel congestion avoidance mechanism for the Internet, called Network Border Patrol, together with an enhanced core-stateless fair queueing mechanism. Unlike existing Internet congestion control approaches, which rely solely on end-to-end control, NBP is able to prevent congestion collapse from undelivered packets. It does this by ensuring at the border of the network that each flow's packets do not enter the network faster than they are able to leave it. NBP requires no modifications to core routers or to end systems. Only edge routers are enhanced so that they can perform the requisite per-flow monitoring, per-flow rate control and feedback exchange operations.

Extensive simulation results provided in this paper show that NBP successfully prevents congestion collapse from undelivered packets. They also show that, while NBP cannot eliminate unfairness on its own, when combined with ECSFQ it achieves approximate global max-min fairness for competing network flows, and it does so in a completely core-stateless fashion.

REFERENCES

[1] S. Floyd and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet," IEEE/ACM Transactions on Networking, August 1999, to appear.
[2] J. Nagle, "Congestion Control in IP/TCP Internetworks," Request for Comments 896, Internet Engineering Task Force, January 1984.
[3] V. Jacobson, "Congestion Avoidance and Control," ACM Computer Communications Review, vol. 18, no. 4, pp. 314-329, August 1988 (Proceedings of the SIGCOMM '88 Symposium, Stanford, CA).
[4] "Real Broadcast Network White Paper," White paper, RealNetworks Inc., January 1999, http://www.real.com/solutions/rbn/whitepaper.html.
[5] "Real Video Technical White Paper," White paper, RealNetworks Inc., January 1999, http://www.real.com/devzone/library/whitepapers/overview.html.
[6] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Throughput: A Simple Model and its Empirical Validation," in Proc. of ACM SIGCOMM, September 1998, pp. 303-314.
[7] A. Demers, S. Keshav, and S. Shenker, "Analysis and Simulation of a Fair Queueing Algorithm," in Proc. of ACM SIGCOMM, September 1989, pp. 1-12.
[8] A. Parekh and R. Gallager, "A Generalized Processor Sharing Approach to Flow Control - the Single Node Case," IEEE/ACM Transactions on Networking, vol. 1, no. 3, pp. 344-357, June 1993.
[9] I. Stoica, S. Shenker, and H. Zhang, "Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks," in Proc. of ACM SIGCOMM, September 1998, pp. 118-130.
[10] Z. Cao, Z. Wang, and E. Zegura, "Rainbow Fair Queuing: Fair Bandwidth Sharing Without Per-Flow State," in Proc. of IEEE INFOCOM, March 2000.
[11] R. Pan, B. Prabhakar, and K. Psounis, "CHOKe - A Stateless Active Queue Management Scheme for Approximating Fair Bandwidth Allocation," in Proc. of IEEE INFOCOM, March 2000.
[12] B. Suter, T. V. Lakshman, D. Stiliadis, and A. Choudhury, "Design Considerations for Supporting TCP with Per-Flow Queueing," in Proc. of IEEE INFOCOM, March 1998, pp. 299-305.
[13] B. Braden et al., "Recommendations on Queue Management and Congestion Avoidance in the Internet," RFC 2309, IETF, April 1998.
[14] D. Lin and R. Morris, "Dynamics of Random Early Detection," in Proc. of ACM SIGCOMM, September 1997, pp. 127-137.
[15] D. Bertsekas and R. Gallager, Data Networks, second edition, Prentice Hall, 1987.
[16] R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, and R. Viswanathan, "ERICA Switch Algorithm: A Complete Description," ATM Forum Document 96-1172, Traffic Management WG, August 1996.
[17] A. Rangarajan and A. Acharya, "ERUF: Early Regulation of Unresponsive Best-Effort Traffic," in Proc. of International Conference on Network Protocols, October 1999.
[18] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, "An Architecture for Differentiated Services," Request for Comments 2475, Internet Engineering Task Force, December 1998.
[19] D. Clark and W. Fang, "Explicit Allocation of Best-Effort Packet Delivery Service," IEEE/ACM Transactions on Networking, vol. 6, no. 4, pp. 362-373, August 1998.
[20] K. K. Ramakrishnan and R. Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks with a Connectionless Network Layer," ACM Transactions on Computer Systems, vol. 8, no. 2, pp. 158-181, May 1990.
[21] L. Kalampoukas, A. Varma, and K. K. Ramakrishnan, "Explicit Window Adaptation: A Method to Enhance TCP Performance," in Proc. of IEEE INFOCOM, San Francisco, CA, April 1998.
[22] L. S. Brakmo, S. W. O'Malley, and L. L. Peterson, "TCP Vegas: New Techniques for Congestion Detection and Avoidance," in Proc. of ACM SIGCOMM, London, United Kingdom, August 1994, pp. 24-35.
[23] U. Hengartner, J. Bolliger, and T. Gross, "TCP Vegas Revisited," in Proc. of IEEE INFOCOM, March 2000.
[24] S. Floyd, "TCP and Explicit Congestion Notification," ACM Computer Communications Review, vol. 24, no. 5, pp. 8-23, October 1994.
[25] C. Albuquerque, B. J. Vickers, and T. Suda, "Network Border Patrol," in Proc. of IEEE INFOCOM, March 2000.
[26] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms," RFC 2001, IETF, January 1997.
[27] LBNL Network Research Group, UCB/LBNL/VINT Network Simulator - ns (version 2), http://www-mash.cs.berkeley.edu/ns/, September 1997.
[28] UCI Network Research Group, Network Border Patrol (NBP), http://netresearch.ics.uci.edu/nbp/, 1999.
[29] B. Vandalore, S. Fahmy, R. Jain, R. Goyal, and M. Goyal, "A Definition of Generalized Fairness and its Support in Switch Algorithms," ATM Forum Document 98-0151, Traffic Management WG, February 1998.
[30] W. K. Tsai and Y. Kim, "Re-Examining Maxmin Protocols: A Fundamental Study on Convergence, Complexity, Variations, and Performance," in Proc. of IEEE INFOCOM, April 1999, pp. 811-818.