8.analysis and simulation of a fair queueing algorithm Document Transcript
Analysis and Simulation of a Fair Queueing Algorithm Alan Demers Srinivasan Keshavt Scott Shenker Xerox PARC 3333 Coyote Hill Road Palo Alto, CA 94304Abstract often ignored, makes queueing algorithms a crucial component in effective congestion control. We discuss gateway queueing algorithms and theirrole in controlling congestion in datagram networks. Queueing algorithms can be thought of as allocat-A fair queueing aJgorithm, based on an earlier ing three nearly independent quantities: bandwidthsuggestion by Nagle, is proposed. An.alysis and (which packets get transmitted), promptness (whensimulations are used to compare this algorithm to do those packets get transmitted), and buffer spaceother congestion control schemes. We find. that fair (which packets are discarded by the gateway).queueing provides several important advantages over Currently, the most common queueing algorithm isthe usual first-come-first-serve queueing algorithm: first-come-first-serve (FCFS). FCFS queueing essen-fair allocation of bandwidth, lower delay for sources tially relegates all congestion control to the sources,using less than their full share of bandwidth, and since the order of arrival completely determines theprotection from ill-behaved sources. bandwidth, promptness, and buffer space alloca- tions. Thus, FCFS inextricably intertwines these three allocation issues. There may indeed be flow1. Introduction control algorithms that, when universally imple- mented throughout a network with FCFS gateways, Datagram networks have long suffered from perfor- can overcome these limitations and provide reason- mance degradation in the presence of congestion[GerBO]. The rapid growth, in both use and size, of ably fair and efficient congestion control. This point is discussed more fully in Sections 3 and 4,computer networks has sparked a renewed interest where several flow control algorithms are com-in methods of congestion control [DEC87abcd, pared. However, with today’s diverse and decen-Jac88a, Man89, Nag871. These methods have twopoints of implementation. The first is at the source, tralized computing environments, it is unrealistic to expect universal implementation of any givenwhere flow control algorithms vary the rate at flow control algorithm, This is not merely a ques-which the source sends packets, Of course, flow tion of standards, but also one of compliance. Evencontrol algorithms are designed primarily to ensure if a universal standard such as IS0 [IS0861 werethe presence of free buffers at the destination host, adopted, malfunctioning hardware and softwarebut we are more concerned with their role in limit- could violate the standard, and there is always theing the overall network traffic. The second point of possibility that individuals would alter the algo-implementation is at the gateway. Congestion can rithms on their own machine to improve their per-be controlled at gateways through routing and formance at the expense of others. Consequently,queueing algorithms. Adaptive routing, if properly congestion control algorithms should function wellimplemented, lessens congestion by routing packets even in the presence of ill-behaved sources. Unfor-away from network bottlenecks. Queueing algo- tunately, no matter what flow control algorithm isrithms, which control the order in which packetsare sent and the usage of the gateway’s buffer used by the well-behaved sources, networks with FCFS gateways do not have this property. A singlespace, do not affect congestion directly, in that they source, sending packets to a gateway at ado not change the total traffic on the gateway’s out- sufficiently high speed, can capture an arbitrarilygoing line. Queueing algorithms do, however, deter- high fraction of the bandwidth of the outgoing line.mine the way in which packets from different Thus, FCFS queueing is not adequate; moresources interact with each other which, in turn, discriminating queueing algorithms must be usedaffects the collective behavior of flow control algo- in conjunction with source flow control algorithmsrithms. We shall argue that this effect, which is----- --- to control congestion effectively in noncooperative+ Current Address: University of California at Berkeley environments.Permission to copy without fee all or part of this material is granted provided Following a similar line of reasoning, Naglethat the copies are not made or distributed for direct commercial advantage, [Nag87, Nag851 proposed a fair queueirtg (FQ! algo-the ACM copyright notice and the title of the publication and its date appear, rithm in which gateways maintain separate queuesand notice is given that copying is by permission of the Association forComputing Machinery. To copy otherwise, or to republish, requires a fee for packets from each individual source. The queuesand/or specific permission. are serviced in a round-robin manner. This0 1989 ACM 089791-33279/89/COO9/0001 $1.50 prevents a source from arbitrarily increasing its
share of the bandwidth or the delay of other combines round robin service and FIFO priority sources. In fact, when a source sends packets too service, and has been analyzed extensively [Lo87, quickly, it merely increases the length of its own Fra841. Also, Luan and Lucantoni present a queue. Nagle’s algorithm, by changing the way different form of bandwidth management policy for packets from different sources interact, does not circuit switched networks [Lua88]. reward, nor leave others vulnerable to, anti-social behavior. On the surface, this proposal appears to Since the completion of this work, we have learned have considerable merit, but we are not aware of of a similar Virtual Clock algorithm for gateway any published data on the performance of datagram resource allocation proposed by Zhang [Zha891. networks with such fair queueing gateways. In this Furthermore, Heybey and Davin [Hey891 have paper, we will first describe a modification of simulated a simplified version of our fair queueing Nagle’s algorithm, and then provide simulation algorithm. data comparing networks with FQ gateways and those with FCFS gateways. 2. Fair Queueing The three different components of congestion con- trol algorithms introduced above, source flow con- 2.1. Motivation What are the requirements for a trol, gateway routing, and gateway queueing algo- queueing algorithm that will allow source flow con-rithms, interact in interesting and complicated trol algorithms to provide adequate congestion con-ways. It is impossible to assess the effectiveness of trol even in the presence of ill-behaved sources?any algorithm without reference to the other com- We start with Nagle’s observation that such queue-ponents of congestion control in operation. We will ing algorithms must provide protection, so that ill-evaluate our proposed queueing algorithm in the behaved sources can only have a limited negativecontext of static routing and several widely used impact on well behaved sources. Allocatingflow control algorithms. The aim is to find a queue- bandwidth and buffer space in a fair manner, to being algorithm that functions well in current com- defined later, automatically ensures that ill-puting environments. The algorithm might, indeed behaved sources can get no more than their fairit should, enable new and improved routing and share. This led us to adopt, as our central designflow control algorithms, but it must not require consideration, the requirement that the queueingthem. algorithm allocate bandwidth and buffer space We had three goals in writing this paper. The first fairly. Ability to control the promptness, or delay, allocation somewhat independently of the was to describe a new fair queueing algorithm. In Section 2.1, we discuss the design requirements for bandwidth and buffer allocation is also desirable.an effective queueing algorithm and outline how Finally, we require that the gateway should pro-Nagle’s original proposal fails to meet them. In Sec- vide service that, at least on average, does nottion 2.2, we propose a new fair queueing algorithm depend discontinuously on a packet’s time of arrivalwhich meets these design requirements. The (this continuity condition will become clearer insecond goal was to provide some rigorous under- Section .2.2). This requirement attempts to preventstanding of the performance of this algorithm; this the efficiency of source implementations from beingis done in Section 2.3, where we present a delay- overly sensitive to timing details (timers are thethroughput curve given by our fair queueing algo- Bermuda Triangle of flow control algorithms).rithm for a specific configuration of sources. The Nagle’s proposal does not satisfy these require-third goal was to evaluate this new queueing propo- ments. The most obvious flaw is its lack of con-sal in the context of real networks. To this end, we sideration of packet lengths. A source using longdiscuss flow control algorithms in Section 3, and packets gets more bandwidth than one using shortthen, in Section 4, we present simulation data com- packets, so bandwidth is not allocated fairly. Also.paring several combinations of flow control and the proposal has no explicit promptness allocationqueueing algorithms on six benchmark networks. other than that provided by the round-robin serviceSection 5 contains an overview of our results, a dis- discipline. In addition, the static round robin ord-cussion of other proposed queueing algorithms, and ering violates the continuity requirement. In thean analysis of some criticisms of fair queueing. following section we attempt to correct these defects.In circuit switched networks where there is explicitbuffer reservation and uniform packet sizes, it has In stating our requirements for queueing algo-been established that round robin service discip- rithms, we have left the term fair undefined. Thelines allocate bandwidth fairly [Hah86, Kat871. term fair has a clear colloquial meaning, but it alsoRecently Morgan [Mor891 has examined the role has a technical definition (actually several, but onlysuch queueing algorithms play in controlling one is considered here). Consider, for example, thecongestion in circuit switched networks; while his allocation of a single resource among N users.application context is quite different from ours, his Assume there is an amount pt,tal of this resourceconclusions are qualitatively similar. In other and that each of the users requests an amount p,related work, the DATAKIT’” queueing algorithm and, under a particular allocation, receives an 2
amount p,. What is a fair allocation? The max- allocation of packets-sent but fails to guarantee amin fairness criterion [Hah86, Gaf64, DECS7d] fair allocation of bandwidth because of variations instates that an allocation is fair if (1) no user packet sizes. To see how this unfairness can bereceives more than its request, (2) no other alloca- avoided, we first consider a hypothetical service dis-tion scheme satisfying condition 1 has a higher cipline where transmission occurs in a bit-by-bitminimum allocation, and (3) condition 2 remains round robin (BR) fashion (as in a head-of-queuerecursively true as we remove the minimal user processor sharing discipline). This service discip-and reduce the total resource accordingly, line allocates bandwidth fairly since at every~~~~~~~~~~~~~- pm;,. This condition reduces to instant in time each conversation is receiving its~1~=MIN(pfat,,pJ in the simple example,Nwith pfaairr fair share. Let R(t) denote the number of roundsthe fu;r shure, being set so that plotal = 2 p,. This made in the round-robin service discipline up to 1=1 time t (R(t) is a continuous function, with the frac-concept of fairness easily generalizes to the multi- tional part indicating partially completed rounds>.ple resource case [DEC87d]. Note that implicit in Let N,,(t) denote the number of active conversa-the max-min definition of fairness is the assump- tions, i.e. those tha. have bits in their queue attion that the users have equal rights to the time t. Then, St =--IL -, dK where p is theresource. N”,(t) linespeed of the gateway’s-outgoing line (we will, In our communication application, the bandwidth for convenience, work in units such that II= 1). A and buffer demands are clearly represented by the packet of size P whose first bit gets serviced at time packets that arrive at the gateway. (Demands for to will have its last bit serviced P rounds later, at promptness are not explicitly communicated, and time t such that R(t)=R(t,)+P. Let tta be the we will return to this issue later.) However, it is time that packet i belonging to conversation anot clear what constitutes a user. The user associ- arrives at the gateway, and define the numbers S;“ ated with a packet could refer to the source of the and F,’ as the values of R(t) when the packet packet, the destination, the source-destination pair, started and finished service. With P,” denoting theor even refer to an individual process running on a size of the packet, the following relations hold:source host. Each of these definitions has limita- F, a=StQfPLa and S,“=MAX(F,-,Q, R(t,“)). Sincetions. Allocation per source unnaturally restricts R(t) is a strictly monotonically increasing functionsources such as file servers which typically consume whenever there are bits at the gateway, the order-considerable bandwidth. Ideally the gateways ing of the FLU values is the same as the ordering ofcould know that some sources deserve more the finishing times of the various packets in the BRbandwidth than others, but there is no adequate discipline.mechanism for establishing that knowledge intoday’s networks. Allocation per receiver allows a Sending packets in a bit-by-bit round robin fashion,receiver’s useful incoming bandwidth to be reduced while satisfying our requirements for an adequateby a broken or malicious source sending unwanted queueing algorithm, is obviously unrealistic. Wepackets to it. Allocation per process on a host hope to emulate this impractical algorithm by aencourages human users to start several processes practical packet-by-packet transmission scheme.communicating simultaneously, thereby avoiding Note that the functions R(t) and N,,(,t) and thethe original intent of fair allocation. Allocation per quantities S,” and F,a depend only on the packetsource-destination pair allows a malicious source to arrival times tia and not on the actual packetconsume an unlimited amount of bandwidth by transmission times, as long as we define a conversa-sending many packets all to different destinations. tion to be active whenever R(t) SF,= forWhile this does not allow the malicious source to do i =MAX(j jt;“‘t). We are thus free to use theseuseful work, it can prevent other sources from quantities in defining our packet-by-packetobtaining sufficient bandwidth. transmission algorithm. A natural way to emulate the bit-by-bit round-robin algorithm is to let theOverall, allocation on the basis of source- quantities Fla define the sending order of the pack-destination pairs, or conversations, seems the best ets. Our packet-by-packet transmission algorithmtradeoff between security and efficiency and will be is simply defined by the rule that, whenever aused here. However, our treatment will apply to packet finishes transmission, the next packet sentany of these interpretations of user. With our is the one with the smallest value of F,‘. In arequirements for an adequate queueing algorithm, preemptive version of this algorithm, newly arriv-coupled with our definitions of fairness and user, we ing packets whose finishing number Fja is smallernow turn to the description of our algorithm. than that of the packet currently in transmission preempt the transmitting packet. For practical rea-2.2. Definition of algorithm It is simple to allo- sons, we have implemented the nonpreemptive ver-cate buffer space fairly by dropping packets, when sion, but the preemptive algorithm (with resump-necessary, from the conversation with the largest tive service) is more tractable analytically. Clearlyqueue. Allocating bandwidth fairly is less straight- the preemptive and nonpreemptive packetized algo-forward. Pure round-robin service provides a fair rithms do not give the same instantaneous 3
bandwidth allocation as the BR version. However, their own poor flow control, they could not use.for each conversation the total bits sent at a giventime by these three algorithms are always withinP mex of each other, where P,,, is the maximum 2.3. Properties of Fair Queueing The desiredpacket size (this emulation discrepancy bound was bandwidth and buffer allocations are completelyproved by Greenberg and Madras [Gree89]). Thus, specified by the definition of fairness, and we haveover sufficiently long conversations, the packetized demonstrated that our algorithm achieves thosealgorithms asymptotically approach the fair goals. However, we have not been able to charac-bandwidth allocation of the BR scheme. terize the promptness allocation for an arbitrary arrival stream of packets. To obtain some quantita- Recall that the user’s request for promptness is not tive results on the promptness, or delay, perfor- made explicit. (The IP [Pos811 protocol does have a mance of a single FQ gateway, we consider a very field for type-of-service, but not enough applications restricted class of arrival streams in which there make intelligent use of this option to render it a are only two types of sources. There are FTP-like useful hint.) Consequently, promptness allocation file transfer sources, which always have ready pack- must be based solely on data already available at ets and transmit them whenever permitted by thethe gateway. One such allocation strategy is to give source flow control (which, for simplicity, is takenmore promptness (less delay) to users who utilize to be sliding window flow control), and there areless than their fair share of bandwidth. Separating Telnet-like interactive sources, which produce pack-the promptness allocation from the bandwidth allo- ets intermittently according to some unspecifiedcation can be accomplished by introducing a nonne- generation process. What are the quantities ofgative parameter 6, and defining a new quantity, interest? An FTP source is typically transferring athe bid Bia, via B~a=P;a+MAX(F,-,a, R(ti”)-6). large file, so the quantity of interest is the transferThe quantities R(t), N,,(t), FL’, and SLa remain as time of the file, which for asymptotically large filesbefore, but now the sending order is determined by depends only on the bandwidth allocation. Giventhe B’s, not the F’s. The asymptotic bandwidth the configuration of sources this bandwidth alloca-allocation is independent of 6, since the F’s control tion can be computed a priori by using the fairnessthe bandwidth allocation, but the algorithm gives property of FQ gateways. The interesting quantityslightly faster service to packets that arrive at an for Telnet sources is the average delay of eachinactive conversation, The parameter 6 controls packet, and it is for this quantity that we now pro-the extent of this additional promptness. Note that vide a rather limited result.the bid Bia is continuous in tLa, so that the con-tinuity requirement mentioned in Section 2.1 is Consider a single FQ gateway with N FTP sourcesmet. sending packets of size PF, and allow a single packet of size PT from a Telnet source to arrive at The role of this term 6 can be seen more clearly by the gateway at time t. It will be assigned a bid considering the two extreme cases 6 =O and S =a. number B =R(t) +P,- 6; thus, the dependence of If an arriving packet has R(t,a)=F, -iu, then the the queueing delay on the quantities Pr and 6 isconversation a is active (i.e. the corresponding only through the combination PT - 8. We willconversation in the BR algorithm would have bits denote the queueing delay of this packet by q(t), in the queue). In this case, the value of S is which is a periodic function with period NP,T. Weirrelevant and the bid number depends only on the are interested in the average queueing delay Afinishing number of the previous packet. However, .-VPFif R(tLa)>F,-ia, so that the a conversation is inac- BE1 J cpctkittive, the two cases are quite different. With S=O, NPF nthe bid number is given by Bia=P,a+R(t,a) and iscompletely independent of the previous history of The finishing numbers F,’ for the N FTP’s can beuser a. With S=w, the bid number is expressed, after perhaps renumbering the packets,B,“=P,“+F,-,a and depends only the previous by FLa=(ifla)PF where the l’s obey 0~1~<1. Thepacket’s finishing number, no matter how many queueing delay of the Telnet packet depends on therounds ago. For intermediate values of S, schedul- configuration of Z’s whenever P,<P,. One caning decisions for packets arriving at inactive show that the delay is bounded by the extremalconversations depends on the previous packet’s cases of la=0 for all a and la= a/N forfinishing round as long as it wasn’t too long ago, a=O,l,..., N-l. The delay values for theseand 6 controls how far back this dependence goes. extremal cases are straightforward to calculate; for the sake of brevity we omit the derivation andRecall that when the queue is full and a new merely display the result below. The averagepacket arrives, the last packet from the source queueing delay is given by A=A(Pr-6) where thecurrently using the most buffer space is dropped. function A(P), the delay with 6 =O, is defined belowWe have chosen to leave the quantities FLU and Sia (with integer k and small constant E, OIc< 1,unchanged when we drop a packet. This provides a defined via PT=PF(k+~)/N).small penalty for ill-behaved hosts, in that theywill be charged for throughput that, because of 4
Preemptive described by Z”=a/N for a =O,l,...,N-1. Define p to be the average bandwidth of the stream, meas- p> A~FY=N(P-~, for PZP, ured relative to the fair share of the Telnet: p = XP(iV+l). Then, for the nonpreemptive algo- rithm, I 1-exp --%fN($,$(1- I - (N+l) +,, III Nonpreemptive This is the throughput/delay curve the FQ gateway offers the Poisson Telnet source (the formulae for PF A~PI=N~P-~) for P?P,, different FTP synchronizations are substantially more complicated, but have the same qualitativeN,P+AIPE,--~--) PF l++[kz+k(24 for PFaP+l+-$ behavior). This can be contrasted with that offered 1 t by the FCFS gateway, although the FCFS results Pb PF PF depend in detail on the flow control used by the T~AU’)c~T) l++‘+ktZa-ill for y,l++P FTP sources and on the surrounding network I t environment. Assume that all other communica- tions speeds are ‘infinitely fast in relation to theNow consider a general Telnet packet generation outgoing linespeed of the gateway, and that theprocess (ignoring the effects of flow control) and FTP’s all have window size W, so there are alwayscharacterize this generation process by the function NW FTP packets in the queue or in transmission.Ds(Pr) which denotes the queueing delay of the Figure 1 shows the throughput/delay curves for anTelnet source when it is the sole source at the gate- FCFS gateway, along with those for a FQ gatewayway. In the BR algorithm, the queueing delay of with 6 =0 and 6 =P. For p-+0, FCFS gives a largethe Telnet source in the presence of N FTP sources queueing delay of (NW - #)P, whereas FQ gives ais merely D,((N + 12’~). For the packetized preemp- queueing delay of NP/Z for 6 =0 and P/2 for S =P.tive algorithm with S =O, we can express the This ability to provide a lower delay to lowerqueueing delay in the presence of N FTP sources, throughput sources, completely independent of thecall it D#r), in terms of Ds via the relation window sizes of the FTP’s, is one of the most impor-(averaging over all relative synchronizations tant features of fair queueing. Note also that thebetween the FTP’s and the Telnet): FQ queueing delay diverges as p+l, reflecting FQ’s insistence that no conversation gets more than its D,vPr)=Do((N+ lU’r)+A(Pr) fair share. In contrast, the FCFS curve remainswhere the term A(Pr) reflects the extra delay finite for all p < (N + 1). showing that an ill-behavedincurred when emulating the BR algorithm by the source can consume an arbitrarily large fraction ofpreemptive packetized algorithm. the bandwidth.For nonzero values 6, the generation process must What happens in a network of FQ gateways?be further characterized by the quantity ZU(P~,t) There are few results here, but Hahne [Hah861 haswhich, in a system where the Telnet is the sole shown that for strict round robin service gatewayssource, is the probability that a packet arrives to a and only FTP sources there is fair allocation ofqueue which has been idle for time t. The delay is bandwidth (in the multiple resource sense) whengiven by, the window sizes are sufficiently large. She also provides examples where insufficient window sizes (but much larger than the communication path) result in unfair allocations. We believe, but have been unable to prove, that both of these propertieswhere the last term represents the reduction in hold for our fair queueing scheme.delay due the the nonzero 6. These expressions forDN, which were derived for the preemptive case, 3. Flow Control Algorithmsare also valid for the nonpreemptive algorithmwhen PT 2 PF, Flow control algorithms are both the benchmarks against which the congestion control properties ofWhat do these forbidding formulae mean? Con- fair queueing are measured, and also the environ-sider, for concreteness, a Poisson arrival process ment in which FQ gateways will operate. Wewith arrival rate h, packet sizes PT = PF =P, a already know that, when combined with FCFSlinespeed p= 1, and an FTP synchronization gateways, these flow control algorithms all suffer 5
nowledgement). The congestion avoidance part of the algorithm is sliding window flow control, with some set window size. This algorithm has a very narrow range of validity, in that it avoids conges- tion if the window sizes are small enough, and pro- vides efficient service if the windows are large enough, but cannot respond adequately if either of these conditions is violated. The second generation of flow control algorithms, exemplified by Jacobson and Karels’ (JK) modified TCP [Jac88a] and the original DECbit proposal [DEC87a-c3, are descendants of the above generic algorithm with the added feature that the window size is allowed to respond dynamically in response to network congestion (JK also has, among other changes, substantial modifications to the timeout calculation [Jac88a,b, Kar871). The algorithms use different signals for congestion; JK uses timeouts .6 .8 1 1.2 1.4 1.6 whereas DECbit uses a header bit which is set by the gateway on all packets whenever the average Throughput (p) queue length is greater than one. These mechan- isms allocate window sizes fairly, but the relation Throughput = WindowlRoundTrip implies that Figure 1: Delay vs. Throughput conversations with different paths receive different This graph describes the queueing delay bandwidths. of a single Telnet source with Poisson generation process of strength h, sending The third generation of flow control algorithms are packets through a gateway with three similar to the second, except that now the conges- FTP conversations. The packet sizes are tion signals are sent selectively. For instance, the PF =PT =P, the throughput is measured selective DECbit proposal [DEC87d] has the gate- relative to the Telnet’s fair share, way measure the flows of the various conversations pz4XP/p where p is the linespeed. The and only send congestion signals to those users who delay is measured in units of P/p. The are using more than their fair share of bandwidth. FQ algorithm is nonpreemptive, and the This corrects the previous unfairness for sources FCFS case always has 15 FTP packets in using different paths (see [DEC87dl and section 4), the queue. and appears to offer reasonably fair and efficient congestion control in many networks. The DECfrom the fundamental problem of vulnerability to algorithm controls the delay by attempting to keepill-behaving sources. Also, there is no mechanism the average queue size close to one. However, itfor separating the promptness allocation from the does not allow individual users to make differentbandwidth and buffer allocation. The remaining delay/throughput tradeoffs; the collective tradeoff isquestion is then how fairly do these flow control set by the gateway.algorithms allocate bandwidth. Before proceeding,note that there are really two distinct problems incontrolling congestion. Congestion recouery allows 4. Simulationsa system to recover from a badly congested state, In this section we compare the various congestionwhereas congestion avoidance attempts to prevent control mechanisms, and try to illustrate the inter-the congestion from occurring. In this paper, we are play between the queueing and flow control algo-focusing on congestion avoidance and will not dis- rithms. We simulated these algorithms at thecuss congestion recooery mechanisms at length. packet level using a network simulator built on theA generic version of source flow control, as imple- Nest network simulation tool [Nes88]. In order tomented in XNS’s SPP [XerSll or in TCP [USCSl], compare the FQ and FCFS gateway algorithms in ahas two parts. There is a timeout mechanism, variety of settings, we selected several differentwhich provides for congestion recovery, whereby flow control algorithms; the generic one describedpackets that have not been acknowledged before above, JK flow control, and the selective DECbitthe timeout period are retransmitted (and a new algorithm. To enable DECbit flow control totimeout period set). The timeout periods are given operate with FQ gateways, we developed a bit-by Prtt where typically j3 -2 and rtt is the setting FQ algorithm in which the congestion bitsexponentially averaged estimate of the round trip are set whenever the source’s queue length istime (the rtt estimate for retransmitted packets is greater than + of its fair share of buffer space (notethe time from their first transmission to their ack- that this is a much simpler bit-setting algorithm 6
than the DEC scheme, which involves complicatedaverages; however, the choice of Q is completely adhoc). The Jacobson/Karels flow control algorithm isdefined by the 4.3bsd TCP implementation. Thiscode deals with many issues unrelated to conges-tion control. Rather than using that code directly in Buffer Size: 40our simulations, we have chosen to model the JKalgorithm by adding many of the congestion control c- 56kbpsideas found in that code, such as adjustable win-dows, better timeout calculations, and fastretransmit to our generic flow control algorithm.The various cases of test algorithms are labeled in e%‘P I I 0 S 1 Telnet 1table 1. I-. ! I I Quantity I Policy 1 2 3 4 Label 1 Flow Control 1 Queueing Algorithm G/FCFS 1 Generic FCFS G/FQ Generic FQ JK/FCFS 1 JK FCFS R JKlFQ / JK FQ DEUDEC 1 DECbit 1 Selective DECbit 11 DECiFQbit / DECbit FQ with bit setting Table 1: Algorithm Combinations Retrans- mittedRather than test this set of algorithms on a single - Packets EUDECrepresentc~tive network and load, we chose to define -a set of benchmark scenarios, each of which, whilesomewhat unrealistic in itself, serves to illuminate Droppeda different facet of congestion control. The load on Packetsthe network consists of a set of Telnet and FTPconversations. The Telnet sources generate 40 bytepackets by a Poisson process with a mean idter-packet interval of 5 seconds. The FTP’s have an Scenario 1: Underloaded Gatewayinfinite supply of 1000 byte packets that are sent asfast as flow control allows. Both FTP’s and Telnet’s loaded case, all of the algorithms provide fairhave their maximum window size set to 5, and the bandwidth allocation, but the cases with FQ pro-acknowledgement (ACK) packets sent back from vide much lower Telnet delay than those withthe receiving sink are 40 bytes. (The small size of FCFS. The selective DECbit gives an intermediateTelnet packets relative to the FTP packets makes value for the Telnet delay, since the flow control isthe effect of S insignificant, so the FQ algorithm designed to keep the average queue length small.was implemented with S =O). The gateways havefinite buffers which, for convenience, are measured Scenario 2 involves 6 FTP sources and 2 Telnetin packets rather than bytes. The system was sources again sending through a single gateway.allowed to stabilize for the first 1500 secotids, and The gateway, with a buffer size of only 15, is sub-then data was collected over the next 500 second stantially overloaded. This scenario probes theinterval. For each scenario, there is a figure depict- behavior of the algorithms in the presence of severeing the corresponding network layout, and a table congestion.containing the data. There are four performance When FCFS gateways are paired with generic flowmeasures for each source: total throughput (number control, the sources segregate into winners, whoof packets reaching destination), average round trip consume a large amount of bandwidth, and losers,time of the packets, the number of packet who consume very little. This phenomenonretransmissions, and number of dropped packets. develops because the queue is almost always full.We do not include confidence intervals for the data, The ACK packets received by the winners serve asbut repetitions of the simulations have consistently a signal that a buffer space has just been freed, soproduced results that lead to the same qualitative their packets are rarely dropped. The losers areconclusions. usually retransmitting, at essentially randomWe first considered several single-gateway net- times, and thus have most. of their packets dropped.works. The first scenario has two FTP sources and This analysis is due to Jacobson [JacMbl, and thetwo Telnet sources sending to a sink through a sin- segregation effect was first, pointed out to us in thisgle bottleneck gateway. Note that, in this under- context by Sturgis [Stu88]. The combination of JK 7
Buffer Size: 15 Buffer Size: 20 I GW I T - 56kbps 0 S Quantity I Policy Quantity Policy yg&+llj ‘hroughput (packets) rhroughpui (packets) Average Average Roundtrip Time Roundtrip Time Retrans- mitted IEC imloloIoIoroIo-IzrYrl Packets Dropped Packets Scenario 2: Overloaded Gateway Scenario 3: Ill-Behaved Sourceflow control with FCFS gateways produces fair Thebandwidth allocation among the FTP sources, but are no dropped packets or retransmissions.the Telnet sources are almost completely shut out. addition of FQ to the DECbit algorithm retains theThis is because the JK algorithm ensures that the fair bandwidth allocation and, in addition, lowersgateway’s buffer is usually full, causing most of the the Telnet delay by a factor of 9. Thus, for each ofTelnet packets to be dropped. the three flow control algorithms, replacing FCFS gateways with FQ gateways generally improved theWhen generic flow control is combined with FQ, the FTP performance and dramatically improved thestrict segregation disappears. However, the Telnet performance of this extremely overloadedbandwidth allocation is still rather uneven, and the network.useful bandwidth (rate of nonduplicate packets! is12% below optimal. Both of these facts are due to In scenario 3 there is a single FTP and a singlethe inflexibility of the generic flow control, which is Telnet competing with an ill-behaved source. Thisunable to reduce its load enough to prevent dropped ill-behaved source has no flow control and is send-packets. This not only necessitates retransmissions ing packets at twice the rate of the gateway’s out-but also, because of the crudeness of the timeout going line. With FCFS, the FTP and Telnet arecongestion recovery mechanism, prevents FTP’s essentially shut out by the ill-behaved source.from using their fair share of bandwidth. In con- With FQ, they obtain their fair share of bandwidth.trast, JK flow control combined with FQ produced Moreover, the ill-behaved host gets much less thanreasonably fair and efficient allocation of the its fair share, since when it has its packets droppedbandwidth. The lesson here is that fair queueing it is still charged for that throughput. Thus, FQgateways by themselves do not provide adequate gateways are effective firewalls that can protectcongestion control; they must be combined with users, and the rest of the network, from being dam-intelligent flow control algorithms at the sources. aged by ill-behaved sources.The selective DECbit algorithm manages to keep We have argued for the importance of considering athe bandwidth allocation perfectly fair, and there heterogeneous set of flow control mechanisms.
F 4 Q Buffer Size: 20 56kbps Bufh Size: 15 I I ETP Quantity 1 Policy I 1 I 2 I 3 I 4 GIFCFS o o o CI Retrans- ClFQ o o o o _ JK/FCFS o o o o mitted JK/FQ o o o o Packets DEUDEC o o o o DEClFQbit o o o o Scenario 4: Mixed ProtocolsScenario 4 has single gateway with two pairs ofFTP sources, employing generic and JK flow controlrespectively. With a FCFS gateway, the generic Scenario 5: Multihop Pathflow controlled pair has higher throughput than theJK pair. However, with a FQ gateway, the situa- nations of flow control and queueing algorithmstion is reversed (and the generic sources have function smoothly, With FCFS, sources 4 and 8 aresegregated). Note that the FQ gateway has pro- not limited by the available bandwidth, but by thevided incentive for sources to implement JK or delay their ACK packets incur waiting behind FTPsome other intelligent flow control, whereas the packets. The total throughput increases when theFCFS gateway makes such a move sacrificial. FQ gateways are used because the small ACK packets are given priority,Certainly not all of the relevant behavior of thesealgorithms can be gleaned from single gateway net- For the sake of clarity and brevity, we haveworks. Scenario 5 has a multinode network with presented a fairly clean and uncomplicated view of network dynamics. We want to emphasize thatfour FTP sources using different network paths.Three of the sources have short nonoverlapping there are many other scenarios, not presented here,conversations and the fourth source has a long path where the simulation results are confusing andthat intersects each of the short paths. When FCFS apparently involve complicated dynamic effects.gateways are used with generic or JK flow control, These results do not call into question the efficacythe conversation with the long path receives less and desirability of fair queueing, but they do chal-than 60% of its fair share. With FQ gateways, it lenge our understanding of the collective behaviorreceives its full fair share. Furthermore, the selec- of flow control algorithms in networks.tive DECbit algorithm, in keeping the averagequeue size small, wastes roughly 10% of the 5. Discussionbandwidth (and the conversation with the longpath, which should be helped by any attempt at In an FCFS gateway, the queueing delay of packetsfairness, ends up with less bandwidth than in the is, on average, uniform across all sources andgeneric/FCFS case). directly proportional to the total queue size. Thus. achieving ambitious performance goals, such as lowScenario 6 involves a more complicated network, delay for Telnet-like sources, or even mundanecombining lines of several different bandwidths. ones, such as avoiding dropped packets, requiresNone of the gateways are overloaded so all combi- coordination among all sources to control the queue 9
creates an environment where users are rewarded for devising more sophisticated and responsive algo- rithms. The game-theoretic issue first raised by Nagle, that one must change the rules of the gateway’s game so that good source behavior is encouraged, is crucial in the design of gateway algorithms. A formal game-theoretic analysis of a simple gateway model (an exponential server with N Poisson sources) suggests that fair queueing algorithms make self-optimizing source behavior result in fair, protective, nonmanipulable, and stable networks; in fact, they may be the only rea- sonable queueing algorithms to do so [SheBgal. Our calculations show that the fair queueing algo- Buffer Sire: 20 rithm is able to deliver low delay to sources using less than their fair share of bandwidth, and that 0f+l this delay is insensitive to the window sizes being used by the FTP sources. Furthermore, simulations I FTP indicate that, when combined with currently avail- Quantity Policy 1 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 able flow control algorithms, FQ delivers satisfac- tory congestion control in a wide variety of network Throughput scenarios. The combination of FQ gateways and (packets) DECbit flow control was particularly effective. However, these limited tests are in no way con- clusive. We hope, in the future, to investigate the performance of FQ under more realistic load condi- Average tions, on larger networks, and interacting with Roundtrip routing algorithms. Also, we hope to explore new Time source flow control algorithms that are more I I I t I I attuned to the properties of FQ gateways. Retrans- In this paper we have compared our fair queueing mitted algorithm with only the standard first-come-first- Packets serve queueing algorithm. We know of three other widely known queueing algorithm proposals. The I ” I I I Y I ” 1 n n ” I .A I ” I Y I Y I n n n n Dropped first two were not intended as a general purpose Packets ‘C%S o 0 o 0 o 0 o 0 II 0 0 o 1 0 o o 0 congestion control algorithms. Prue and Postel JKIFQ o o o 0 0 010 o [Pru871 have proposed a type-of-service priority 3c 0 0 0 0 0 0 [ 0 0 o I o I o I o I o lo1 o 1~0 queueing algorithm, but allocation is not made on a user-by-user basis, so fairness issues are not addressed. There is also the Fuzzball selective Scenario 6: Complicated Network preemption algorithm [Mi1187,88] whereby the gate- ways allocate buffers fairly (on a source basis, oversize. Having to rely on source flow control algo- all of the gateway’s outgoing buffers). This is veryrithms to solve this control problem, which is similar to our buffer allocation policy, and so can beextremely difficult in a maximally cooperative considered a subset of our FQ algorithm. The Fuzz-environment and impossible in a noncooperative balls also had a form of type-of-service priorityone, merely reflects the inability of FCFS gateways queueing but, as with the Prue and Postel algo-to distinguish between users and to allocate rithm, allocations were not made on a user-by-userbandwidth, promptness, and buffer space indepen- basis. The third policy is the Random-Droppingdently. (RD) buffer management policy in which, when theIn the design of the fair queueing algorithm, we buffer is overloaded, the packet to be dropped ishave attempted to address these issues. The algo- chosen at random CPer89, JacBBabl. This algorithmrithm does allocate the three quantities separately. greatly alleviates the problem of segregation. How-Moreover, the promptness allocation is not uniform ever, it is now generally agreed that the RD algo-across users and is somewhat tunable through the rithm does not provide fair bandwidth allocation, isparameter 6. Most importantly, fair queueing vulnerable to ill-behaved sources, and is unable tocreates a firewall that protects well-behaved provide reduced delay to conversations using lesssources from their uncouth brethren. Not only does than their fair share of bandwidth [She89b, Zha89.this allow the current generation of flow control Has891.algorithms to function more effectively, but it 10
There are two objections that have been raised in 7. Referencesconjunction with fair queueing. The first is thatsome source-destination pairs, such as file server or [DEC87a]R. Jain and K. K. Ramakrishnan,mail server pairs, need more than their fair share “Congestion Avoidance in Computer Net-of bandwidth. There are several responses to this. works with a Connectionless NetworkFirst, FQ is no worse than the status quo. FCFS Layer, Part I-Concepts, Goals, and Alter-gateways already limit well-behaved hosts, using natives”, DEC Technical Report TR-507,the same path and having only one stream per Digital Equipment Corporation, Aprilsource destination pair, to their fair share of 1987.bandwidth. Some current bandwidth hogs achieve [DEC87b]K. K. Ramakrishnan and R. Jain,their desired level of service by opening up many “Congestion Avoidance in Computer Net-streams, since FCFS gateways implicitly define works with a Connectionless Networkstreams as the unit of user. Note that that there Layer, Part II-An Explicit Binary Feed-are no controls over this mechanism of gaining back Scheme”, DEC Technical Reportmore bandwidth, leaving the network vulnerable to TR-508, Digital Equipment Corporation,abuse. If desired, however, this same trick can be April 1987.introduced into fair queueing by merely changingthe notion of user. This would violate layering, [DEC87c]D.-M. Chiu and R. Jain, “Congestionwhich is admittedly a serious drawback. A better Avoidance in Computer Networks with aapproach is to confront the issue of allocation Connectionless Network Layer, Part III-directly by generalizing the algorithm to allow for Analysis of Increase and Decrease Algo-arbitrary bandwidth priorities. Assign each pair a rithms”, DEC Technical Report TR-509,number n, which represents how many queue slots Digital Equipment Corporation, Aprilthat conversation gets in the bit-by-bit round robin. 1987.The new relationships are Nnc=x:na with the sum [DEC87d]K. K. Ramakrishnan, D.-M. Chiu, and R.over all active conversations, and P,” is set to be Jain “Congestion Avoidance in Computerl/n, times the true packet length. Of course, the Networks with a Connectionless Networktruly vexing problem is the politics of assigning the Layer, Part IV-A Selective Binary Feed-priorities na, Note that while we have described an back Scheme for General Topologies”,extension that provides for different relative shares DEC Technical Report TR-510, Digitalof bandwidth, one could also define these shares as Equipment Corporation, November 1987.absolute fractions of the bandwidth of the outgoingline. This would guarantee a minimum level of [Fra841 A. Fraser and S. Morgan, “Queueing andservice for these sources, and is very similar to the Framing Disciplines for a Mixture of DataVirtual Clock algorithm of Zhang [Zha89]. Traffic Types”, AT&T Bell Laboratories Technical Journal, Volume 63, No. 6, ppThe other objection is that fair queueing requires 1061-1087,1984.the gateways to be smart and fast. There is techno-logical question of whether or not one can build FQ [Gaf841 E. Gafni and D. Bertsekas, “Dynamicgateways that can match the bandwidth of fibers. If Control of Session Input Rates in Com-so, are these gateways economically feasible? We munication Networks”, IEEE Transac-have no answers to these questions, and they do tions on Automatic Control, Volume 29,indeed seem to hold the key to the future of fair No. 10, pp 1009-1016, 1984.queueing. [Ger80] M. Gerla and L, Kleinrock, “Flow Control: A Comparative Survey”, IEEE Transac-6. Acknowledgements tions on Communications, Volume 28, pp 553-574, 1980.The authors gratefully acknowledge H. Murray’simportant role in designing an earlier version of [Gre891 A. Greenberg and N. Madras, privatethe fair queueing algorithm. We wish to thank A. communication, 1989.Dupuy for assistance with the Nest simulator. [Hah861 E. Hahne, “Round Robin Scheduling forFruitful discussions with J. Nagle, E. Hahne, A. Fair Flow Control in Data Communica-Greenberg, S. Morgan, K. K. Ramakrishnan, V. tion Networks”, Report LIDS-TH-1631,Jacobson, M. Karels, D. Greene and the End-to-End Laboratory for Information and DecisionTask Force are also appreciated. In addition, we Systems, Massachusetts Institute of Tech-are indebted to H. Sturgis for bringing the segrega- nology, Cambridge, Massachusetts.tion result in section 4 to our attention, and to him December, 1986.and R. Hagmann and C. Hauser for the relatedlively discussions. We also wish to thank the group [Has891 E. Hashem, private communication. 1989.at MIT for freely sharing their insights: L. Zhang, D-b891 A. Heybey and C. Davin, private com-D. Clark, A. Heybey, C. Davin, and E. Hashem. munication, 1989. 11
[IS0861 International Organization for Standardi- [Nes88] D. Bacon, A. Dupuy, J. Schwartz, and Y. zation (ISO), “Protocol for Providing the Yemini, “Nest: A Network Simulation Connectionless Mode Network Service”, and Prototyping Tool”, Dallas Winter Draft International Standard 8473, 1986. 1988 Usenix Conference Proceedings, pp. 71-78, 1988. [Jac88al V. Jacobson, “Congestion Avoidance and Control”, ACM SigComm Proceedings, pp [Per891 IETF Performance and Congestion Con- 314-329, 1988. trol Working Group, “Gateway Congestion Control Policies”, draft, 1989.[Jac88bl V. Jacobson, private communication, 1988. [Pos81] J. Postel, “Internet Protocol”, RFC 791 1981.[Jai861 R. Jain, “Divergence of Timeout Algo- rithms for Packet Retransmission”, [Pru88] W. Prue and J. Postel, “A Queueing Algo- Proceedings of the Fifth Annual Interna- rithm to Provide Type-of-Service for IP tional Phoenix Conference on Computers Links”, RFC1046, 1988. and Communications, pp 1162-1167, 1987. [She89a] S. Shenker, “Game-Theoretic Analysis of[Kar87] P. Karn and C. Partridge, “Improving Gateway Algorithms”, in preparation, Round-Trip Time Estimates in Reliable 1989. Transport Protocols”, ACM SigComm Proceedings, pp 2-7, 1987. [She89b] S. Shenker, “Comments on the IETF Per- formance and Congestion Control Work-[Kat871 M. Katevenis, “Fast Switching and Fair ing Group Draft on Gateway Congestion Control of Congested Flow in Broadband Control Policies”, unpublished, 1989. Networks”, IEEE Journal on Selected Areas in Communications, Volume 5, No. [Stu88] H. Sturgis, private communication, 1988. 8, pp 1315-1327, 1987. [USC811 USC Information Science Institute,[Lo871 C.-Y. Lo, “Performance Analysis and “Transmission Control Protocol”, RFC Application of a Two-Priority Packet 793, 1981. Queue”, AT&T Technical Journal, [Xer81] Xerox Corporation, “Internet Transport Volume 66, No. 3, pp 83-99, 1987. Protocols”, XSIS 028112, 1981.[Lua881 D. Luan and D. Lucantoni, “Throughput [Zha89] L. Zhang, “A New Architecture for Packet Analysis of an Adaptive Window-Based Switching Network Protocols”, MIT Ph. D. Flow Control Subject to Bandwidth Thesis, forthcoming, 1989. Management”, Proceedings of the Interna- tional Teletraffic Conference, 1988.[Man891 A. Mankin and K. Thompson, “Limiting Factors in the Performance of the Slo- start TCP Algorithms”, preprint.[Mor891 S. Morgan, “Queueing Disciplines and Passive Congestion Control in Byte- Stream Networks”, IEEE INFOCOM ‘89 Proceedings, pp 711-720, 1989.[Mi187] D. Mills and W.-W. Braun, “The NSFNET Backbone Network”, ACM SigComm Proceedings, pp 191-196, 1987.Mi1881 D. Mills, “The Fuzzball”, ACM SigComm Proceedings, pp 115-122. 1988.Nag841 J. Nagle, “Congestion Control in IPlTCP Networks, Computer Communication Review, Vol 14, No. 4, pp 11-1’7, 1984.[Nag851 J. Nagle, “On Packet Switches with Infinite Storage”, RFC 896 1985.[Nag871 J. Nagle, “On Packet Switches with Infinite Storage”, IEEE Transactions on Communications, Volume 35, pp 435-438, 1987.