254 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 2, APRIL 2008examples. Suppose that a downloading peer wants to download (second-order statistics) or higher-order statistics associateda ﬁle of size from possible source peers. Let be the av- with the capacity ﬂuctuation in time will need to be taken intoerage end-to-end available capacity between the downloading account, even for ﬁnding a source peer with minimum averagepeer and the th source peer . Notice that the download time.actual value of is unknown before the downloading peer actu-ally connects to the source peer . The average service capacity B. Our Contributionof the network, , is give by . Intuitively, the The examples in Section I-A give us a motivation to seekaverage download time, , for a ﬁle of size would be methods that can reduce the download time of each individual user. The main contribution of this paper is to show that the pre- (1) dicted value given in (1) is actually achievable without requiring any global information, regardless of the distribution of service In reality, however, (1) is far different from the true average capacities and correlations in a P2P network.download time for each user in the network. The two main rea- In this paper, we ﬁrst characterize the relationship betweensons to cause the difference are (i) the spatial heterogeneity in the heterogeneity in service capacity and the average downloadthe available service capacities of different end-to-end paths and time for each user, and show that the degree of diversity in ser-(ii) the temporal correlations in the service capacity of a given vice capacities has negative impact on the average downloadsource peer. We ﬁrst consider the impact of heterogeneity. Sup- time. After we formally deﬁne the download time over a sto-pose that there are two source peers with service capacities of chastic capacity process, we prove that the correlations in the kbps and kbps, respectively, and there is capacity make the average download time much larger than theonly one downloading peer in the network. Because the down- commonly accepted value , where is the average capacityloading peer does not know the service capacity of each source of the source peer. It is thus obvious that the average down-peer1 prior to its connection, the best choice that the down- load time will be reduced if there exists a (possibly distributed)loading peer can make to minimize the risk is to choose the algorithm that can efﬁciently eliminate the negative impact ofsource peers with equal probability. In such a setting, the av- both the heterogeneity in service capacities over different sourceerage capacity that the downloading peer expects from the net- peers and the correlations in time of a given source peer.work is kbps. If the ﬁle size is 1 MB, we In practice, most P2P applications try to reduce the down-predict that the average download time is 64 seconds from (1). load time by minimizing the risk of getting stuck with a ‘bad’However, the actual average download time is 1/2(1 MB/100 source peer (the connection with small service capacity) bykbps) 1/2(1 MB/150 Mbps) 66.7 seconds! Hence, we see using smaller ﬁle sizes and/or having them downloaded overthat the spatial heterogeneity actually makes the average down- different source peers (e.g., parallel download).2 In otherload time longer. words, they try to reduce the download time by minimizing Suppose now that the average service capacity can be known the bytes transferred from the source peer with small capacity.before the downloading peer makes the connection. Then, an ob- However, we show in this paper that this approach cannotvious solution to the problem of minimizing the average down- effectively remove the negative impact of both the correlationsload time is to ﬁnd the peer with the maximum average capacity, in the available capacity of a source peer and the heterogeneityi.e., to choose peer with the average capacity ( for in different source peers. This approach may help to reduceall ), as the average download time over source peer would average download time in some cases but not always. Rather,be given by . We assume that each peer can ﬁnd the ser- a simple and distributed algorithm that limits the amount ofvice capacity of its source peers via packet-level measurements time each peer spends on a bad source peer, can minimize theor short-term in-band probing . average download time for each user almost in all cases as Consider again the previous two-source peer example with we will show in our paper. Through extensive simulations, we 100 kbps and 150 kbps. As we want to minimize the also verify that the simple download strategy outperforms alldownload time, an obvious choice would be to choose source other schemes widely used in practice under various networkpeer 2 as its average capacity is higher. Now, let us assume conﬁgurations. In particular, both the average download timethat the service capacity of source peer 2 is not a constant, and the variation in download time of our scheme are smallerbut is given by a stochastic process taking values 50 or than any other scheme when the network is heterogeneous250 kbps with equal probability, thus giving (possibly correlated) and many downloading peers coexist with150 kbps. If the process is strongly correlated over source peers, as is the case in reality.time such that the service capacity for a ﬁle is likely to be The rest of the paper is organized as follows. In Section II,the same throughout the session duration, it takes on average we provide some background on service capacity characteris-(1 MB/50 kbps 1 MB/250 kps)/2 96 seconds, while it tics in a P2P network in terms of the heterogeneity over differenttakes only 80 seconds to download the ﬁle from source peer 1. connections and correlations over time for a given connection.In other words, it may take longer to complete the download In Section III, we analyze the impact of heterogeneity in ser-when we simply choose the source peer with the maximum av- vice capacities as well as the correlations in a given connec-erage capacity! It is thus evident that the impact of correlations tion on each user’s average download time. In Section IV, we show that our simple and distributed algorithm and can virtually 1Although the ﬂuctuation seen by a downloader can be caused by changeboth in the status of the end-to-end network path and in the status of the source 2For example, Overnet, BitTorrent, and Slurpie divide ﬁles into 9500 KB, 256peer itself, we use “service capacity of a source peer” to unify the terminology KB, and 256 KB ﬁle segments (chunks), respectively –, and a down-throughout the paper. loader can transfer different chunks from different source peers. Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
CHIU AND EUN: MINIMIZING FILE DOWNLOAD TIME IN STOCHASTIC PEER-TO-PEER NETWORKS 255eliminates all the negative impacts of heterogeneity and corre-lations. Our scheme thus greatly reduces the average downloadtime and achieves the simple relation in (1) regardless of net-work settings. Section V provides simulation results to test ouralgorithm and compare with others under various network set-tings, and we conclude our work in Section VI. Fig. 1. Typical variation in end-to-end available bandwidth based on the results II. BACKGROUND in  and . Drastic changes usually occur in the scale of minutes. In this section, we brieﬂy describe the characteristics of theservice capacity that a single user receives from the network variations in the capacity are mainly due to the window size ﬂuc-from the user’s perspective. Speciﬁcally, we consider the het- tuation in TCP, while the long-term variations are due to net-erogeneity of service capacities over different network paths and work congestion, changes in workload or the number of con-the stochastic ﬂuctuation of the capacity over time for a given necting users at the source peer, etc. The long-term ﬂuctuationsource peer. typically lasts over a longer time scale, say, few minutes up to several hours.A. Heterogeneity of Service Capacity As illustrated in the introduction, both the heterogeneity over different source peers and the correlations of the capacity in a In a P2P network, just like any other network, the service given source peer have signiﬁcant impact on the average down-capacities from different source peers are different. There are load time. To the best of our knowledge, however, there has beenmany reasons for this heterogeneity. On each peer side, phys- no result available in the literature addressing these issues. Allical connection speeds at different peers vary over a wide range the existing studies have simply assumed that the service ca- (e.g., DSL, Cable, T1, etc.). Also, it is reasonable to assume pacity is given by a constant (its average value) for the durationthat most peers in a typical P2P network are just personal com- of a download. Consequently, the download time of a ﬁle of sizeputers, whose processing powers are also widely different. The is simply given by , where is the average service ca-limitation in the processing power can limit how fast a peer can pacity. As will be seen later on, however, this is true only whenservice others and hence limits the service capacity. the service capacity is constant or independent and identically On the network side, peers are geographically located over distributed (i.i.d.) over time, neither of them is true in reality. Ina large area and each logical connection consists of multiple the next section, we will analyze the impact of these two factorshops. The distance between two peers and the number of hops on the per-user performance in terms of the average downloadsurely affect its round-trip time (RTT), which in turns affects time.the throughput due to the TCP congestion control. Moreover,in a typical P2P network, this information is usually “hidden”when a user simply gets a list of available source peers that have III. CHARACTERIZING THE DOWNLOAD TIME IN A P2P NETWORKcontents the user is looking for. Note that the aforementioned factors do not change over the We consider our network as a discrete-time system with eachtimescale of any typical P2P session (days or a week). Hence, time slot of length . For notational simplicity, throughout thewe assume that those factors mainly determine the long-term paper, we will assume that the length of a time slot is normalizedaverage of the service capacity over a given source peer. to one, i.e., . Let denote the time-varying service ca- pacity (available end-to-end bandwidth) of a given source peerB. Correlations in Service Capacity at time slot over the duration of a download. Then, the download time for a ﬁle of size is deﬁned as While the long-term average of the service capacity is mainlygoverned by topological parameters, the actual service capacityduring a typical session is never constant, but always ﬂuctuates (2)over time , . There are many factors causing this ﬂuc-tuation. First, the number of connection a source peer allows is Note that is a stopping time or the ﬁrst hitting time of a processchanging over time, which creates a ﬂuctuation in the service ca- to a ﬁxed level .pacity for each user. Second, some user applications running on If are i.i.d., then by assuming an equalitya source peer (usually a PC), such as online games, may throttle in (2), we obtain from Wald’s equation  thatthe CPU and impact the amount of capacity it can offer. Third,temporary congestion at any link in the network can also reduce (3)the service capacity of all users utilizing that link. Fig. 1 shows a typical available end-to-end capacity ﬂuctua-tion similar to that presented in  and . The time scale for The expected download time, measured in slots, then becomesthe ﬁgure in the measurement study is on the order of minutes. . Note that (3) also holds if is con-We know from  that a typical ﬁle download session can last stant (over ). Thus, when the service capacity is i.i.d. over timefrom minutes to hours for a ﬁle size of several megabytes. This or constant, there exists a direct relationship between the av-implies that the service capacity over the timescale of one down- erage service capacity and the average download time, as hasload session is stochastic and correlated. In Fig. 1, the short-term typically been assumed in the literature. Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
256 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 2, APRIL 2008A. Impact of Heterogeneity in Service Capacity From (8), we see that the difference between the predicted av- erage download time using (1) and the actual average value is We ﬁrst consider the impact of heterogeneous service capac- governed by two factors, the ﬁle size and the variance of theities of different source peers. In order to decouple the effect of service capacity, . First, the actual average downloadcorrelations from that of heterogeneity, in this section, we as- time will be different from (5) if the ﬁle is large. Second, moresume that Wald’s equation holds true for each source peer (i.e., importantly, if the service capacities over different source peersthe service capacity of a given source peer is either constant or vary over a wide range, the actual download time will be muchi.i.d. over time). But we allow the average capacities for dif- larger than (5).ferent source peers to be different. We will consider the impactof correlations in Section III-B. B. First Hitting Time of a Correlated Stochastic Process Let be the number of source peers in the network ( dif- In this section we show that the expected ﬁrst hitting time offerent end-to-end paths) and be the service capacity of a “positively correlated process” is larger than that of an i.i.d.source peer at time slot . We assume that is either con- counterpart. Consider a ﬁxed network path between a down-stant or i.i.d. over such that (3) holds. Let be loading peer and its corresponding source peer for a ﬁle of sizethe average capacity of source peer . Then, the average service . Let be a stationary random process denoting the avail-capacity the network offers to a user becomes able capacity over that source at time slot . We will assume that is positively correlated over time. Then, as before, we can (4) deﬁne the download time of a ﬁle (or the ﬁrst hitting time of the process to reach a level ) as , where the subscript “ ” means that is a correlated stochastic process.where and is the arithmetic mean Suppose now that we are able to remove the correlations fromof the sequence . Thus, one may expect that the . Let be the resulting process and be the stoppingaverage download time, , of a ﬁle of size would be time for the process to reach level , where the subscript “ ” now means that is independent over time. Then, (5) again from Wald’s equation, we have As we mentioned earlier, however, the actual service capacityof each source peer remains hidden unless a network-wide probeis conducted. So the common strategy for a user is to randomly First, as introduced earlier, consider the case that ispick one source peer and keep the connection to it until the 100% correlated over time, i.e., for some randomdownload completes. If the user connects to source peer (with variable for all . Then, the download time becomesservice capacity ), the average download time over that assuming an equality in (2). Hence, from Jensen’ssource peer becomes from (3). Since the user can choose inequality, we haveone of source peers with equal probability, the actual averagedownload time in this case becomes i.e., the average ﬁrst hitting time of an 100% correlated process (6) is always larger than that of an i.i.d. counterpart. In order to char- acterize any degree of positive correlations in , we need thewhere is the harmonic mean of deﬁned by following deﬁnition : . Because 3, it follows Deﬁnition 1: Random variables are said tothat (6) (5). This implies that the actual average download be ‘associated’ if for all increasing functions andtime in a heterogeneous network is always larger than that given (9)by “the average capacity of the network” as in (5). To quantify the difference between (6) and (5), we adopt sim- where , and we say is an increasing func-ilar techniques as in . Let be the random variable taking tion if whenever forvalues of with equal probability, i.e., . for all . Consider the following Taylor expansion Relation (9) characterizes the positive dependence among theof the function around some point : random variables . In words, if some of them become larger, then the other random variables are also likely (7) to be larger. Note that (9) implies positive correlations in by setting and . Deﬁnition 1 can beLetting and taking expectation in both sides generalized to a stochastic process as follows.of (7) give Deﬁnition 2: The stochastic process is said to be associated if for all and , the random (8) variables are associated. In fact, the set of associated processes comprises a large class 3The arithmetic mean is always larger than or equal to the harmonic mean, of processes. Perhaps the most popular example is of the fol-where the equality holds when all c ’s are identical. lowing type: Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
CHIU AND EUN: MINIMIZING FILE DOWNLOAD TIME IN STOCHASTIC PEER-TO-PEER NETWORKS 257 Theorem 4.3.13. in : Let be a stochastic process Thus, we havewith static space of the form (10)If the are mutually independent and independent of This completes the proof. , then is associated if is increasing in . Theorem 1 states that the average download time of a ﬁle from Stochastic processes of the form (10) constitute large por- a source peer with correlated service capacity (in the sense oftion of Markov processes. For example, any auto-regressive type association deﬁned in (9)) is always larger than that of an i.i.d.model with positive correlation coefﬁcient can be written in the counterpart. In the subsequent section, we show the relation-form of (10). Speciﬁcally, for an AR-1 sequence deﬁned ship between the degree of correlation of a process and the av-by erage ﬁrst ﬁtting time of that process, and illustrate how much can be larger than . From previous discus- sions, we know that in general the average download time, , should be calculated using rather than the commonlywhere and is a sequence of used .i.i.d. random variables and independent of , we can write where . Since C. First Hitting Time and Degree of Correlationis increasing in , the process is associated. We now present our theorem. To illustrate the relationship between the average download Theorem 1: Suppose that is associated. Then, time and the degree of correlation in the available bandwidthwe have , assume that is given by a stationary ﬁrst-order au- toregressive process (AR-1), i.e., (16) Proof: First, for any given , we set and , where Here, is a sequence of i.i.d. random variables with zero . Note that both functions and mean, which represents a noise term of the process. Then, fromare increasing. Observe that the stationarity of the process, we get (11) (17)Thus, we have, for any , We vary the constant such that the average capacity is always ﬁxed to under different . Since the avail- able bandwidth cannot be negative, we limit the range of such that , while keeping the same mean. The ﬁle (12) size is and the noise term, , is chosen to be uni- formly distributed over , and to see how the noise term affects the average download time. (13) Remark 1: The choice of the autoregressive process is for the sake of presentation, not to actually reﬂex the real ﬂuctuation inwhere the inequality in (12) follows since is associated, an end-to-end available bandwidth in real world. It is easy toand (13) is from the stationarity of in and (11). Since generate AR-1 process with the same mean but different cor- , it fol- relation structures. Similar results can be obtained if the AR-1lows that process is replaced by other processes with more complicated (14) correlation structures. Fig. 2(a) shows the relationship between the average down- Now, let us assume that an equality holds in the deﬁnition of load time and the degree of correlation of the process (16) for (see (2)). Then, we have different and . As the degree of correlation increases, the average download time increases. In particular, for a heavily cor- related process, the average download time can be about 40% larger than that for a uncorrelated or weakly correlated process, regardless of different noise terms. In other words, the long term (15) variation in the service capacity is the main determining factor of the average download time, and the short-term random noiseSubstituting (14) into (15) yields in the service capacity, such as the one caused by TCP conges- tion control mechanism over short time scales (RTTs), does not have signiﬁcant impact on the average download time. To see the impact of the variance of itself, we restrict the range of to some ﬁxed interval. For example, means that we set whenever it becomes smaller than 9 and when larger than 11. Fig. 2(b) shows the Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
258 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 2, APRIL 2008 time, utilizing different source peers either simultaneously (par- allel downloading) or sequentially within one download session would be a good idea to diversify the risk. Parallel downloading improves the performance by reducing the ﬁle size over the “worst” source peer and also may increase the service capacity one receives from the network by utilizing “unused” capacities of other source peers. If a downloader utilizes one source peer at a time, switching around seems to be a good strategy to avoid the “bad” source peer. Now, the question is, “What is the crite- rion for switching, i.e., is it chunk-based or time-based?” In this section we will analyze the performance of (i) parallel down- loading; (ii) random chunk-based switching; and (iii) random time-based (periodic) switching. Different strategies have different impact on the average download time of each peer, which may result in different system dynamics as well, e.g., how fast a downloader can start to contribute (become a source peer) or how fast a peer leaves the system after ﬁnishing download. If there is no peer leaving the system and all peers are willing to share after they complete their download (either the entire ﬁle or a chunk), the aggregate service capacity in the system keeps increasing as time goes on because the number of source peers continuously grows. In this case, the dynamics in the increase of aggregate service capacity becomes the dominant factor in the average download time for each peer. On the other hand, if no peer is willing to share after download, the aggregate capacity will then eventually drop to zero, thus throttling all the performance metrics. In reality, however, the P2P network will reach a steady state at some point in which the peer arrivals and departures are balanced and the aggregate service capacity remains around some constantFig. 2. Relationship between the average download time and different degrees with little variation as shown in . This suggests that theof correlation . (a) Under different noise term (t) in (16). (b) Under different number of source peers in the system will also be around somerange for C (t). constant with little ﬂuctuation in the steady state. In this paper, we are mostly interested in the impact of stochastic variations of capacities on the average download time of each peer in therelationship between the average download time and the degree steady state, rather than in the impact of sources–downloadersof correlation of under different variation range for . dynamics in the transient period, which is beyond the scope ofWhen the range of ﬂuctuation of gets smaller, the down- this paper.load time is less affected by the correlation of the process. This Before we start our analysis, we have the following assump-is well expected since the process ﬂuctuates only within tions: and thus behaves more like a constant process. In con- i) The service capacity of a source is constant within onetrast, when the range for is large, the impact of correlation time slot.becomes apparent as shown in Fig. 2(b). ii) Each downloader selects its source independently. In real data networks, the available capacity of a connection iii) Each downloader makes a blind choice, i.e., the sourcestypically shows wild ﬂuctuation; it becomes very low when con- are randomly chosen uniformly over all available sources.gestion occurs, and it can reach up to the maximum link band- Assumption (i) is reasonable since it is expected that there is nowidth when things go well. In addition, as technology advances, major event that triggers dramatic ﬂuctuation in the service ca-people are getting links of higher and higher speed, hence the pacity within a short period of time. There may be small short-range of available capacity ﬂuctuation is also likely to increase. term ﬂuctuations, on the order of seconds, in the service capacityTherefore, it is very important to consider the effect of correla- due to the nature of the network protocol, such as TCP con-tion in capacity over time when we calculate the average down- gestion window changes, or OS interrupt handling, etc. Theseload time of a ﬁle transfer. changes however do not impose serious impact on the service capacity. Thus, we are not interested in such small short-term IV. MINIMIZING AVERAGE DOWNLOAD TIME variations, but are more interested in the ﬂuctuation on a longer OVER STOCHASTIC CHANNELS time scale caused by change in the number of connections at a Intuitively, if a downloader relies on a single source peer source peer or change in network congestion status, which allfor its entire download, it risks making an unlucky choice of usually last for longer time (say, minutes to hours). We have thea slow source resulting in a long download. Since the service assumption (ii) because it is impractical for any downloader tocapacity of each source peer is different and ﬂuctuates over know how other downloaders choose their source peers in the Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
CHIU AND EUN: MINIMIZING FILE DOWNLOAD TIME IN STOCHASTIC PEER-TO-PEER NETWORKS 259network. Hence the downloader cannot not make its source se- a simple and distributed algorithm with no global informationlection decision based on other downloaders’ decision. Assump- such that the value in (1), or , can be achieved undertion (iii) is based on the fact that the downloader does not know almost all network settings.the service capacity of each source peer a priori. Although some We have already seen that parallel downloading may notprotocols require peers to broadcast information about its phys- achieve even when there is only one user in theical connection speed, it is hard to tell the “true” instant service network. Further, it is shown ,  that in a multi-usercapacity of each source peer due to many factors such as compe- network, maintaining just a few parallel connections, say, 4tition among other peers, changing workload of the source peer, to 6, is better than having parallel connections to all possibleor the network congestion status. Therefore, a simple way to se- source peers. Hence, if there is an algorithm that can increaselect a source peer is just to make a blind choice. the performance of each individual connection among such a few parallel connection, then each individual user may achieveA. Effect of Parallel Downloading the download time predicted by (1) or even better. Parallel downloading is one of the most noticeable way to re-duce the download time , . If the ﬁle is divided into B. Random Chunk-Based Switchingchunks of equal size, and simultaneous connections are used, In the random chunk-based switching scheme, the ﬁle of in-the capacity for this download session becomes , terest is divided into many small chunks just as in the parallelwhere is the service capacity of th connection. Intuitively, download scheme. A user downloads chunks sequentially onethis parallel downloading seems to be optimal in all cases. But, it at a time. Whenever a user completes a chunk from its cur-is worth noting that the download time for parallel downloading rent source peer, the user randomly selects a new source peeris given by rather than and connects to it to retrieve a new chunk. In this way, if the , where is the download time of a chunk over th connec- downloader is currently stuck with a bad source peer, it willtion. This is because the chunk that takes the longest time to stay there for only the amount of time required for ﬁnishing onecomplete determines the entire download session. chunk. The download time for one chunk is independent of that To illustrate that parallel downloading is better than single of the previous chunk. Intuitively, switching source peers baseddownload, we consider the following simple example. Assume on chunk can reduce the correlation in service capacity betweenthat there are only two source peers in the network, and chunks and hence reduce the average download time. However,are the service capacities of the two source peers. Without loss there is another factor that has negative impact on the averageof generality, we assume that . If parallel downloading download time, the spatial heterogeneity.is used for downloading a ﬁle of size from the network, the First, suppose that there is no temporal correlation in ser-download time is given by vice capacity and Wald’s equation holds for each source peer. A ﬁle of size is divided into chunks of equal size, and let be the download time for chunk . Then, the total download time, , is . Since each chunk randomly chooses one of source peers (with equal probability), the ex-For the case of single download, the average download time pected download time will be is (18) The result in (18) is identical to the download time given in (6) Now, given that parallel download is better than single down- where a user downloads the entire ﬁle from an initially randomlyload, one may ask whether it is as good as the predicted value in chosen source peer. In other words, the chunk-based switching(1). To answer this, let us recall the two-source peers example. is still subject to the “curse” of spatial heterogeneity. WhileFrom (1), the predicted download time is there is no beneﬁt of the chunk-based switching from the av- erage download time point of view, it turns out that this scheme still helps reduce the variance of the download time under a rel- atively smaller number of users by diversifying the risk with smaller chunks. [See Fig. 5(b).]An easy calculation shows if . Thus, In the chunk-based switching, if we get stuck in a sourceeven in the network with one user, parallel downloading may peer with very low service capacity, downloading a ﬁx amountnot reduce the download time to the predicted value in all cases. of bytes from that source peer may still take a long time. WeInstead, the performance of parallel download depends upon could avoid this long wait by making the size of each chunk verythe distribution of the underlying service capacities and could small, but this then would cause too much overhead associatedbe much worse than the ideal case, . Indeed, it is shown with switching to many source peers and integrating those manyin  that if we can make the chunk-size proportional to the chunks into a single ﬁle. Therefore, instead of waiting until weservice capacity of each source peer, parallel downloading can ﬁnish downloading a ﬁxed amount of data (chunk or ﬁle), weyield the optimal download time. But such scheme requires may want to get out of that bad source peer after some ﬁxedglobal information of the network. One of our goals is to ﬁnd amount of time. In other words, we randomly switch based on Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
260 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 2, APRIL 2008time. In the subsequent section, we will investigate the perfor-mance of this random switching based on time and show that itoutperforms all the previous schemes in the presence of hetero-geneity of service capacities over space and temporal correla-tions of service capacity of each source peer.C. Random Periodic Switching Fig. 3. Operation of source selection function U (t) for random periodic switching. In this section, we analyze a very simple, distributed algo-rithm and show that it effectively removes correlations in thecapacity ﬂuctuation and the heterogeneity in space, thus greatly Then, since can take values only from , thereducing the average download time. As the algorithm will be actual available capacity at time can be written asimplemented at each downloading peer in a distributed fashion,without loss of generality, we only focus on a single downloaderthroughout this section. In our model, there are possible source peers for a ﬁxeddownloader. Let ( and ) de-note the available capacity during time slot of source peer . for both the permanent connection and the random periodicLet be a source selection function for the switching strategies. Since each downloader chooses a sourcedownloader. If , this indicates that the downloader se- peer independently of the available capacity, is alsolects path and the available capacity it receives is during independent from , and so is . Note that, fromthe time slot . We assume that each is stationary in and for any , we have of different source peers are indepen-dent.4 We however allow that they have different distributions,i.e., are different for different (heterogeneity).For any given , the available capacity is correlated overtime . As before, when each connection has the same proba-bility of being chosen, the average service capacity of the net-work is given by . In this setup, we can consider the following two schemes: (i) (19)permanent connection, and (ii) random periodic switching. Forthe ﬁrst case, the source selection function does not change in i.e., the average available capacity for the two source selectiontime . When the searching phase is over and a list of avail- strategies are the same.able source peers is given, the downloader will choose one of In order to analyze how the two different strategies affect thethem randomly with equal probability. In other words, correlation in , we consider the correlation coefﬁcient of where is a random variable uniformly distributed over deﬁned as . For example, if the downloader chooses at time 0, then it will stay with that source peerpermanently until the download completes. For the random periodic switching, the downloader randomlychooses a source peer at each time slot, independently of every- Then, we have the following result.thing else. In other words, the source selection function Proposition 1: Let and denote the correla-forms an i.i.d. sequence of random variables, each of which is tion coefﬁcient of under the permanent connection and theagain uniformly distributed over . Fig. 3 illustrates random periodic switching, respectively. Then, we havethe operation of the source selection function for randomperiodic switching. In this ﬁgure, source 1 is selected at time 1,source is selected at time 2, and so on. Let us deﬁne an indicator function Proof: Since the average capacity for both strategies re- mains the same (see (19)), without loss of generality, we can assume that for any source peer has zero mean by sub- 4We note that different paths (overlay) may share the same link at the net- tracting if necessary. From the independence amongwork core, but still, the bottleneck is typically at the end of network, e.g., access different source peers, we have, for any ,network type, or CPU workload, etc. Thus, the independence assumption hereis reasonable. (20) Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
CHIU AND EUN: MINIMIZING FILE DOWNLOAD TIME IN STOCHASTIC PEER-TO-PEER NETWORKS 261Then, the covariance of becomes we deﬁne as the download time for a ﬁle of size under the random periodic switching, we have (21) (24)From (20), we can rewrite (21) as We then have the following comparison result between the permanent connection and periodic switching. (22) Proposition 2: Suppose that the process for each is associated (i.e., it is correlated over time ). Let and First, consider the case of . Then, it follows that be the download time for the permanent connection and for the random periodic switching, respectively. Then, we haveHence, from (22) with , the variance of is given by Proof: Assume that the ﬁle size is . Since is asso- ciated, from Theorem 1, we have (25) (23) for any given source peer . Observe now thatregardless of the strategies for . Now, consider the case of . Under the permanent con- (26)nection strategy, since all the time, we get (27) (28)On the other hand, for the random periodic switching, we have where (26) is from (25), (27) is from Jensen’s inequality and the convexity of a function for , and (28) is from (24). This completes the proof.since and for are independent. Proposition 2 shows that our random periodic switching Finally, set . Then, from (22) and since the variance strategy will always reduce the average download time com-of remains the same for both strategies as in (23), we have pared to the permanent strategy and that the average download and this completes the proof. time under the random periodic switching is given by From Proposition 1, we see that under the random periodic (see (27)). Note that this was made possible since the randomswitching strategy, the correlation of is times smaller periodic switching removes the negative impact of both thethan that of permanent connection strategy. For example, when heterogeneity and the correlations. In addition, our algorithmeach downloader has about 10 available source peers is extremely simple and does not require any information about , the correlation coefﬁcient of the newly obtained capacity the system.process under our random periodic switching is no more than0.1 regardless of the correlations present in the original capacity D. Discussionﬂuctuation. So, by using our random periodic switching, we can So far, we have analyzed the performance of three differentalways make the capacity process very lightly correlated, or al- schemes that utilize the spatial diversity of the network to im-most independent. From Fig. 2, we see that the average down- prove per-user performance in terms of the average downloadload time for a lightly correlated process is very close to that time. We have considered (i) parallel downloading; (ii) randomgiven by Wald’s equation. It is thus reasonable to assume that chunk-based switching; and (iii) random periodic switching.Wald’s equation holds for the lightly correlated process The parallel downloading may perform well if the capacityunder our random periodic switching strategy. Speciﬁcally, if of each possible source peer is known so as to allocate larger Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
262 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 2, APRIL 2008chunks to faster connections and smaller chunks to slower con- TABLE Inections. But this method is not practical as one cannot know AVERAGE SERVICE CAPACITY OF EACH SOURCE PEERa priori the service capacity of all source peers. In addition, the UNDER DIFFERENT CONFIGURATIONSservice capacity is stochastically ﬂuctuating all the time, andour analysis show that the performance of parallel downloadingdepends much upon the heterogeneity of the service capacitiesin different source peers if the chunks are equal in size. Many P2P applications nowadays use chunk-based ﬁletransfer with equal chunk size. As mentioned earlier, the beneﬁtof chunk-based switching is to speed up the conversion fromdownloading peers to uploading peers and thus indirectly in our conﬁguration, different source peers have different av-affect the average download time. But, in terms of reducing erage service capacities, and the service capacity of each sourcethe average download time directly, it does not help much. peer is correlated in time. We consider a single downloadingRandom chunk-based switching may reduce the correlations in peer as well as multiple downloading peers to allow competi-the service capacity, but it still cannot eliminate the effect of tion among the downloading peers for limited service capacityspatial heterogeneity in different source peers. of each source peer. In current practice, the chunk based transfer and the paralleldownload are often combined. Taking BitTorrent and Overnet A. Single Downloader With Heterogeneous Service Capacitiesfor examples, a ﬁle is ﬁrst divided into 256 KB and 9.5 MBchunks of equal size, respectively, and then different chunks are We ﬁrst show the impact of both heterogeneity and correla-downloaded from different source peers simultaneously. How- tions in service capacities on the average download time whenever, we separate the analysis of the two strategies to show how there is a single user (downloader) in the network. There areeach is different in combating spatial heterogeneity and tem- source peers in the network, each offering different av-poral correlations. Please note that we are not trying to com- erage service capacities. Let be the average service capacitypare the performance of parallel downloading with chunk based of source peer and . The average service ca-transfer since they can be easily combined to yield better perfor- pacity of the whole network is then .mance. Rather, we are comparing the performance of the two We change the heterogeneity in service capacity by changingstrategies with our random periodic scheme. Further, we will each , while keeping 200 kbps the same. We measurepresent the performance comparison of the combined strategy the degree of heterogeneity in term of , thewith the random periodic scheme in Section V-B. normalized standard deviation. Table I shows the different set- The idea of time-based switching scheme is in fact not new. tings used in our simulation in this subsection.Such strategy has been implemented in BitTorrent  but with To demonstrate the impact of correlation in each ﬁxed sourcesome other purpose in mind. In BitTorrent application, by using peer, we use a class of AR-1 random processes to model theits optimistic choking/unchoking algorithm, a peer changes one stochastic ﬂuctuation in the service capacity. It is reasonableof its servicing neighbors with the lowest upload capacity every to assume that if the average service capacity is large, the10 seconds in hope to ﬁnd some peers offering higher service ca- service capacity is more likely to ﬂuctuate over a wider range.pacity. However, the idea of switching source peer periodically For instance, for a high-speed source peer (e.g., 1 Mbps), thein the BitTorrent’s optimistic choking/unchoking algorithm is actual service capacity of the end-to-end session may dropto discover new potential sources rather than to explicitly re- down to somewhere around 50 kbps and stays there for amove the negative impact of temporal correlations and spatial while due to network congestion or limited CPU resources atheterogeneity in service capacity. To the best of our knowledge, the source peer. In this regard, we assume that the amount ofwe are the ﬁrst to point out that the random periodic switching ﬂuctuation in is proportional to its mean value . Specif-gives us the average download time of , while all the ically, for source peer , we set in (16) to be uniformlyother schemes considered so far yield larger average download distributed over where is chosen such thattime. remains the same for all . Our study leads us to believe that the random switching deci- In our simulation, the length of each time slot (one period) ission should be based on time rather than “bytes” because we are chosen to be 5 minutes. We set the ﬁle size to 150 MB, whichinterested in the download time, not the average capacity itself. is the typical size of some small video clips or multimedia ﬁles.Indeed, any algorithm based on bytes or a ﬁxed amount of data As the average service capacity (of the network) is 200 kbps, wewill suffer the curse of a bad source peer in that it has to wait set the chunk-size for chunk-based switching to be 7.5 MB (until that amount of data is completely received from the “bad” 200 kbps 5 minutes). The purpose of simulating the chunk-source peer. On the other hand, when the decision is based on based switching is to show the impact of switching based ontime, we do not need to wait that long as we can jump out of “data size”, hence we choose 7.5 MB to allow fair comparisonthat source peer after a ﬁxed amount of time (one period). with the random periodic switching with 5 minute period. We will show the performance of using smaller chunk size later in V. NUMERICAL RESULTS Section V-B. In this section, we provide numerical results to support our We consider all three download strategies discussed so faranalysis and compare the performance of the four schemes for in comparison with permanent connection. For permanent con-ﬁle download under various network conﬁgurations. In any case, nection, the user initially chooses one of four sources randomly Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
CHIU AND EUN: MINIMIZING FILE DOWNLOAD TIME IN STOCHASTIC PEER-TO-PEER NETWORKS 263and stays there until the download completes. For chunk-basedswitching, the user switches to a new randomly selected sourcepeer whenever a chunk is completed. Although we simulate thesystem as a discrete time system, the user is allowed to switchto a new source peer anytime within a time slot whenever itﬁnishes the current chunk. For parallel download, the ﬁle isdivided into four equal-sized pieces and the downloading peerconnects to all four source peers and downloads each piecefrom each source peer simultaneously. Finally, for periodicswitching, a user switches to a new randomly chosen sourcepeer every 5 minutes to further download the remaining partsof the ﬁle. Fig. 4(a)–(b) shows the average download time versus the de-gree of heterogeneity in the average service capacities whenthere is a single downloader in the network. Dashed lines are forstrong correlations and solid lines represent the caseof light correlations . In Fig. 4(a), when the degreeof heterogeneity is small, all three single-link download strate-gies (permanent, chunk-based, periodic) under light correlationsperform the same. This is well expected since the service capac-ities of all source peers are almost i.i.d. over space and time, soswitching does not make any difference and the average down-load time becomes 150 MB/200 kbps 100 minutes,as commonly used in practice. On the other hand, when thereexists strong correlations in the service capacity, the downloadtime is longer for all strategies except the periodic switching.For example, when , the correlation alone can causemore than 20% of increase in the average download time. Thus,when the network is more like homogeneous (i.e., small ), thetemporal correlation in the service capacity of each source peerbecomes a major factor that renders the average download timelonger. However, the average download time remains the sameunder the random periodic switching. Fig. 4. Average download time versus degree of heterogeneity under different Fig. 4(a) also shows the performance of parallel down- download strategies and different degree of correlations. (a) Low degree of het-loading. Intuitively, parallel downloading should perform erogeneity: 0 : : : 0 4. (b) High degree of heterogeneity: 0 4 0 7.better than single link downloading because (i) it utilizes morethan one link at the same time and (ii) if the connection is “poor” with very small capacity. Thus, even though the size ofpoor, parallel downloading reduces the amount of data getting chunk (37.5 MB) is smaller than the whole ﬁle (hence reducingthrough that bad source peer. Since there is only a single user, the risk of staying with the bad source peer for too long), this isit utilizes all the service capacity the network can provide still not as good as the idea of averaging capacities all the time, . In this case, the average download time as used in the periodic switching. We note that temporal cor-should be 150 MB/ 150 MB/800 kbps relations still negatively affect in all these three schemes. How- 25 minutes. We see from Fig. 4(a) that parallel downloading ever, it should be pointed out that the random periodic switchingcan actually achieve the performance close to our expectation performs the same regardless of heterogeneity and correlations,when the service capacities of different source peers are close and in fact it outperforms all the other schemes when the net-to i.i.d. Still, parallel downloading is prone to the negative work is heterogeneous with a wide range of service capacitieseffect of correlations. as in the current network. As the degree of heterogeneity increases, the average down-load time sharply increases for all the schemes except the pe- B. Multiple Downloaders With Competitionriodic switching. Fig. 4(b) shows this when is between 0.4 In this section, we consider the performance of differentand 0.7 (see Table I). All but periodic switching suffer from the download strategies under a multi-user environment. In ournegative effect of heterogeneity. When both heterogeneity and multi-user setting, we set the number of source peers tocorrelation are high ( and ), permanent con- . The source peers are divided into four groupsnection takes about 350 minutes to complete the download. This and each source peer within the same group will have thetime is about 250 minutes, or 4 hours more than using periodic same average service capacity. In reality, the service capacityswitching! It is expected that the performance of parallel down- of each source peer may vary a lot, much greater than theloading degrades fast when there is a large degree of hetero- ones that are presented in Table I. We choose the servicegeneity. It is more likely that one of the parallel connections is capacity of the four groups as 1 Mbps, 500 Kbps, 100 Kbps, Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.
264 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 2, APRIL 2008and 50 Kbps, representing typical capacities of LAN, cable,DSL, and modem connections, respectively. In contrast to thesetting in the previous section, each group now may consistof different number of source peers to reﬂect a more real-istic distribution of service capacity. We choose the numberin each group as 10, 5, 65, and 20, respectively. This is toreﬂect the situation in the real world that only a few sourcepeers have very high service capacity while most othershave the capacity of typical DSL (100 Kbps) lines or slowermodems. The average service capacity of the network is thenkbps. The degree of heterogeneity in our setting is .The ﬂuctuation in the service capacity is represented by AR-1process with correlation coefﬁcient of each source peer setto 0.9. We want to see the performance of different strategiesunder the impact of spatial heterogeneity and temporal corre-lation. In our simulation, service capacity of a source peer is equallydivided among all the users connected to that source peer. Theeffect of dividing capacity among users gives us an idea of howdifferent strategies will perform when users compete for lim-ited resources in the network. To represent the level of compe-tition, we use the downloader–source ratio, i.e., the ratio be-tween the number of users (downloading peers) to the numberof source peers. Since the service capacity of a source peer isequally among the users the source peer serves, we can expectthat the service capacity of the system is equally divided amongall users as well. Hence, the average per-user service capacitycan be calculated as the average system service capacity dividedby the downloader–source ratio. For example, if the number ofusers is 200, then the downloader–source ratio 2. The averageper-user service capacity will then be 200 kbps/2 100 kbps. We simulate three strategies. The ﬁrst one is the combinedstrategy of parallel download and the chunk-based transfer.Since we know from  and  that keeping only a smallnumber of parallel active connections is better than maintainingconnections to all source peers, we set the number of parallelconnections to 5 for all the combined strategies. We vary the Fig. 5. Performance comparison of different strategies under different levels of competition. (a) Average download time. (b) Standard deviation/average down-chunk size to see its impact on the average download time load time.in conjunction with parallel download. Further, the users areallowed to request the same chunk from different source peerswhen the number of untransferred chunks is less than the strategy is the random periodic switching. The switching periodnumber of active parallel connections. For example, if a user is still 5 minutes, but we reduce the length of the system timeis three chunks away from completing the entire ﬁle, s/he can slot to 1 minute. In this case, there will be capacity variationsrequest all three chunks from all currently connected source within each switching period.peers. Although making the same chunk requests to different Fig. 5(a) shows the average download time for the strate-source peers will reduce the download time for that speciﬁed gies considered so far. The reference line is given by the ﬁlechunk, this is at the expense of some waste of the system size divided by the average per-user service capacity. First, weresource. Note that we do not allow users to make duplicate can clearly see that the periodic switching performs a lot betterrequests to all connected source peers for every chunk, as this than the chunk-based switching. We have a reduction of 40% inwill waste too much resource. This notion of making requests average download time by using periodic switching. Next, theto different source peers for the same chunk when a user’s combined strategies are shown to outperform the chunk-baseddownload is nearly complete has been already implemented switching. Note that when the level of competition is low, thein BitTorrent called the “end-game” mode . The second combined strategy outperforms both the chunk-based and thestrategy is the random chunk-based switching with a single periodic switching schemes. This is well expected because par-connection. The chunks size is chosen to be 7.5 MB, which allelism increases the service capacity each user can achieveis identical to what we used in the previous section to allow in an under-utilized network. As the level of competition in-fair comparison with the periodic switching. Finally, the third creases, however, the random periodic switching readily starts Authorized licensed use limited to: Saranathan College of Engineering. Downloaded on August 12, 2009 at 04:02 from IEEE Xplore. Restrictions apply.