1    On the Performance of Content Delivery under      Competition in a Stochastic Unstructured                Peer-to-Pee...
2[18] try to develop an optimal peer selection method           applies.that exploits the heterogeneity of the network but...
3normalized to 1 and each time slot can then be simply               To simplify the analysis, we first impose the followin...
4can have two parallel connections simultaneously. The                  wasted, downloading peers can increase their perfo...
5size of the accumulated received data exceeds the file size F .     E{ j∈S Iij (t)} = L, and all downloading peers followI...
6tion problem:                                                                                we first show that there exis...
7  Note that [15] considers only the case of blind selection                at random, i.e. pj = p = L/|S|, ∀j ∈ S. The av...
8and we have the following approximation                                sorted in a decreasing order, i.e, c1 ≥ c2 ≥ c3 · ...
94.4 Optimal Peer Selection Example                                      4.5 Distributed Adaptive Peer Selection StrategyS...
10average total number of connections is |D|L. Note that            to 12 using (21) and (22) with the estimated parameter...
11depicted in Figure 3. The internet cloud consists of many             be 55% of the access link bandwidth limit. The P2P...
12peer selection. In some extreme cases, take |D| = 50 as                                              1an example, the av...
On the performance of content delivery under competition in a stochastic unstructured peer to-peer network
On the performance of content delivery under competition in a stochastic unstructured peer to-peer network
On the performance of content delivery under competition in a stochastic unstructured peer to-peer network
Upcoming SlideShare
Loading in...5

On the performance of content delivery under competition in a stochastic unstructured peer to-peer network


Published on

Published in: Technology
1 Like
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

On the performance of content delivery under competition in a stochastic unstructured peer to-peer network

  1. 1. 1 On the Performance of Content Delivery under Competition in a Stochastic Unstructured Peer-to-Peer Network Yuh-Ming Chiu and Do Young Eun Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC {ychiu, dyeun}@ncsu.edu Abstract—Peer-to-peer (P2P) network is widely used for transferring large files nowadays. Measurement results show that most downloading peers are patient as the average download session is usually very long. It is sometimes even longer than downloading from a dedicated server using a modem. Existing results in the literature indicate that the stochastic fluctuation and the heterogeneity in the service capacity of each peer are two of the major reasons that make the average download time far longer than expected. In those studies, it has been often assumed that there is only one downloading peer in the network, ignoring the interaction and competition among peers. In this paper, we investigate the impact of the interaction and competition among peers on downloading performance under stochastic, heterogeneous, unstructured P2P settings, thereby greatly extending the existing results on stochastic P2P networks made only under a single downloading peer in the network. To analyze the average download time in a P2P network with multiple competing downloading peers, we first introduce the notion of system utilization tailored to a P2P network. We investigate the relationship between the average download time, system utilization and the level of competition among downloading peers in a stochastic P2P network. We then derive an achievable lower bound on the average download time and propose algorithms to give the peers the minimum average download time. Our result can much improve the download performance compared to earlier results in the literature. The performance of the different algorithms is compared under NS-2 simulations. Our results also provide theoretical explanation to the inconsistency of performance improvement by using parallel connections (parallel connections sometimes do not outperform a single connection) observed in some measurement studies. Index Terms—Computer network performance, P2P networks 31 I NTRODUCTION access, the typical download time for a 100MB file should be less than an hour if the prediction is correct.Peer-to-peer (P2P) technologies, such as BitTorrent [2], The gap between prediction and measured performanceGnutella [3], or Kaaza [4], have been widely used for motivated us to investigate more about the file downloadfile transfer over the Internet. In those applications, file process in a P2P system.download time is one of the most important performance Most people believe that the existence of free-riders ismetrics. Theoretically, a P2P network can make its users the main reason that degrades the performance of a P2Pdownload faster compared to a traditional client-server network. Accordingly, many incentive algorithms havenetwork because a P2P network is inherently scalable. been developed so as to encourage peers to contributeEach node in a P2P network can act both as a server and to the network [8], [9], [10], [11], [12], [13]. However,a client simultaneously. As a result, the aggregate system even if all peers in the network are altruistic, we areservice capacity increases with the number of download- still far away from enjoying the performance predicteding nodes (demand) in a P2P network [5]. It is widely by [5], [14]. Results in [15] suggest that both the stochasticbelieved that only the physical access bandwidth of each temporal fluctuation and the heterogeneity in servicedownloading peer can limit the download performance. capacity of each peer can make the average download However, measurement studies show rather counter- time significantly longer than expected, even when allintuitive results. It is shown in [6], [7] that downloading peers in the network are willing to share. Although somea 100MB file from a Gnutella or Kaaza network generally other earlier studies have also noticed the impact oftakes hours often up to a week. Considering the fact stochastic fluctuation and the heterogeneity in servicethat earlier P2P studies predicted that the download per- capacity of each peer, those studies often have moreformance is only limited by each peer’s physical access limited viewpoints. For example, the authors in [16]bandwidth and most people nowadays have broadband consider only the stochastic fluctuation in service capac-A preliminary version of this work appeared in [1]. ity of each peer but they do not consider the networkThis work was supported in part by NSF CAREER Award CNS-0545893 heterogeneity. On the other hand, the authors in [17],
  2. 2. 2[18] try to develop an optimal peer selection method applies.that exploits the heterogeneity of the network but do While the aforementioned centralized algorithm tonot consider the temporal fluctuation in service capacity solve for the optimal peer selection probability is im-of each peer. practical in a fully distributed P2P network, the al- More important, all the results in [15], [16], [17], [18] gorithm will serve its purpose as a benchmark withhave been established under the assumption that there is optimal performance. We then propose a heuristic dis-only one downloading peer in the network. This is critical, tributed algorithm that only depends upon the networksince in a real P2P network there will be multiple peers performance metric observed by each downloading peer.uploading and downloading at the same time and a Through realistic network simulations using NS-2 [19],peer’s service capacity will be shared by its competing we compare the performance of different peer selectionpeers. In other words, the downloading peers will have strategies. We use the performance of periodic uniformto compete for the limited resource each single source peer selection derived from [15] as the baseline forpeer can offer. In this setting, the download performance performance comparison. Our NS-2 simulation resultsis determined not only by the stochastic fluctuation and show that both the optimal centralized algorithm andheterogeneity in service capacity that each peer offers; the proposed distributed algorithm outperforms the pe-how each peer makes its peer selection choice under such riodic uniform peer selection in all cases. Our centralizea competitive environment is also very important. algorithm can even reduce the average download time In this paper, we jointly consider all the factors pre- by 50% compared to periodic uniform peer selection.sented in [15], [16], [17], [18] and the impact of inter- In addition, our results in this paper also provide the-action/competition among the downloading peers. In oretical explanation for the observations in [20], [21]order to capture this impact of interactions among the that using parallel connections does not always givedownloading peers, we first introduce the notion of users better performance than using a single connection.system utilization – defined as the average fraction of the We present conditions under which parallel connectionsaggregated system capacity that is actually consumed by is helpful and provide a guideline for determining aall the downloading peers, which is heavily dependent suitable number of parallel connections.upon the peer selection decision. With the introduction The paper is organized as follows. In Section 2, weof system utilization, we show how the average “share” present our system model. In Section 3, we introduceof system capacity a downloading peer receives is not a performance metric – system utilization, which isalways equal to the aggregated capacity of the source determined by the interaction between downloadingpeers divided by the number of downloading peers, peers, and derive the relationship between the averageeven if each source peer distributes its capacity evenly download time and the system utilization. In Section 4,among its connected downloading peers. Further, we we present a centralized optimal peer selection algo-extend the results in [15] that the average file download rithm that maximizes the system utilization for bench-time is often greater than the file size divided by the mark purpose and a simple distributed algorithms. Inaverage received capacity of each downloading peer in Section 5, We present the simulation results and weour much more complex network setting. We derive conclude in Section 6.an achievable lower bound on the average downloadtime based on the average network capacity, the systemutilization, and the level of competition in the network. 2 S YSTEM MODEL It was shown in [15] that switching a single connection In a P2P network, all peers can be categorized into“periodically” among possible source peers “uniformly two groups for a given file of interest: the peers whoat random” can reduce the average download time to are uploading data to other peers (source peers) andsome extent. In this paper, in a network with multiple those who are downloading from others (downloadingdownloading peers, we show that the algorithm in [15] peers). We denote the set of source peers and the setis far from being optimal, but can still be used as the of downloading peers by S and D, respectively. Eachfirst step toward developing an optimal peer selection peer can either be in one of the two sets, S or D, or instrategy. We modify the notion of periodic switching both. For example, a free-rider is in D only and a peerand start by letting each downloading peer generate a who contributes to the network without downloadingrandom number of parallel connections periodically instead anything from other peers is in S only. In most cases, aof having just one single connection all the time. Each peer is counted in both sets D and S. We use indices idownloading peer chooses its source peers according to and j to represent the peers in sets D and S throughoutsome probability. We can then formulate the probability the paper, respectively.assignment problem as a convex optimization problem We model the network as a discrete time system withand compute the optimal connection probabilities in a the length of a unit time-slot set to ∆. Consideringcentralized fashion. Our solution is applicable to much a P2P network as a discrete time system is widelywider ranges of network scenarios with stochastic fluc- used in performance analysis of P2P networks in thetuation and heterogeneity under peer competitions, to literature [5], [22]. Thus the duration of each slot iswhich none of the approaches in [15], [16], [17], [18] [(t − 1)∆, t∆) , t ∈ N. For analytical simplicity, ∆ is
  3. 3. 3normalized to 1 and each time slot can then be simply To simplify the analysis, we first impose the followingindexed by t. There are two major data dissemination assumptions.models, push-based and pull-based, often used in the (A1) Different downloading peers make their ownliterature to describe P2P systems. In a push-based sys- choices independent of other peers.tem, the source peers control when and what to send (A2) There is no restriction on the number of connec-to the downloading peers [23], [24]. The downloading tions a source peer can have, i.e., all connectionpeers can make as many data or connection requests as attempts of a downloading peer always succeed.they want and then wait passively for the source peers to However, we limit the average number of parallelservice them. On the other hand, the downloading peers connections a downloading peer can open to L.in a pull-based system have the power to decide when (A3) A downloading peer i selects each source peer jand what to download from the source peers [25]. A with some probability, pij , i.e. the connection deci-piece of data is transferred immediately from the source sion Iij (t) is generated from pij . The connectionpeers to the downloading peers upon requests. Since probability pij and the fluctuation in Cj (t) of aeach downloading peer is interested in improving its source peer j are independent.own downloading performance, it is intuitive to give thedownloading peers more control in its own download We have (A1) because it is unlikely that each down-process. Hence we view our network as a pull-based loading peer will broadcast its own peer selection strat-system. Since the downloading peers can select its source egy to its neighbors unless the set of downloading peerspeers actively, we denote the subset Si (t) ⊆ S be the set conspire together.of source peers that peer i connects to at time slot t. We have (A2) to reduce capacity over-saturation of Note that peers i and j are actively connected if the source peers. Capacity over-saturation occurs whenand only if the connection between i and j is carrying a source peer has too many active connections simul-data traffic in our definition. For each connection, it is taneously. The capacity offered over each connectionwell known that the available bandwidth fluctuates over becomes extremely low. The performance bottleneck intime [26], [27] due to the workloads of both end points terms of receive capacity (from a downloading peer’sor the network congestion status. Recent studies have point of view) is the capacity over-saturated sourceshown that most peers in a P2P network utilize broad- peers. It is intuitive to limit the number of parallel con-band connections. Since either cable or DSL lines offer nections each source peer can have to eliminate capacityasymmetrical bandwidth and the upstream bandwidth over-saturation. It can be done in two ways. We canis usually much smaller than downstream bandwidth, either put such limitation on the source peers or theit is reasonable to assume that the bottleneck lies in the downloading peers. To see how limiting the numberservice capacity of the source peers. Let Cj (t) denote of parallel connection each downloading peer can havethe total service capacity a source j can offer at time slot t. helps to prevent capacity over-saturation from happen-We assume that all downloading peers connecting to the ing, let’s consider the following simple example. Sup-same source peer will share the source’s service capacity pose that there are N = 100 source peers and M = 200equally, i.e., if there are currently M downloading peers downloading peers. Each downloading peer is limited toconnecting to source j, each downloading peer will get have one connection. Each downloading peer selects itsa data rate of Cj (t)/M during time slot t. source peer uniformly at random. The number of parallel Define an indicator function connections each source peer will have is then a binomial 1, j ∈ Si (t) random variable with n = M , p = 1/N = 0.01. Therefore, Iij (t) = each source peer will have M/N = 2 parallel connections 0, otherwise on average with variance of 1.98. In other words, it isshowing whether downloading peer i connects to source very unlikely for a source peer to have 5 or more parallelj at time t. Then, i∈D Iij (t) is the number of down- connections. In the above case, we demonstrate clearlyloading peers that are connected to source j at time t. that we can prevent capacity over-saturation by limitingThe data rate that the downloading peer i receives from the number of parallel connections a downloading peersource j at time t becomes can have. Iij (t) We have (A3) because selecting source peers using Rij (t) = Cj (t). i∈D Iij (t) probability is in general better than selecting sourceThe aggregated capacity, Ri (t), that a downloading peer peers deterministically based on some system metrics.i receives from the entire network during time slot t will We define deterministic peer selection as a download-then be ing peer making its connection decisions based on Iij (t) some measured network statistics. The decision is al- Ri (t) Rij (t) = Cj (t), (1) ways connect/disconnect if the measured statistics is j∈Si (t) j∈S i∈D Iij (t) greater/smaller than some threshold. To demonstrateThe second equality in (1) comes from the fact that the idea of deterministic peer selection, let’s considerIij (t) = 0 for j ∈ Si (t), i.e., sources not connected to the following example. Suppose that there are 3 sourcedownloading peer i will not contribute to Ri (t). peers and a downloading peer. The downloading peer
  4. 4. 4can have two parallel connections simultaneously. The wasted, downloading peers can increase their perfor-capacity of source 1, 2, and 3 are denoted by C1 (t), C2 (t), mance by utilizing those unused resources and thusand C3 (t), respectively. The table below shows the ca- reduce their download time.pacity fluctuation of each peer. We can either make To make our analysis more tractable, let’s assumeour connection decision base on the average capacity that each downloading peer has the same probability of(assume it is known) of each source peer or on the connecting to a source j, i.e.,received capacity in the previous time slot. P{Iij (t) = 1} = P{j ∈ Si (t)} = pj , ∀ i ∈ D. (3) t=1 2 3 4 5 6 ... C1 (t) 4 1 4 1 4 1 ... Note that pj (j ∈ S) is not a probability distribution since C2 (t) 3 3 3 3 3 3 ... E{ j∈S Iij (t)} = j∈S pj is the average number of C3 (t) 1 2 1 2 1 2 ... connections of downloading peer i, which can be larger than 1 for multiple connections (i.e., parallel download- If we use the received capacity in the previous slot ing). In other words, Iij (t) is not independent over j asas the decision threshold. One possible strategy is to they are constrained by the number of connections ofdisconnect from the source peer that offers the lowest a downloading peer (usually up to some constant). Wecapacity and then make new connections to the source note however that Iij (t) is i.i.d. over i (i ∈ D) underpeer that is not utilized in the previous slot. The boldface (A1) and (3).numbers in the above table represents the capacity a We now have the following.downloading peer receives from the its two connectionsover time under such strategy. In this case, the average Proposition 1. Under (3), we havereceived capacity is less than 4. However, if we connectto the two source peers that offer higher average capacity j∈S 1 − (1 − pj )|D| cj ρ= . (4)(peers 1 and 2). The average received capacity is 4. j∈S cj In the above example, we can see that selecting peersaccording to their average capacity is better. It is possible Proof: See Appendix Athat the approach of making decisions utilizing the most To better understand the meaning of the system uti-recent measurement is better in some cases but we lack lization ρ in (4), let’s consider a simple system withthe systematic analysis of selecting peers based on the two source peers and two downloading peers wheremost recent measurements in the literature. Hence, the the capacities of two source peers are identical. Supposemost common approach adopted in the literature is that each downloading peer connects to each sourceselecting source peers based on their average capacity. with probability pj = E{Iij } = 0.5, but under theHowever, making deterministic peer selection based on constraint that Ii1 (t) + Ii2 (t) = 1, i = 1, 2, i.e., eachthe average capacity of the source peers is not as good downloading peer maintains a connection to exactly oneas using a uniform random variable to choose a source of the two sources at any time. Then, from (4), we havepeer as shown in [15]. Hence we adopt the notion of ρ = (1 − (1 − 0.5)2 ) = 0.75. This is indeed true asusing probability to choose source peers. there are only two possible cases – (i) two downloading peers connect to different sources (ρ = 1), or (ii) two downloading peers connect to the same source (ρ = 0.5),3 D OWNLOAD T IME A NALYSIS each of which takes place with the same probability.3.1 System Utilization From Proposition 1, we see that ρ = 1 only when pj = P{Iij (t) = 1} = 1 for all j. This suggests that eachWe first define the system utilization of a P2P net- downloading peer should connect to all possible sourcework because it is important in determining the average peers in the network from purely the system utilizationdownload time. Let the service capacity of source j ∈ S point of view. However, in reality, a downloading peerbe stationary with E{Cj (t)} = cj . We have the following: may have many potential source peers and it is imprac-Definition 1. The system utilization is tical to connect to all of them simultaneously, hence we usually have pj < 1. E i∈D Ri (t) i∈D j∈S E {Rij (t)} ρ = (2) cj E j∈S Cj (t) j∈S 3.2 Achievable Minimum Average Download Timewhich is the ratio between the aggregated average service Before we show how we can derive the relation betweencapacity of the entire network and the aggregated average the average download time and the system utilization,capacity actually consumed by all the downloading peers. we first need a formal definition for the file download time. In any network, from an administrator’s point of view,the system would perform the best if the “utilization” Definition 2. Let C(t) denote the service capacity that aof the system is maximized. In other words, when the downloading peer “receives” from the network at time t (t =system is under-utilized, i.e. some source capacity being 1, 2, . . .). The file download time T is the first time that the
  5. 5. 5size of the accumulated received data exceeds the file size F . E{ j∈S Iij (t)} = L, and all downloading peers followIn other words, we have the following equation: the “connect-and-wait” strategy for each of the connec- T tions. We have the following. T = min T > 0 C(t) ≥ F (5) Theorem 1. Let c = {cj }, j = 1, . . . , |S| represent the vector t=1 of average capacities of the source peers. Then The random variable T is the first hitting time of F ν Tthe cumulative process t=1 C(t) to reach level F . If E{Ti } ≥ (8) A(c) ρ{C(t), t ∈ N} are independent and identically distributed(i.i.d.), then by assuming an equality in (5), we obtain wherefrom Wald’s equation [28] that |D| 1 ν= and A(c) cj . (9) T |S| |S| j∈S F =E C(t) = E{C(t)}E{T }. (6) We define the parameter ν as the competition level, which is t=1 the average number of downloading peers each source peerThe expected download time, measured in slots, then has to service. Note that ν/A(c) = j∈S cj /|D| canbecomes E{T } = F/E{C(t)}, which has been widely be interpreted as the commonly perceived average “share”used in the literature. of system capacity that each downloading receives from the However, the equality in (6) does not generally hold. network .Recall that the service capacity of each source peer canbe different and can fluctuate over time. First, suppose Proof: Assume that a file of size F is divided intothat a downloading peer is only able to make a single |Si | pieces because the downloading peer utilizes parallelconnection and waits patiently for its session to complete. connections to download from the source set Si . Let FijIt was shown in [15] that the temporal correlation in denotes the size of the piece that the downloading peerthe fluctuation of service capacity can make the average i requests from source peer j. Since we are consideringdownload time over any connection longer. In other the specific downloading peer i, so we suppress thewords, if a downloading peer selects its source peer by a subscript i in Fij . Clearly, we have j∈Si Fj = F, whererandom variable J ∈ [1, . . . , |S|], the average download 0 ≤ Fj ≤ F. Note that pj is independent of j and let Tijtime Tj , given that J = j, usually have the following denotes the time required to complete transferring piecerelationship: E{Tj } ≥ F/E{Cj (t)}, where equality is Fj from j, as defined in (5), and we haveachieved only if each of Cj (t), ∀j ∈ S is independent orweakly correlated over t. E{Ti } = E max Tij = ESi E max Tij Si Even if the equality in (6) holds for whichever connec- j∈Si j∈Sition the downloading peer chooses, the heterogeneity ≥ ESi max E{Tij | Si } = ESi max E{Tij } , (10)of the network (different service capacities offered by j∈Si j∈Sidifferent source peers) also makes the average down- where ESi is the expectation over the random set Si , andload time longer. Suppose now E{Tj } = F/E{Cj (t)}for each j. Then, observe from Jensen’s inequality that the last equality follows from the fact that Si is randomly generated at t = 0, i.e., Tij and Si are independent.EJ E{CJF F ≥ EJ {E{CJ (t)|J}} where the equality is (t)|J} For each piece of a file that peer i downloads from peerachieved when E{Cj (t)} = E{Ck (t)}, ∀j, k ∈ S. j, we generally have Tij ≥ Fj /E{Rij (t)} [15]. Substitute Hence, combining the effect of both temporal correla- E{Rij (t)} with the result in (27) and we havetion and network heterogeneity, we arrive to Fj Fj E{T } = EJ {E {TJ |J}} E{Tij } ≥ = |D| , E{Rij (t)} µj c j F F ≥ EJ ≥ , (7) E {CJ (t)|J} EJ {E{CJ (t)|J}} where we let µj = 1 − (1 − pj )|D| . Here, we can viewwhere the first inequality comes from the temporal µj as the probability that source peer j is utilized in any |D|correlation in the fluctuation of the capacity of each random time slot since (1 − pj ) represents the proba-source peer and the second inequality comes from the bility that source peer j is connected to no downloadingnetwork heterogeneity. Note that (7) is for the case in peers. Further, we can also view µj as the fraction of timewhich the network has only one downloading peer and it is contributing to the system. As the result µj cj is thenit only utilizes a single connection. We now investigate the average capacity that a source j can contribute to thethe performance of the case in which the network has system. Hence, equation (10) impliesmultiple downloading peers in competition, each of Fjwhich utilizes parallel connections to multiple source E{Ti } ≥ |D|ESi max . (11) j∈Si µj c jpeers at the same time. Assume that each downloading peer i can now For any give set Si , we know that Fj = F ·have an average of L parallel connections, i.e. µj cj / j∈Si µj cj is a solution to the following optimiza-
  6. 6. 6tion problem: we first show that there exists a way to achieve the equality in (8) under all conditions. We then propose a Fj min max , s.t. Fj = F, Fj ≥ 0. centralized algorithm that maximizes the system utiliza- j∈Si µj c j tion ρ, thereby further minimizing the (now achievable) j∈Si minimum average download time (see (8)). Finally, weThus, assume that Fj is allocated proportional to µj cj propose some possible distributed algorithm implemen-and j∈Si Fj = F , we have tations that may achieve near optimal performance. Fj F max = . (12) 4.1 Dynamic Peer Selection j∈Si µj c j j∈Si µj c j As our first step towards developing an algorithm thatFurther, note that Iij (0) = 1{j∈Si } in the “connect-and- minimizes the average download time, we show switch-wait” strategy for each connection and we have ing peers periodically, i.e. a downloading peer i changes its source set Si (t) after each t, can achieve the equality in ESi µj c j =E µj cj Iij (0) (8). As opposed to the “connect-and-wait” strategy that j∈Si j∈S the set of source peers remains “static” after a session = µj cj P{Iij (0) = 1} = µj cj pj ≤ ρA(c)|S|. (13) begins, switching peer periodically causes the set of j∈S j∈S source peer to change dynamically during a downloadRecall from (4) that ρ j∈S cj = j∈S µj cj and pj ≤ 1, session. We will use the term dynamic peer selectionwe have the inequality in (13). From (11), (12) and (13), and periodic switching interchangeably in the followingobserve that sections. In our network setting, the source set Si (t) of each downloading peer i can change both in its elements Fj F and its size over time. Each downloading peer can have E{Ti } ≥ |D|ESi max = |D|ESi j∈Si µj c j j∈Si µj c j connection preference for one source peer over another. F |D| F |D| F ν The capacity of each source peer is shared among the ≥ ≥ = . (14) downloading peers connected to it to model the compe- ESi { j∈Si µj c j } ρA(c)|S| A(c) ρ tition.This completes the proof. Under our time-varying dynamic source peer selection The function A(c) is the average system service capacity scheme, Si (t) (t = 1, 2, . . .) is a sequence of random setsfrom the perspective of a downloading peer. Note that rather than a constant set randomly generated at t = 0the results in [15] that E{T } ≥ F/A(c) is now just as the Si (t) in Theorem 1. The service capacity that a“a special case” of Theorem 1. The schemes described downloading peer i receives from the entire network,in [15] is the case when |D| = L = 1 and pj = 1/|S|, Ri (t), then becomes the sum of random variables over awhich corresponds to ρ = 1/|S| from (4). Therefore, random set. Here, we do not enforce a strict limit on theν/ρ = 1 in (8), and we have the result in [15]. number of parallel connections in each time slot. Rather, From the derivation of Theorem 1, we can see that we assume that the average number is limited to L, i.e.the naive “connect-and-wait” strategy needs to satisfy E{ j∈S Iij (t)} = j∈S pj = L. To show that we canseveral conditions in order to achieve the minimum have an equality in (8), we need to first show that thepossible average download time, i.e. the equality in (8). temporal correlation in received service capacity of eachFirst, the size for each piece downloaded from differ- downloading peer is much reduced by our time-varyingent sources must be determined at the beginning of a dynamic peer selection.session such that Fj is proportional to µj cj , i.e Fj = Note that the correlation function of a stationary ran-F · (µj cj / j∈Si µj cj ). In reality, it is hard to measure dom process X(t) is defined byboth cj and µj accurately, and hence it is difficult to Cov(X(t), X(t + τ ))pre-allocate Fj . Even if we were able to allocate each φX (τ ) =Fj precisely proportional to µj cj , the possible temporal Var(X(t))correlation of fluctuation in the service capacity of each and we have the following.source peer will result in a strict inequality in (11) and Theorem 2. For any given downloading peer i, Let X(t) andso does the heterogeneity in the average capacity with Y (t) denote the Ri (t) under the “connect-and-wait” and thedifferent source peers for the inequality in (14). The dynamic peer selection, respectively. In general, we havenaive ”connect-and-wait” strategy for each of the parallelconnections is thus unlikely to achieve an equality in (8), φY (τ ) ≤ max{pj }φX (τ ). (15)and therefore we need to consider different algorithms j∈Sto achieve such equality if at all possible. For the special case of uniform (blind) selection, i.e pj = L/|S|, we have the following inequality:4 ACHIEVING THE M INIMUM AVERAGE D OWN -LOAD T IME L φY (τ ) ≤ φX (τ ). (16) |S|Theorem 1 shows the minimum possible average down-load time in a stochastic P2P network. In this section, Proof: See Appendix B.
  7. 7. 7 Note that [15] considers only the case of blind selection at random, i.e. pj = p = L/|S|, ∀j ∈ S. The average num-with L = 1. Theorem 2 is in a much general form and ber of parallel connections used by each downloadingshows that the temporal fluctuation in each downloading peer L is chosen to be some small number so that thatpeer’s received capacity is either uncorrelated or weakly equation (18) holds, and we plot the relation betweencorrelated over time so that Wald’s equation (6) holds the average download time E{T }, system utilization ρ,true as long as maxj {pj } is not too large. and the level of competition ν for different average Now, suppose that maxj {pj } is properly selected to number of parallel connections L according to (18) asbe some suitable value, say, 0.5. The correlation in the Figure 1(a) and (b). The network has a fixed numberreceived capacity from the network is at least reduced by of 40 source peers and the capacity of each source50% compared to “connect-and-wait” strategy from (15). ranges from 1MB/min to 20MB/min with an incrementHence, the receive capacity Ri (t) for downloading peer of 0.5MB/min. The number of downloading peers variesi will be weakly correlated over time. Replacing C(t) in from 1 to 200, hence ν ∈ [0.025, 5].(6) directly by Ri (t) in (1) gives F F 5 1 E{Ti } = = E{Ri (t)} E Iij (t) Cj (t) 4 0.8 Normalized E{T} j∈S i∈D Iij (t) 3 0.6 F |D| A(c) ρ = · (17) 2 0.4 1 − (1 − pj ) |D| cj A(c) L=1 L=1 j∈S 1 L=3 0.2 L=3 L=5 L=5 F |D| F ν = ρ−1 = . (18) 0 0 1 2 3 4 5 0 0 1 2 3 4 5 A(c) |S| A(c) ρ ν νEquation (17) is a direct application of (27) and (18) is (a) E{T } vs ν. (b) ρ vs. νthe result of rearranging the terms in (17). Equation (18) Fig. 1. The relation between the average download timeclearly shows that the equality in (8) is achieved by using E{T }, system utilization ρ, and the level of competitiondynamic peer selection. ν for different average number of parallel connections L. E{T } is normalized over F/A(c) and E{X} denotes the We argue that maxj∈S {pj } is most likely to be strictly expectation of a random variable X.less than 1 in practice. If maxj∈S {pj } = 1, it meansthat the downloading peers will connect to some source By closely examine both Figure 1(a) and 1(b), we arguepeers permanently throughout their entire download that the fundamental effect of parallel downloading issessions. Note that the service capacity of a source to increase the system utilization rather than directlypeer fluctuates over time. A permanent connection will reducing the average download time. Figure 1(a) showsprevent the downloading peer from switching to some that the benefit that downloading peers gain form utiliz-other potential source peers that can offer better capacity ing parallel connections varies under different networkat the times when the service capacity of the currently settings. The common belief that parallel connectionsconnected source peer plummets. can help to reduce the average download time is true Further, note that limiting the value of maxj {pj } also only when the competition in the network is very low.implies limiting the average number of parallel connec- For example, when ν ≤ 2, we see that utilizing par-tions a downloading peer can use, i.e. j∈S pj = L ≤ allel connections indeed gives better performance over|S| · maxj∈S {pj }. Although it is a common belief that using a single connection in Figure 1(a). The reason isit is always better to open more parallel connections if that parallel connections can much increase the systema downloading peer wants to complete its download utilization. Taking ν = 1 as an example, Figure 1(b)session quicker, such idea is generally not true as will shows that the system utilization is increased from 60%be discussed in Section 4.2. If we are to connect to all to around 95% by increasing L from 1 to 3. On thepossible source peers in the network, we lose the benefit other hand, there is no noticeable performance differenceof switching peers (correlation reduction) and hence the between using a single connection and using parallelaverage download time will be much larger than the connections when ν ≥ 2. The system performance inRHS of (8). Further, the measurement results in [20], [21] this region actually corresponds to the results in [20],also show that all downloading peers utilizing parallel [21] that large scale deployment of parallel connectionsconnections often do not give better performance than all does not always give better performance. Once again,peers using a single connection. The authors of [20], [21] if we refer to Figure 1(b), we can see that there is notsuggest that the number of parallel connections should much room to increase system utilization when ν is largebe limited. Theorem 2 actually agrees with this claim. regardless the value of L.In the next section, we will investigate the impact ofutilizing parallel connections in more detail. Even when the system is operating at the under- utilized region, utilizing more parallel connections may4.2 Impact of Parallel Connections not guarantee better performance. First, each additionalHere, we assume that each downloading peer changes its connection does not reduce the average download timeconnections periodically using selecting peers uniformly by the same amount. Consider the case when |S| |D|L,
  8. 8. 8and we have the following approximation sorted in a decreasing order, i.e, c1 ≥ c2 ≥ c3 · · · . Then, |D| the optimal solution {p∗ } is j L |D| ρ=1− 1− ≈1− 1−L = Lν.  1 |S| |S| K ∗ −L 1 (|D|−1)  1− , j ≤ K∗  1 cjEquation (18) approximately becomes F/(A(c)L) and p∗ = j K∗ 1 j=1 cj (|D| 1) − (21) ∗   0,the average download time is inversely proportional to j>KL. Second, the negative impact of temporal correlationmay compromise the benefit we gain from using parallel Here, K ∗ is the maximum K such that µ(K) < |D| cK C(K) , nconnections. Consider the case when ν ≈ 0.5 in Figure 1, where C(n) = j=1 cj and µ(n) is defined aswe may be able to reduce the average download time a (|D|−1)little more by increasing the average number of parallel connections from 5 to 10 or more. However, increasing |D|  n−L µ(n) = . (22) L implies that we will have less correlation reduction, C(n)  1/(|D|−1)  n 1and it was shown in [15] that different levels of tem- j=1 cjporal correlation in received capacity could increase theaverage download time up to 25% or more. Therefore, For any given D, S, L, (21)–(22) gives the optimal peercombining the effect of both parallel connections and selection probabilities for any choice of {cj }, j ∈ S. In atemporal correlation, a large L may give the download- typical P2P network with tens to hundreds of uploadinging peers worse performance compared with a small L. and downloading peers, we argue by an example in theIn summary, the average number of parallel connections next section that maxj∈S {p∗ } is most likely to be some jshould be limited to some small number. small number so that we in general have a significant amount of correlation reduction from Theorem 2, and we always have (18). Further, {p∗ } maximizes the system j4.3 Maximizing System Utilization utilization ρ in (18), and the minimum possible averageFrom the previous section, we demonstrate that chang- download time is achieved.ing the set of source peers dynamically over time can By careful investigation of the expression in (21)–(22),always achieve equality in (8) under the condition we can identify the optimal peer selection probabilities inthat both the average number of parallel connections some very special network settings. First, if the networkL and the connection probability to each source peer is homogeneous, i.e. all source peers offer the samemaxj∈S {pj } are limited. Given that we always have (8), average service capacity (cj = c for all j ∈ S), then (|D|−1)what is the minimum average download time? Note that µ(n)/|D| = n n−L 1 n < 1/n for any n. In this case,the variables, F, A(c), and ν in (18) are determined once K ∗ = |S| and (21) becomesthe P2P network is formed. Downloading peers are not 1/(|D|−1)able to change these parameters. On the other hand, the |S| − L 1 Lsystem utilization ρ in (18) is partially determined by the pj = 1 − = . 1 1/(|D|−1) c |S| |S| cpeer selection probabilities (See (4)). Hence, in what fol-lows, we consider how to assign selection probabilities Hence, uniform random selection among all possibleto source peers in order to achieve optimal performance source peers is the best strategy in a homogeneous(minimal average download time). environment. Second, recall that |D| is the number of Specifically, we assume that it is possible to measure or competing downloading peers in the network. If |D| isobtain the accurate values of E{Cj (T )} = cj for each of large, i.e. |D| → ∞, then we havethe source peer j in the network. Following the notion in 1/(|D|−1)Section 3.2, we let each peer have an average of L parallel 1connections, i.e. E{ j∈S Iij (t)} = lim =1 j∈S pj = L < |S|. |D|→∞ cjFrom (8), minimizing the average download time isequivalent to maximizing the system utilization ρ. From in (21), regardless of the value of cj > 0. Thus, again, we(4), we can formulate our problem as follows: have pj ≈ L/|S|, i.e., uniform random selection. Note that this uniform random peer selection achieves near- j∈S 1 − (1 − pj )|D| cj optimal performance only under very special network con- max (19) j∈S cj figurations that we just described above, namely, either homogeneous network or a very large number of down- s.t pj = L, 0 ≤ pj ≤ 1 (20) loading peers. In all other scenarios such as intermedi- j∈S ate number of downloading peers under heterogeneous Since 1 − (1 − pj )|D| is strictly concave in pj , the environment, assigning connection probabilities for eachproblem in (19)–(20) is a convex optimization problem. source peer according to (21)–(22) clearly give far betterWe can apply any standard convex programming tech- performance in general. In what follows, we illustratenique [29] to solve the problem. First, without the loss how the optimal connection probabilities change as theof generality, assume that the vector c = [cj , j ∈ S] is network setting varies.
  9. 9. 94.4 Optimal Peer Selection Example 4.5 Distributed Adaptive Peer Selection StrategySuppose that we have a network with 40 source peers. Note that the centralized peer selection algorithm dis-The source peers are paired into 20 groups. The two cussed in the previous section requires the informationsource peers in each group have the same average about the “average” service capacity of each source peerservice capacity, while the average service capacity of and the number of downloading peers in the network.the group ranges from 1MB/min to 20MB/min in in- Such requirement suggests that we need a centralizedcrements of 1MB/min. We set the average number of authority to calculate the optimal connection probabilityparallel connections L = 3 and vary the number of for each downloading peers. Further, broadcasting thedownloading peers |D| to calculate the optimal vector optimal solution to all downloading peers introducesof connection probabilities using (21)–(22). much communication overhead. In a system like a P2P network, a distributed algorithm for calculating the con- nection probability would be more desirable because 0.25 a distributed algorithm reduces the cost of building a |D|=5 centralized infrastructure and the cost of communication 0.2 |D|=10 overhead. In this section, we show that we can have a Connection Probability |D|=20 distribution algorithm that achieves near optimal perfor- |D|=40 mance with only very little communication overhead. 0.15 |D|=60 Unform In other words, each downloading peer can calculate its own vector of connection probabilities using online 0.1 measurements. Our distributed algorithm is a modification of the cen- 0.05 tralized algorithm in Section 4.3. The global parameters in (21)–(22) are to be replaced by each downloading 0 peer’s own estimates. In other words, each downloading 20 15 10 5 Service Capacity (MB) peer estimates the values of cj and |D| from its own measurements during its download session. Both cj andFig. 2. The optimal connection probabilities to source |D| can be inferred by the information about the numberpeers (from (21)) offering different service capacities. of downloading peers that are actively connected to a source peer. Let χj (t) = i∈D Iij (t) denote the number of down- Figure 2 shows the computed optimal connection loading peer connected to source peer j at time t.probabilities for 20 different groups, indexed with their Suppose that each downloading peer is now able toaverage service capacities from 20MB/min to 1MB/min. obtain the information about χj (t) when it connects toAll the lines in the figure display a common trend: source peer j. We emphasize that the source peers do notassigning higher probabilities to source peers offering broadcast the process χj (t) to all downloading peers inhigher average capacities. Note that the maxj {pj } is every time slot t. Instead, the downloading peers ask forstill very small even in a very heterogeneous network the value of χj (t) only when they decide to connect to(we have service capacity of source peers ranging from source peer j at t. Such actions introduce very little (if1MB/min to 20MB/min). Clearly, Figure 2 shows that any) communication overhead. Let χij (t) = χj (t) · Iij (t)the probability of connecting to peers offering the av- be the total number of downloading peers sharing theerage capacity of 20MB/min should increase as |D| capacity of source peer j, seen by the downloading peerdecreases. When there is no competition in the network, i. A downloading peer i can then estimate the averageit is intuitive that a downloading peer should always capacity of source peer j bychoose the fastest source peers as [17], [18] suggested.However, how should the peer selection probability T Rij (t)χij (t) t=1change if there exists competition in the network? Fig- cij = ˆ Ture 2 also shows that the optimal connection vector tends t=1 Iij (t) T Iij (t)to be “flatter” as the number of concurrent downloading t=1 Cj (t) i∈D Iij (t) Iij (t) i∈D Iij (t)peers |D| gets larger, and it approaches to that of uni- = Tform random selection when |D| goes to infinity, as we t=1 Iij (t) Thave discussed in Section 4.3. Although both the single t=1 Iij (t)Cj (t)downloading peer case and infinite downloading peer = T t=1 Iij (t)case are well studied, note that our algorithm finds theoptimal vector of connection probabilities between these Clearly, cij is an unbiased estimator for cj = E{Cj (t)}. ˆtwo extremes under stochastic and heterogeneous en- We argue heuristically that χij (t) can be used forvironments. Further performance comparison between estimating the number of downloading peers in thethe optimal peer selection and other selection algorithms network as well. Recall that each downloading peerwill be presented in Section 5. has an average of L parallel connections, hence the
  10. 10. 10average total number of connections is |D|L. Note that to 12 using (21) and (22) with the estimated parameters. T t=1 χij (t) Lines 14 – 16 are the steps to update each downloadingχij =ˆ T is the estimate for E{ i∈D Iij (t)}, t=1 Iij (t)which is the average number of downloading peers peer i’s estimated parameters, namely cij and χij . We ˆ ˆconnected to source peer j, from the downloading peer have to emphasize again that the estimates of both cij ˆi’s point of view. Summing up the average number of and χij cost almost no communication overhead because ˆdownloading peers connected to source peer j gives the they are updated only when i decides to connect tototal average number of connections in the network, i.e., j. In summary, our Algorithm 1 is simple and fully|D|L. Thus, each downloading peer i can use j∈S χij ˆ distributed. In Section 5, we show the performance of ˆ Algorithm 1 via NS-2 simulations.to estimate the number of all downloading peers |D| inthe network. Then, the downloading peers can use theirestimated system parameters to calculate the connection 4.6 Summaryprobability to each source peer by replacing cj and |D| Before we present our simulation result, we summarize ˆin (21) and (22) with cij and |D| . ˆ the analytical results we have obtained so far. The service Note that all the calculation so far is based on each capacity of each source peer j ∈ S is denoted by apeer’s own observation from the network. Since no random process Cj (t). Each of the downloading peerobservation can be made without connection, we assume connects to source peer j with probability pj . Tablethat pij ≥ > 0, where is a very small number to 4.6 shows the summary of our result on the averageensure that each downloading peer is able to observe download time of the network.and estimate the global information rather than sim- File size Fply ignores some source peers completely. Algorithm 1 Avg. Capacity A(c) = 1 cj = E{Cj (t)} |S| j cj ,summarizes our distributed algorithm for connection |D| j∈S 1−(1−pj ) cjprobability assignment. Utilization ρ= j∈S cj |D| Competition Factor ν = |S|Algorithm 1 Distributed Connection Prob. Assignment Min. Avg. Time F ν A(c) ρ T 1: Sij := t=1 Iij (t) with initial value 0 2: In each time slot T : TABLE 1 3: Rij (T ) :=received capacity from a source j. System parameters and performance metrics 4: χij (T ) :=observed number of peers connected to j. 5: while !(file complete) do To achieve the minimum average download time if ∃ Sij = 0 then F ν A(c) ρ ,each peer has to make connection decisions in 6: 7: Connect to L source peer from {j|Sij = 0} uni- each time slot. At the beginning of each time slot, formly. Let G denotes the selected set. the downloading peer decides whether to connect to 8: else a source peer j by flipping a coin with probability ˆ χij ˆ 9: |D|i = jL of getting a head (making a connection) pj . We have10: ˆ ˆ Calculate pij using |D|i , cij according to (21) and calculated the optimal connection probability p∗ for each j (22) with L in (20) replaced by L = L − |S| source peer j. Suppose the each source peer is indexed11: pij = pij + by its average service capacity cj in descending order,12: Connect to source peer j with probability pij . then Let G denotes the set of connected source peers.  1 K ∗ −L 1 (|D|−1)  1− , j ≤ K∗ 13: end if 1 cj p∗ = j K∗ 1 (|D| 1) −14: Sij := Sij + 1 ∀j ∈ G. j=1 cj j > K∗   0, (S −1)ˆij +Rij (T )·χij (T ) c15: cij := ij ˆ Sij ∀j ∈ G.16: ˆ (S −1)χij +χij (T ) χij := ij ˆ Sij ∀j ∈ G. The value K ∗ is the cut-off index and it is defined as17: end while µ(K) cK K ∗ = max K < , |D| C(K) In Algorithm 1, Sij holds the number of times that wherea downloading peer i connects to a source j. Only two  (|D|−1)variables Rij (T ) and χij (T ) are needed from each source nj in time slot T to carry out the connection probability |D|  n−L C(n) = cj , µ(n) =  C(n)  1/(|D|−1) assignment. Here, we briefly explain the key steps of j=1 n 1 j=1 cjAlgorithm 1 in each time slot. From line 6 and 7, we firstconnect to L source peers that have not been connected(Sij = 0) at a time to get a global estimates on the 5 S IMULATIONSparameters for the network as quick as possible. When In this section, we use NS-2 simulations to compareSij > 0, ∀j, a downloading peer i can then calculate the the performance of peer selection strategies in a P2Pconnection probability for each source peer j from lines 9 network. A simple illustration of our network setting is
  11. 11. 11depicted in Figure 3. The internet cloud consists of many be 55% of the access link bandwidth limit. The P2P fileinter-connected nodes including backbone routers and transfer traffic is carried over the TCP connection in NS-edge routers. We assume that the Internet backbone has 2. Note that in our NS-2 simulations, the actual capacityhigh bandwidth and does not introduce any congestion, fluctuation in each of connections of a peer is governedimplying that the main bottleneck is the access link of by not only those asymmetric physical bandwidth limits,each peer. Such assumption enables us to run simula- but also the actual TCP congestion control algorithm, thetions using a star-shaped network topology in which the number of concurrent connections to the source peer,internet backbone is replaced by a single network node and the random Pareto on/off background traffic, i.e.,that has infinite capacity and zero delay. Although our it is stochastic in time and heterogeneous over space.goal is to improve the download performance, whether As we have stated in the beginning of the paper, weour algorithm is ISP-friendly is an interesting future are only interested in the peer selection algorithm itselftopic. Some new work has started to consider the impact rather than the problem of free-riders. All peers canof peer selection on cross-ISP traffic [30], [31]. The access serve as source peers, hence |S| = 50. The number oflinks of peers have different bandwidth limits and are downloading peers ranges from 10 to 50. Here, we setconnected to the “super” node representing the internet the number of average parallel connections L = 2.backbone. To reflect a general network setting for the First, we show the performance difference betweenaccess bandwidth, we configure 10% of the total 50 the simple “connect-and-wait” strategy and dynamicpeers to have 10Mbps upstream (from peer node to the peer selection with no connection preference (uniformcenter node) capacity limit, 20% to have 5Mbps, 40% selection). Under the “connect-and-wait” strategy, theto have 1Mbps and the rest to have 100Kbps. These file is divided into 2 pieces of equal size. Under dy-groups represent situations in typical LAN, high speed namic peer selections, we set the basic data transfercable/DSL, low speed cable/DSL, and modem connec- unit (chunk) to be 16KB and each time slot is set to betions, respectively. We set 10Mbps as the downstream 1 minute. We choose these two strategies for separatecapacity limit so that the transmission bottleneck will comparison purposes because “periodic uniform peer se-most likely be at the source peers. The total number lection” alone already makes drastic improvement overof network nodes is 51 (one center node and 50 peers the “connect-and-wait” strategy. We then compare theparticipating in content transfer). performance between periodic uniform peer selection and other algorithms to demonstrate the effectiveness of the optimal strategies more clearly. We first set the 0.25 file size F =50MB for the ease of presenting our results. |D|=5 We have also run simulations using larger file sizes and 0.2 |D|=10 observed the same result. (“connect-and-wait” strategy Connection Probability |D|=20 always performs worst) |D|=40 0.15 |D|=60 Unform 0.1 0.05 0 20 15 10 5 Service Capacity (MB)Fig. 3. The illustration of the network used in our NS-2simulation. We assume that the Internet backbone doesnot introduce any congestion. The upstream access linkof each peer is bandwidth limited. Fig. 4. The average download time for periodic uniform peer selection and “connect-and-wait” strategy. There are 50 nodes and no free-riders, hence |S| = 50. The number Note that each node’s upstream bandwidth is not only of downloading peers varies from 10 to 50.shared among other peers connected to it but also bysome other network applications as in real world case.We model those non-P2P traffic by using the on/off Figure 4 shows the results for our first scenario.Pareto traffic generators in NS-2. We set both the average The line marked with “uniform” is the result of the“on” and “off” periods to be 10 minutes. When a traffic periodic uniform selection, and the line marked withsource is in the “on” state, it generates a constant bit-rate “permanent” is the result for the “connect-and-wait”traffic at 90% of the full upstream capacity. Hence, the strategy. In all cases, we can see at least 50% reductionlong term upstream average capacity for P2P traffic will in average download time by using periodic uniform
  12. 12. 12peer selection. In some extreme cases, take |D| = 50 as 1an example, the average download time for “connect- % Over Uniform Selectionand-wait” strategy is almost 4 times longer than the 0.9periodic uniform selection. The result is exactly whatwe anticipated from our discussion in Section 3. Next, 0.8 Optimal Distributedwe show that under the dynamic peer selection schemes(non-uniform), the average download time can be fur- 0.7ther shortened by selecting peers using the optimalconnection probabilities. The duration of each time slot 0.6is still 1 minute. We compare the performance between 0.5the uniform strategy (periodic uniform peer selection),the centralized optimal selection strategy in (21)–(22) and 10 20 30 40 50the distributed strategy (Algorithm 1) in Section 4.5. We # of Downloading Peers (|D|)change the file size to 200MB, which is typical for some Fig. 6. The reduction in the average download time forvideo clips that contribute to long sessions so as to reflect different peer selection strategies over the uniform peera more general and realistic case. selection. The value in Y-axis is the fraction of the average download time of non-uniform selection algorithms over the average download time of uniform peer selection. Avg. Download Time (Sec) 1500 in our simple distributed algorithm. First, each peer esti- 1000 mates the level of competition of the entire network only through partial information and there will always be Uniform Permanent difference between ‘local view’ and global information. Second, the estimate cij may not be close to the real value ˆ 500 of cj . As we can see from Figure 5, the average download time for the distributed algorithm ranges from 1500 to 10 20 30 40 50 2000 seconds, which means that a downloading peer can |D| make 25 to 35 connection switches during its session. Each peer will thus make around 60 observations of theFig. 5. The average download time for three dynamic system (L = 2). However, such number of observationspeer selection strategies: periodic uniform peer selection is still not enough to obtain an accurate estimate for(Uniform), the centralized optimal strategy in (21)–(22) cj , especially under highly stochastic and heterogeneous(Optimal), and the distributed strategy in Algorithm 1 environment as in our setting. We point out these two(Distributed). factors to illustrate the possibility that the performance of the distributed algorithm can be further improved by Figure 5 shows the average download time along with increasing the accuracy of estimation of system variables,its confidence interval for three different dynamic peer which can be made totally decoupled from our consid-selection strategies. It is clear that both the centralize eration of stochastic heterogeneous P2P networks underoptimal strategy and the distributed strategy offer a per- competition.formance boost over the simple uniform peer selection.(Note that this simple uniform selection is still muchbetter than the usual connect-and-wait strategy as shown 6 C ONCLUSION AND F UTURE WORKin Figure 4.) To further demonstrate the effectiveness In this paper, we jointly consider many factors that haveof our strategies over uniform peer selection, we plot impact on the average download time, which are oftenthe reduction in the average download time of both considered separately in the literature, and derive an op-centralized and distributed strategies over uniform peer timal peer selection strategy that greatly improves con-selection in Figure 6. tent delivery performance in a stochastic heterogeneous It is clear that the optimal strategy can further reduce P2P network. Specifically, we address some of the oftenabout 50% of the average download time compared with neglected issues, namely, the competition for resourcethe periodic uniform selection strategy over most range among multiple concurrent downloading peers, jointlyof values of |D|. We see that our simple distributed with the spatial heterogeneity and temporal correlationalgorithm (developed from the centralized optimal strat- in service capacities of the source peers. We derive theegy) can still reduce the average download time by 10% relationship between the average download time andto 15%. Clearly, there is a performance gap between system performance metrics such as system utilizationthe distributed algorithm and the centralized optimal and level of competition, and develop our optimal peerstrategy. This is inevitable due to the estimation errors selection strategies based on this relationship.