[26] D. Qiu, R. Srikant, Modeling and performance analysis of BitTorrent-
like peer-to-peer networks, in: Proceedings of t...
Resource demand and supply in BitTorrent content-sharing communities

Nazareno Andrade a,b,*, Elizeu Santos-Neto b, Francisco Brasileiro a, Matei Ripeanu b

a Universidade Federal de Campina Grande, Laboratorio de Sistemas Distribuidos, Av. Aprigio Veloso, 882, Bloco-CO, 58109970 Campina Grande, PB, Brazil
b University of British Columbia, Vancouver, BC, Canada

Article info

Article history: Received 21 April 2008; received in revised form 9 September 2008; accepted 24 September 2008; available online 21 November 2008

Keywords: Content distribution; BitTorrent; Workload characterization; Resource sharing

Abstract

BitTorrent is a widely popular peer-to-peer content distribution protocol. Unveiling patterns of resource demand and supply in its usage is paramount to inform operators and designers of BitTorrent and of future content distribution systems. This study examines three BitTorrent content-sharing communities regarding resource demand and supply. The resulting characterization is significantly broader and deeper than previous BitTorrent investigations: it compares multiple BitTorrent communities and investigates aspects that have not been characterized before, such as aggregate user behavior and resource contention.
The main findings are three-fold: (i) resource demand: a more accurate model for the peer arrival rate over time is introduced, contributing to workload synthesis and analysis; additionally, torrent popularity distributions are found to be non-heavy-tailed, which has implications for the design of BitTorrent caching mechanisms; (ii) resource supply: a small set of users contributes most of the resources in the communities, but the set of heavy contributors changes over time and is typically not responsible for most resources used in the distribution of an individual file; these results imply that some level of robustness can be expected in BitTorrent communities and can guide resource allocation efforts; (iii) relation between resource demand and supply: users that provide more resources are also those that demand more from the community; also, the distribution of a file usually experiences resource contention, although the communities achieve high rates of served requests.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Four aspects must be analyzed to understand a computational system and improve its performance: its design, its implementation, the resources on which it runs, and the workload it serves. The first two aspects directly determine the efficiency of the system, while the latter two bound system performance and the efficacy of resource allocation mechanisms. Moreover, while the design and implementation of a system can be analyzed in controlled conditions, a characterization of typical workload and resource availability normally requires monitoring in production settings. This study focuses on advancing the characterization of workload and resource availability of commons-based content distribution based on BitTorrent, a widely popular, peer-to-peer content distribution protocol.
Our analysis of the data collected from three BitTorrent communities leads to the understanding of the resources on which peer-to-peer content distribution mechanisms typically run and the workload they serve. We interpret the resources on which BitTorrent runs as its resource supply and the workload it serves as its resource demand. In this perspective, this study investigates the impact of resource supply and demand on BitTorrent's performance, on its resource allocation mechanisms, and on the overhead it imposes on the underlying network infrastructure. The resulting analysis is relevant: (i) for users, as it evaluates the quality of service currently available in commons-based content distribution communities; (ii) for community operators, since it characterizes the resources on which the community depends and, in particular, the effect of an increasingly popular incentive mechanism called sharing-ratio enforcement; (iii) for developers of content distribution technologies, as it documents usage patterns which affect resource allocation mechanisms; and (iv) for Internet infrastructure operators, as demand and supply patterns define the load that content distribution imposes on the network and how effective it is to cache content to reduce operational costs.

1389-1286/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2008.09.029
* Corresponding author. Address: Laboratório de Sistemas Distribuídos, Av. Aprígio Veloso, 882, Bloco CO, CEP 58429-900, Campina-PB, Brazil. Tel.: +55 83 33101365. E-mail addresses: (N. Andrade), (F. Brasileiro).
Computer Networks 53 (2009) 515–527. Contents lists available at ScienceDirect. Computer Networks journal homepage:

The traces collected and characterized in this work allow us to draw a clearer picture of BitTorrent communities than previous studies [19,25,5,16,31,3,27]. We are able, first, to compare system behavior across different communities, and second, to accurately model user behavior, as some of our traces allow precise user identification across all activities a user may engage in within a community. Accurately tracing user behavior enables us to characterize the system at a community level more precisely than previous studies that focused on individual files or used tentative user identification. Additionally, we discuss limitations of current practices used by previous BitTorrent measurement studies, and document solutions to limit the shortcomings of these methodologies.

In summary, this paper extends previous characterization studies in terms of breadth, as it compares system behavior in several large communities (the largest having more than 80,000 active users performing 1.7 million downloads during our trace); depth, as it investigates novel aspects of demand and supply, such as user behavior across torrents and resource contention; and accuracy, as it improves the methodology used by previous studies.

At a high level, our study pictures BitTorrent communities as content distribution systems which generally experience resource contention, but often operate on an abundance of resources, and which rely on contributions from a minority of users. This minority, however, is not composed of altruistic participants: users that contribute more to the communities are also those that request more. Furthermore, the set of major contributors is not static: heavy contributors typically have this status for a limited time. Finally, communities successfully serve the vast majority of the received requests.

In more detail, this study finds that, from the resource demand perspective, (i) file popularity in BitTorrent communities deviates significantly from the long-tailed item popularity distribution on the Web, with direct implications for the design of caching mechanisms for BitTorrent traffic; and (ii) the request arrival rate for a file is not comprehensively modeled by previous proposals, which leads us to provide a more accurate model for the arrival rate of requests.

From the resource supply perspective, our main findings are: (i) a few users contribute a majority of resources at the community level, yet, at the individual data item level, contributions are considerably less concentrated; (ii) peers which contribute more to the system are those that devote more bandwidth to the system, and not those that devote more time to distribute a file; and (iii) sharing-ratio enforcement, a popular incentive mechanism deployed in BitTorrent communities, leads to users investing more time contributing to the community, but not to higher bandwidth allocations.

Investigating the relationship between supply and demand shows that: (i) in the one community where we can gauge the correlation between users' demand and contribution, users that contribute more to the community are also those that consume more from it, an observation that denotes a degree of equity in this community; (ii) in all communities studied, a high proportion of the requests is successfully served, which is evidence that commons-based content distribution provides a good level of quality of service and that traditional content providers could reduce data distribution costs if they are able to leverage users' contributions at similar levels; and (iii) resource contention varies significantly even within a single community: for three quarters of the files distributed, there is at least a mild contention for resources and the provision of more resources could improve quality of service; for the remaining quarter, there are enough resources available to meet the demand of all peers indistinctly, rendering prioritization-based incentive mechanisms irrelevant.

The rest of this article is organized as follows. The next section presents an overview of BitTorrent together with related work. Section 3 details the communities studied and our data collection and analysis method. The characterizations of resource demand, supply and the relation between them are presented in Sections 4–6, respectively. The last section presents our conclusions and final remarks.

2. Background

The main goal of BitTorrent is to enable scalable content distribution. To this end, the load of distributing a file is shared between the content publisher and those who download it: the peers downloading and those which have already downloaded the file supply bandwidth, the parts of the file they already have, and content availability. This scheme is currently widely popular: studies estimate that about 30% of Internet traffic was due to BitTorrent in 2005 [10].

In BitTorrent parlance, a torrent is the group formed by all peers taking part in the distribution of a file. To download a file using BitTorrent, a user must join the torrent that distributes it by contacting its tracker, the component of the system that enables peer discovery and data location. Peers that have an incomplete copy of the file are called leechers, while peers that have finished downloading and still participate in the torrent are called seeders. Leechers both upload and download pieces of the file, while seeders only upload them. Both leechers and seeders report their progress periodically to the tracker. BitTorrent has a built-in incentive mechanism through which each leecher prioritizes the leechers that provide it the best recent download speed. Seeders do not share the same built-in incentive mechanism, as they only upload [13].

Note that the content discovery and access control mechanisms are external to the BitTorrent protocol, which is focused on data transfer. These functionalities are usually provided through a web site. This solution leads to
segmented BitTorrent communities, centered around the different sites that enable content location.

Studies based on modeling [26,35,16], simulation [7,32] and experiments with BitTorrent software [20] established, under controlled conditions, that BitTorrent is an efficient and scalable content distribution protocol. Although these studies based on controlled scenarios help understand BitTorrent behavior, a comprehensive characterization of BitTorrent in production scenarios is necessary to complement them.

Several studies of real-world deployments provide valuable information on this perspective [19,25,5,16,31,3,27,23]. However, these studies suffer from four shortcomings which motivate our work: (i) they are unable to accurately analyze aggregate user behavior at a community level, due to inaccurate user identification in the collected data; (ii) their study of the relationship between resource demand and supply is limited; (iii) they are restricted in scope, as they analyze either a few torrents [19,31], a single community [5,16,2] or a single snapshot of multiple communities [3,27]; and (iv) they have methodological limitations related to the assessment of information loss or bias in the sampling methods used.

This work addresses these issues by (a) obtaining and analyzing a trace from a community which provides strong user identification, (b) broadening the scope of the characterization to three different communities with more than 10,000 torrents and one million downloads, (c) discussing in depth the implications of the sampling methods used, and (d) analyzing the relationship between resource demand and supply. The traces which address points (a) and (b) and our approach to address point (c) are presented in Section 3. Our subsequent BitTorrent characterization addresses point (d). Throughout the rest of the document, we compare in detail the results of our study with related work.

3. The data sets

This section presents the terminology used in the rest of the paper, the data collection method, and the three BitTorrent communities studied. Analyzing multiple communities is necessary as user behavior tends to vary across communities [3,27]. Although studying the three selected communities does not guarantee a definitive view over user behavior, we claim that analyzing multiple communities and a larger set of torrents does contribute towards a better characterization.

3.1. Terminology

For the rest of this document, we use the following terminology: we differentiate between users and peers. A user is a participant in a BitTorrent community, who is observed as a peer in each torrent in which she participates. This distinction is relevant because, for some of the communities studied, it is only possible to observe peer behavior accurately.

A peer joins a torrent the first time it participates in it. Each peer might have several sessions in the same torrent, as it may go offline and come back online, and it leaves a torrent when it departs from the torrent and does not come back. The time a peer spends online after it finishes the download and before it leaves the torrent is the peer's seeding time. The torrent start is determined by the first peer join event in that torrent. The torrent end is the time when the last peer leaves the torrent. The lifetime of the torrent is the period between its start and its end, and a torrent is complete if its start and end happen within our measurement period.

We consider a two-level view of a BitTorrent community: the community level view characterizes the behavior of users across all torrents they participate in and aggregates metrics over all torrents in the community. The torrent level is concerned with peer behavior in each torrent, without aggregating this behavior to observe users.
The community level and torrent level views offer complementary views of the community: the former observes users across different torrents, while the latter observes primarily torrents. Also, this distinction is necessary in our data analysis, as for some of the communities studied, it is not possible to accurately track user behavior. In these communities, we focus on analyzing resource demand and supply at the torrent level.

3.2. The communities

The three BitTorrent communities studied are etree, bitsoup and alluvion. etree is a community devoted to sharing recordings of live performances for non-commercial purposes; alluvion (http://www.allu- is a community hosting user-generated media; finally, bitsoup is a community of users that share all kinds of content.

Two features distinguish bitsoup from the other two communities. First, it requires users to register with the community website and tracks user behavior across torrents. Second, it uses sharing-ratio enforcement (SRE) in addition to BitTorrent's built-in incentive mechanism to boost resource contribution. SRE, also used by other communities, works by keeping a record of users' resource consumption and contribution across different torrents and penalizing the users which do not contribute a minimum proportion of their consumption across all torrents where they participate.

3.3. Data collection

This study uses a passive method to collect BitTorrent traces. Data collection is done via crawling report pages provided by the trackers of each community, instead of deploying software on client machines. These reports contain detailed information, per torrent, about all peers currently active in the system, such as each peer's downloaded and uploaded amounts, for how long the peer has been online and whether the peer is a seeder. Each crawl consists of thousands of HTTP requests to the community's web server.
Thus, the frequency of data collection must be moderate to minimize the crawling overhead. The crawler ran every hour (Section 3.4 discusses the limitations of this sampling frequency in this study). We experimented with snapshots every 30 min, but the resulting load was seen as too high by the communities' administrators.

To collect data from bitsoup and etree, we implemented our own crawlers, while the alluvion data is available at the UMass Trace Repository. It is worth noting that although the data from alluvion has been analyzed before [5,16], we perform our analysis with an improved methodology and analyze new dimensions of it.

Table 1 presents, for each community studied, the duration of the trace, the total number of torrents observed in each trace, the average number of torrents alive at any point in time, the total number of peers seen in the trace and the average number of peers seen at any point in time. The bitsoup community is considerably larger than the other two studied. Nevertheless, as we show throughout the rest of the paper, this does not cause major differences in the properties of user behavior we consider.

3.4. Reconstructing torrent dynamics

Our traces consist of hourly snapshots of the state of peers and torrents in a community. Before analysis, it is necessary to reconstruct peer and torrent behavior over time from the snapshots. This reconstruction process implies three challenges, which we discuss next. Please refer to our technical report [4] for further details about the methods presented here.

The first challenge in analyzing the traces is the use of imprecise identification by some trackers, an issue discussed by previous studies [5,16,31,19]. This study has the advantage that one of the communities studied, bitsoup, uses unique logins to track user behavior. This allows us to estimate precisely the distribution of resource contribution and consumption by users at the community level. User identification is imprecise in alluvion and etree, allowing only heuristic-based identification of peers at a torrent level.
For these two communities, we used heuristics similar to those reported in previous studies [5,16,19] to track peer behavior given the imprecise identification found in the traces. For the sake of reproducibility, these heuristics are detailed in our technical report.

A second difficulty arises from the crawling frequency. As snapshots are taken periodically, information is lost if relevant events occur more frequently than snapshots are taken. To address this issue, we estimate the information loss and bound it by studying only torrents in which enough observations of the relevant events are available. For that, we determine the likelihood of a peer joining a torrent, downloading a file and leaving the torrent between two snapshots. This likelihood is a function of the time peers must stay in a torrent to download the file. The time necessary to download the file is derived from the size of the file distributed in the torrent and the distribution of peers' download bandwidth. Using this result, we have estimated the amount of information loss for torrents distributing files of different sizes and found that by analyzing only torrents that distribute files larger than 100 MB, it is possible to observe at least 90% of the peers in the communities studied. For the remainder of this study, therefore, we consider only these torrents in our traces.

The third complication in analyzing the data sets results from the limited trace duration. For some analyses, such as the characterization of the request arrival process in torrents, it is necessary to examine a sample of complete torrents. Moreover, it is desirable that this sample reflects the overall population of torrents in the community. However, because data from each community is collected for a limited period (up to 68 days for bitsoup), one must take care when sampling the complete torrents so as to produce an unbiased sample.
The reason is that considering only complete torrents will admit proportionally more short torrents than exist in the torrent population, as these have a higher probability of occurring in the trace than longer torrents.

To avoid bias when studying complete torrents, we apply the create-based method proposed by Roselli et al. [28], which allows obtaining an unbiased sample of torrents with a maximum duration s. We obtained samples of torrents for s = 8 days from the three communities and samples of torrents for s = 30 days for alluvion and bitsoup, the two communities for which we have longer traces. In the rest of this paper these samples are referred to as s8 and s30, respectively. Table 2 details the samples we consider given our method.

Table 1. Characteristics of the traces used.

Trace     Duration                              Torrents (total)  Torrents (average)  Peers (total)  Peers (average)
etree     10 days during March 2005             1589              835                 66,588         4905
alluvion  50 days during October–December 2003  1528              278                 227,096        7312
bitsoup   68 days during April–July 2007        13,741            6633                1,694,243      145,462

Table 2. Characteristics of torrent samples considered.

Sample    Torrents (all)  Torrents (s8)  Torrents (s30)  Peers (all)  Peers (s8)  Peers (s30)
alluvion  1247            271            355             187,916      12,291      43,930
bitsoup   10,463          416            1123            1,351,806    8400        54,889
etree     284             124            -               11,788       1764        -

4. Characterizing resource demand

The first part of this characterization focuses on the demand generated by members of a BitTorrent community. Understanding usage patterns is paramount to produce optimized designs of content distribution mechanisms. We consider torrent joins (i.e., file requests) as indirectly expressing user demand and focus on the following two questions: (i) what is the distribution of torrent popularity, as expressed by the total number of torrent joins each torrent receives; and (ii) what is the evolution of the rate of torrent joins over time.

4.1. What is the content popularity distribution?

The popularity of a torrent (and implicitly of a content item) is the number of torrent joins received during a time interval. From our data, we can see both the popularity of all torrents during the duration of the traces and the popularity of the complete torrents sampled. The former shows how the interest of users is distributed over available content in a period, while the latter is concerned with the total number of users which will join a torrent. Regardless of perspective, our main finding is the same: content popularity in BitTorrent communities is not heavy-tailed.

Fig. 1 shows the popularity of all content during the entire period of our measurements (left) and restricted to complete torrents in s30 (right). The popularity distribution of torrents in the s8 sample has the same characteristics. For all samples, the curves have similar shapes and clearly deviate from a Zipf distribution, which is commonly used when modeling popularity.

For all our samples, a Lognormal or a Weibull distribution fits the empirical data well.¹ These distributions are distinct both from those found in peer-to-peer file-sharing [15,29] and video streaming [11], but are similar to the observed distribution of user activity across topics in four online peer production systems observed by Wilkinson [34] and to the popularity of films as measured by their box office revenues [30].

A distinguishing feature in the distributions we observe is that they are not heavy-tailed.

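
As an illustration of the kind of fit described in Section 4.1, the following minimal sketch estimates Lognormal parameters from per-torrent join counts and produces the points of a QQ-plot. It is not the procedure used in the study: the function names and the sample counts in the usage note are hypothetical, and the maximum-likelihood estimator shown is one common choice among several.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(counts):
    """Maximum-likelihood Lognormal fit: mean and std of log(count).

    Assumption: torrents with zero joins are ignored, since log(0) is undefined.
    """
    logs = [math.log(c) for c in counts if c > 0]
    return fmean(logs), pstdev(logs)


def qq_points(counts, mu, sigma):
    """Return (theoretical quantile, empirical quantile) pairs for a QQ-plot.

    Requires sigma > 0. Points close to the diagonal suggest a good fit.
    """
    xs = sorted(c for c in counts if c > 0)
    n = len(xs)
    normal = NormalDist(mu, sigma)
    return [(math.exp(normal.inv_cdf((i + 0.5) / n)), x)
            for i, x in enumerate(xs)]
```

With hypothetical counts such as `[1, 10, 100]`, `fit_lognormal` returns the parameters of the fitted distribution and `qq_points` pairs each observed popularity with the value the fitted Lognormal predicts at the same quantile.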
The absence of a heavy tail has major implications for the design of caching mechanisms. On the one hand, for small cache sizes, these popularity distributions lead to caches that are less effective (i.e., lower hit ratio) than for heavy-tailed distributions. On the other hand, for large cache sizes, the cache effectiveness can be much higher than for heavy-tailed distributions, since the percentage of unpopular items is much lower in the traces we observe.

This stands in contrast with the design of caches for Web pages, whose popularity distribution is heavy-tailed [14,8], and suggests that caching mechanisms designed for heavy-tailed distributions observed in peer-to-peer file-sharing (e.g. [33,29]) should be revisited before being applied to BitTorrent traffic. We note that Bellissimo et al. documented that the popularity distribution of files deviates from a Zipf distribution in the alluvion data we consider [5]. Our results expand this observation through a wider sample that includes three different content-sharing communities and suggest distributions which fit the data.

4.2. How are torrent joins distributed over time?

A second dimension of user demand is revealed by the distribution of joins over the torrents' lifetimes. Our characterization (i) reproduces previous results which show that the join rate for a torrent drops rapidly after its start and (ii) proposes a model for the evolution of join rate over time that is more accurate than the current state of the art.

Previous studies report that the request arrival rate for a torrent decreases rapidly over time after its start [5,25,16]. Guo et al. [16] report, based on the same alluvion trace we use, that the request arrival rate for a torrent decreases exponentially over time. We revisit their findings using all three traces we have collected and benefit from the increased accuracy offered by the bitsoup trace, which contains accurate peer identification.

Fig. 1. Popularity rank of all torrents in the traces (left; alluvion, bitsoup and etree) and of torrents in the s30 sample (right; alluvion and bitsoup). Both panels plot popularity against torrent popularity rank on logarithmic axes.

¹ We used QQ-plots to compare empirical and theoretical CDFs as visual tests of goodness of fit.

Examining the curve of peer joins per day, we observe that although an exponential function $\lambda(t) = a e^{-t/b}$ is able to accurately model a number of torrents, it fails to account for a longer tail of joins that appears in a large number of torrents. This effect is particularly noticeable in bitsoup torrents. We find that a function of the form $\lambda'(t) = a_0/(1 + bt)$, for $t \in \mathbb{N}$, better models a larger proportion of all torrents in bitsoup while modeling torrents in alluvion and etree similarly to the exponential function. As in the exponential model, $a_0$ represents the initial peer join rate and $b$ is a factor that influences how fast this rate drops with time. Unlike $\lambda(t)$, however, in $\lambda'(t)$ the arrival rate decreases more slowly and at different rates during the lifetime of the torrent. The difference between the two models is further illustrated in Fig. 2.

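
The two arrival-rate models of Section 4.2 can be written down directly, which makes the difference in their tails easy to see. The sketch below is only illustrative: the function names and the parameter values in the example loop are hypothetical and are not fitted to any of the traces.

```python
import math


def exp_rate(t, a, b):
    """Exponential model: lambda(t) = a * exp(-t / b)."""
    return a * math.exp(-t / b)


def hyp_rate(t, a0, b):
    """Alternative model: lambda'(t) = a0 / (1 + b * t)."""
    return a0 / (1.0 + b * t)


if __name__ == "__main__":
    # Hypothetical parameters: both models start at 100 joins per day.
    for day in (0, 5, 10, 20, 30):
        print(day,
              round(exp_rate(day, a=100.0, b=5.0), 2),
              round(hyp_rate(day, a0=100.0, b=0.5), 2))
```

With these illustrative parameters both curves start at the same initial rate, but the exponential curve is close to zero after a month while the second model still predicts several joins per day, which is the longer tail the text describes.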
A comparison between the $\lambda(t)$ and $\lambda'(t)$ models can be made through the difference in their Akaike's Information Criterion (AIC). This criterion quantifies the fit of an estimated statistical model and can be used to compare how well two models fit a dataset. Depending on the difference between the AIC for the two models, it is possible to assess their relative merits [9]. This comparison can result in considering both models to be adequate, or in evidence in favor of the use of one of them. We take a conservative approach and consider that a model can be used unless there is essentially no support for it in comparison with the competing model. We can then count how often each model is usable for the torrents in our traces, including the cases in which the competing model would serve equally well.

Table 3 summarizes the comparison for the torrents in our traces that lasted for at least 5 days and had a minimum of 10 peers. A small fraction (5–10%) of the torrents in each trace fits neither of our two models and was not included in the table. For the remainder, no model is the most adequate for all torrents and the models have a similar coverage for torrents in etree and alluvion. Nevertheless, the $\lambda'(t)$ model fits the bitsoup trace considerably better, particularly for the most popular torrents.

One possible explanation for the difference in model adequacy is the scale of the bitsoup community. Bitsoup is significantly larger than the other two communities, which might result in peers joining torrents for longer periods. It is also possible that the heuristic peer identification employed in the analysis of etree and alluvion influences the peer joins observed in these communities, but our data does not allow an evaluation of this potential influence.

Regardless of the cause of the difference, however, our results support the $\lambda'(t)$ model as a valuable tool both when reasoning about torrents and when synthesizing workloads.
When reasoning, it offers a complement to the exponential model, better explaining a number of torrents and accounting for the phenomenon of longer tails in peer join rates. For workload synthesis, it offers an accurate representation of how the popularity of a torrent decreases over time. Moreover, the $\lambda'(t)$ model is particularly suitable for modeling highly popular torrents, a type of torrent that is often of interest in performance evaluation.

Finally, the observed model has an impact on simulation studies. The decrease in peer join rates over time implies that a pure Poisson process does not model the peer join process well. Several studies (e.g. [21,26,24,12]) have relied on this model and our result strengthens the need for reconsidering them with more accurate models.

5. Characterizing resource supply

In a BitTorrent community, resources are contributed by users. Their actions determine resource supply as users (i) configure the maximum amount of bandwidth their client can use for upload, (ii) determine the seeding time of their clients by controlling how long the client stays online after it has finished a download, and (iii) decide to quit definitively the torrents they are seeding, thus ceasing to contribute to them.

This user behavior influences three aspects of the system: (a) throughput, (b) content durability, and (c) upload volume. Users contribute to system throughput by providing upload bandwidth. Content durability is influenced by users' seeding time. Finally, the upload volume is the result of both providing upload bandwidth and spending time online when there is demand for service.

The remainder of this section evaluates the contribution users make to each of the aforementioned aspects of the system (Sections 5.1 and 5.2); investigates whether it is the upload bandwidth or the seeding time that better determines the upload volume at both community and torrent levels (Section 5.3); and finally investigates the dynamics of the population of contributors in the system (Section 5.4).

Fig. 2. Fitting of λ(t) and λ'(t) for three example torrents. The two torrents on the left are from bitsoup, while the rightmost one is from alluvion. [Plots: daily arrival rate versus torrent age (days); series: measured, λ(t), λ'(t).]

Table 3. Comparison of the percentage of torrents in which each model was equivalent to or better than the competing model. The percentage in the λ(t) column states the fraction of all torrents in which the λ(t) model was as good as or better than the λ'(t) model.

Sample                 # torrents   λ(t) (%)   λ'(t) (%)
etree τ8                       27        100         100
alluvion τ30                  194         67          65
bitsoup τ30                   858         40          79
  joins < 50                  406         45          82
  50 ≤ joins ≤ 150            430         37          75
  joins ≥ 150                  22          9          91

520 N. Andrade et al. / Computer Networks 53 (2009) 515–527
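The AIC-based comparison behind Table 3 can be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: the cut-off of 10 AIC units for "essentially no support" is a common rule of thumb that is assumed here, and the log-likelihoods, parameter counts and the names `aic` and `usable_models` are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def usable_models(aic_by_model, no_support_delta=10.0):
    """Return the models that still have some support relative to the best one.

    A model is kept unless its AIC exceeds the lowest AIC by more than
    `no_support_delta` (an assumed rule-of-thumb cut-off, not a value
    taken from the paper).
    """
    best = min(aic_by_model.values())
    return {name for name, value in aic_by_model.items()
            if value - best <= no_support_delta}


# Hypothetical log-likelihoods for one torrent; both models are assumed
# to have two fitted parameters.
fits = {
    "lambda": aic(log_likelihood=-120.4, n_params=2),
    "lambda_prime": aic(log_likelihood=-112.9, n_params=2),
}
print(usable_models(fits))
```

Under this rule a torrent can count toward both columns of the table when the two AIC values are close, which is why the percentages in a row may sum to more than 100.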
5.1. How are user contributions distributed?

Our analysis of user contribution shows that, at the community level, contributions of all types are concentrated on a small portion of contributors, yet, at the torrent level, contributions are typically less concentrated.

The first part of this analysis investigates the distribution of user contribution at the community level. Such analysis is only possible in the bitsoup trace, since it is the only trace with accurate user identification. Fig. 3 shows that contributions are considerably concentrated in this trace: the top 20% of contributors provide around 80% of the contribution for all the types of contributions considered. We note that although a minority of all users provides a significant share of resources, this share is much lower than that reported in previous studies of Gnutella [1,18] and eDonkey [17], two other peer-to-peer file-sharing systems.

However, a high concentration at the community level does not imply that contribution is also concentrated at the torrent level. The investigation of this aspect uses the samples of complete torrents from all communities. We focus on the top 20% of contributors in each individual torrent and measure the amount they contribute. Fig. 4 shows the CDF of the proportion of contribution generated by peers in the top 20% set: a proportion y of all torrents has x or less of its resources contributed by the peers in its top 20% set.

The concentration of contribution is less pronounced at the torrent level than at the community level in all samples of complete torrents. While at the community level the top 20% of contributors are responsible for 80% or more of the total seeding time, upload bandwidth and upload volume, at the torrent level the peers that represent the top 20% of the contributors are responsible for a similar proportion of all contribution only in a small number of torrents.
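The top-20% share measured above can be computed per torrent (or for the whole community) along these lines. This is a sketch under stated assumptions: the function name `top_share`, the floor-based choice of how many peers form the top set, and the sample numbers are all illustrative, not taken from the traces.

```python
def top_share(contributions, top_percent=20):
    """Fraction of the total contributed by the top `top_percent`% of peers.

    `contributions` holds one non-negative value per peer (for example
    bytes uploaded). The size of the top set is rounded down, with a
    minimum of one peer; this rounding rule is an assumption.
    """
    total = sum(contributions)
    if not contributions or total == 0:
        return 0.0
    k = max(1, (len(contributions) * top_percent) // 100)
    top = sorted(contributions, reverse=True)[:k]
    return sum(top) / total


# Hypothetical torrents with ten peers each.
concentrated = [50, 20, 10, 10, 5, 5, 0, 0, 0, 0]
balanced = [10] * 10
print(top_share(concentrated))  # top two peers hold 70 of 100 units
print(top_share(balanced))      # top two peers hold 20 of 100 units
```

Sorting the per-torrent shares and plotting their empirical CDF gives a curve of the kind shown in Fig. 4.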
In particular, when considering the upload volume, the top 20% of contributors only rarely contribute the majority of the uploaded volume. Upload bandwidth and seeding time are usually more concentrated; yet they are not as concentrated as at the community level.

Focusing the analysis on the upload volume metric, it is possible to gain further insight into how resources are provided: the considerable concentration of upload volume contributions at the community level and the milder concentration at the torrent level together suggest that top contributing users do not achieve this status as a result of massive contributions in a small number of torrents. Instead, these users contribute in a large number of torrents over time.

On the one hand, understanding that a small proportion of users is responsible for most of the data transferred motivates communities to highly value these users, rewarding them so as to maintain their participation. On the other hand, the relative balance in resource contribution at the torrent level can be useful regarding resource allocation in the community. For example, this information is useful when deciding, either centrally or by collective action, in which torrents each user should seed. The lack of concentration in contributions at the torrent level implies that the number of peers contributing in a torrent is a reasonable indicator for the level of contribution to be expected in that torrent. Our results suggest that resource allocation at the community level can use this simple and cheap-to-obtain information to decide where to direct resources.

Furthermore, the lack of an accentuated concentration in resource contribution at the torrent level improves the overall robustness of torrents. The more the contribution is concentrated on a torrent, the more the service level in that torrent depends on the individual behavior of a few peers. More equitable contribution distribution leads to torrents that are more robust to individual peer failure or departure.

Fig. 3. Community level concentration of contributions in bitsoup (τ30). [Plot: cumulative proportion of contribution versus cumulative proportion of users; series: upload bandwidth, upload volume, seeding time.]

Fig. 4. Torrent level concentration of contributions. [Plots: P[X<x] versus proportion of contribution by top 20% contributors; panels: bitsoup τ30, alluvion τ30, etree τ8; series: upload bandwidth, upload volume, seeding time.]

5.2. Do contribution levels vary across communities?

To examine regularities and peculiarities in user behavior, it is necessary to compare the three types of peer contributions across the communities. Recall that bitsoup operates a sharing-ratio enforcement mechanism (SRE) seeking to boost resource contributions. Our investigation reveals that users seed for longer in the community that uses this enforcement mechanism, although bandwidth contribution is similar across all communities.

The comparison of the three communities focuses on contributions at the torrent level (a community level comparison among the three communities is not possible since the upload volume is directly related to the size of the community). Upload bandwidth and seeding time are comparable if they do not correlate with torrent characteristics. Otherwise, observed differences in contributions might result from torrents' peculiarities.

A correlation analysis shows that the upload bandwidth allocated by a peer is unrelated to the characteristics of the file being downloaded. Additionally, seeding time is not correlated with the size of the file or the time the peer takes to download it. This indicates that it is reasonable to analyze peers' upload bandwidth and seeding time regardless of the torrent they participate in and to compare them across communities.

Fig. 5 shows the distribution of seeding time and upload bandwidth contributed by peers. Regarding the upload bandwidth, in all communities, only about 5% of peers contribute large amounts of bandwidth (over 100 KB/s), 20–30% of peers do not contribute, and 40% contribute between 5 and 50 KB/s.
Considering the seeding time, however, it is clear that a considerably larger proportion of users seed for longer in bitsoup than in etree and alluvion. We attribute this difference to the SRE mechanism employed in bitsoup, which encourages users to contribute more. This conjecture is further supported by observations made on a different set of communities by Andrade et al. [3] and Ripeanu et al. [27].

Taken together, the observations that bandwidth levels are similar across all communities while seeding is higher in the presence of SRE bring a new perspective on the effects of this mechanism. Our analysis unveils that users typically try to increase their contribution levels by seeding for longer and not by providing more bandwidth to the system. This observation is of particular importance for designers and operators of incentive mechanisms for BitTorrent, as it provides evidence of which resource users will invest as a response to incentives to increase their upload volumes.

Finally, regardless of the difference in seeding behavior across communities, the seeding time distribution is considerably skewed in all of them. This regularity supports the conjecture of Pouwelse et al. [25] that the behavior of individual users is more relevant to determine the longevity of a torrent than the number of seeders online at an instant: a file is likely to be available for longer if there is a long-standing seeder than if there are several short-lived ones at a given moment.

5.3. What determines the upload volume: seeding time or bandwidth?

This section investigates which factor is determinant in explaining the upload volume of a user at the community level and of a peer at the torrent level. The goal is to find whether the upload bandwidth or the seeding time better explains the upload volume. The results suggest that the bandwidth is a better predictor for the upload volume at both levels, as opposed to the seeding time.
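One simple way to compare how well each factor explains upload volume is to regress the volume on each candidate predictor after a log transformation and compare the resulting R² values. The sketch below assumes a single-predictor least-squares fit (for which R² equals the squared Pearson correlation) and strictly positive inputs; the function name `r_squared_loglog` and the sample values are hypothetical.

```python
import math


def r_squared_loglog(x, y):
    """R^2 of a simple linear regression of log(y) on log(x).

    For a single predictor, R^2 equals the squared Pearson correlation.
    All values must be strictly positive; how zero values are handled in
    the original analysis is not known, so they are rejected here.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally sized samples with at least two points")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined R^2")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical peers: normalized upload volume, bandwidth (KB/s), hours online.
volume = [3.0, 12.0, 48.0, 192.0]
bandwidth = [1.0, 2.0, 4.0, 8.0]
hours = [5.0, 1.0, 7.0, 2.0]
print(r_squared_loglog(bandwidth, volume))  # close to 1: volume = 3 * bandwidth^2
print(r_squared_loglog(hours, volume))      # much lower for this toy data
```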
The study at the community level can only be done with the bitsoup trace. At the torrent level, we investigate the τ30 samples of bitsoup and alluvion. We analyze the correlations between three variables: (i) the volume uploaded, normalized by the amount downloaded; (ii) the estimated upload bandwidth; and (iii) the time spent online in each torrent. The normalization is used to avoid the effect of file size on the measurements: peers are likely to leech for longer on larger files and, because of that, to upload more data. Also, the p-values for all results we report in this section are lower than 2.2 × 10^-16.

Fig. 5. CDFs of seeding time and upload bandwidth provided by peers. Omitted samples behave similarly. [Plots: cumulative proportion of seeders versus seeding time (h) for alluvion, bitsoup and etree; P[bandwidth<=x] versus peer bandwidth (KB/s) for bitsoup and alluvion.]

A regression analysis shows that, at the torrent level, the amount contributed by each peer on a torrent has a higher correlation with its bandwidth (R² = 0.50 in bitsoup and R² = 0.39 in alluvion) than with the time online (R² = 0.03 in bitsoup and R² = 0.008 in alluvion), after log transformations on the variables.

At the community level, a similar analysis indicates an even stronger correlation between the user's upload volume and its estimated bandwidth (R² = 0.50), compared to the correlation between the user's upload volume and the time she spent online (R² = 0.01).

Therefore, the heavy contributors at the community and torrent levels are those that own and make available more bandwidth, as opposed to those that spend longer periods seeding. Together with our analysis of seeding time in Section 5.2, this result shows that although the main response of peers to the sharing-ratio enforcement incentives is to try to contribute more by seeding for longer, the most effective strategy for contributing more is actually lifting bandwidth limitations, if these are used.

5.4. How stable is the population of heavy contributors?

The results in Section 5.1 show that, at the community level, a few users provide most resources (20% of the users provide 80% of the uploaded volume). In this section, we refer to these users as heavy contributors and analyze whether the set of heavy contributors in the community is stable or changes over time. As it focuses on the community level, this analysis uses only the bitsoup trace.

The analysis is performed as follows: the entire trace is divided into several non-overlapping successive time windows W1, W2, ... of length t. For each window Wi, users are ranked according to the total amount that they uploaded within that window.
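The windowed ranking can be sketched as below, combined with the top-20% rule that Section 5.1 uses to characterize heavy contributors. Several details are assumptions made for illustration only: the record layout `(timestamp, user, bytes)`, timestamps measured in seconds from the start of the trace, ranking only users active in a window, and the names `heavy_sets` and `churn_summary`.

```python
from collections import defaultdict

WEEK = 7 * 24 * 3600  # window length in seconds


def heavy_sets(uploads, window=WEEK, top_percent=20):
    """Return one set of heavy contributors per time window, in time order.

    `uploads` is an iterable of (timestamp_seconds, user_id, bytes_uploaded),
    with timestamps counted from the start of the trace (assumed layout).
    """
    per_window = defaultdict(lambda: defaultdict(int))
    for ts, user, nbytes in uploads:
        per_window[ts // window][user] += nbytes
    result = []
    for idx in sorted(per_window):
        totals = per_window[idx]
        ranked = sorted(totals, key=totals.get, reverse=True)
        k = max(1, (len(ranked) * top_percent) // 100)
        result.append(set(ranked[:k]))
    return result


def churn_summary(sets, n_users):
    """Fractions of users that are heavy at least once and in every window."""
    if not sets or n_users == 0:
        return 0.0, 0.0
    ever = set().union(*sets)
    always = set.intersection(*sets)
    return len(ever) / n_users, len(always) / n_users


# Hypothetical two-week trace with five users.
trace = [
    (0, "a", 100), (10, "b", 10), (20, "c", 5), (30, "d", 1), (40, "e", 2),
    (WEEK + 0, "b", 100), (WEEK + 10, "a", 10), (WEEK + 20, "c", 5),
    (WEEK + 30, "d", 1), (WEEK + 40, "e", 2),
]
weekly = heavy_sets(trace)
print(weekly, churn_summary(weekly, n_users=5))
```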
A user is a heavy contributor in that window if it is among the top 20% of contributors. To evaluate the changes in the set of heavy contributors, we use a window length of a week to account for seasonality within daily and weekly periods.

The results show that 30% of the user population belongs to the set of heavy contributors at least once during the trace duration. Furthermore, only approximately 1.8% of users maintain the status of heavy contributors over the entire trace duration. This shows that there is some degree of churn in the heavy contributors' set. However, although our traces allow this assertion, their limited duration does not allow for a precise evaluation of this churn.

Nevertheless, the observation that the set of heavy contributors over time is not static suggests BitTorrent communities are more robust than communities that depend on a static set of users to provide most of their resources. Although the performance of the system does rely on a small number of users during periods of time, the results indicate that some level of renewal occurs, as heavy contributors are replaced over extended periods. This means failure or departure of a heavy contributor has less impact on the system than in the case of a static set of high contributors.

6. Relating resource demand and supply

Besides studying resource demand and supply separately, understanding their relation unveils important data about system functioning. In particular, this section answers the following questions: (i) are peers' demand and supply correlated? (Section 6.1); (ii) does resource supply meet the observed demand? (Section 6.2); and (iii) is there resource contention at the torrent level? (Section 6.3).

6.1. Are heavy contributors heavy consumers too?

Our analysis of resource supply shows that, at the community level, a minority of users is responsible for most resources contributed.
This picture naturally leads to the question of whether this set of users acts as servers to a majority that behaves mostly as consumers. The analysis in this section shows that this is not the case in bitsoup, the community where we can evaluate user behavior. Instead, users who are heavy contributors are also heavy consumers.

Fig. 6 shows a scatter plot of uploaded and downloaded volumes of users in bitsoup. The color of each point represents the number of torrents a user participates in: fewer torrents yield darker points. The logarithms of upload and download volumes are linearly correlated, with a Pearson correlation coefficient of 0.77, revealing that users that contribute more are also those that consume more from the community. Moreover, the color gradient shows that users who are the heavy contributors and heavy consumers are those that participate in more torrents. Finally, the correlation between upload and download volumes is consistent for peers irrespective of their activity level.

These observations portray bitsoup as an equitable sharing system. Our traces do not allow us to determine if this is a result of BitTorrent's built-in incentive mechanism or of the SRE employed in this community, but our results underpin the scalability of communities adopting bitsoup's model. If contributions are proportional to consumption, then resource contention levels and, thus, service provision levels, are not affected by the scale of the community. In the absence of a similar correlation, growth in the population of users could lead to increasing levels of contention for the available resources.

Fig. 6. Churn in bitsoup.
From a different perspective, our results show that there is virtually no free-riding in bitsoup: the norm is that users do not consume significantly more than their fair share of the community's resources. This is different from user behavior reported in studies of Gnutella and eDonkey, where high levels of free-riding were documented [1,18,17]. Nevertheless, this comparison should be taken with caution, as analyses of Gnutella and eDonkey are based on the assumption that all users consume from the system while it is observed that only a few contribute. Our analysis provides reference results for a future and fair comparison that tracks user consumption in file-sharing networks and considers as free-riders only the users that consume more than their fair share of the system's resources.

6.2. Does the resource supply meet the observed demand?

This section concentrates on understanding whether resource supply is adequate to ensure the system's liveness by investigating the proportion of requests that are successfully served. Overall, our evaluation shows that in all three communities, the great majority of requests is successfully served.

In the analyzed traces, failed download requests are observed when a peer is left as a leecher in a torrent and no seeder joins this torrent after that. We use the notation s_t and s_c for the fraction of requests that succeed in a torrent and in a community, respectively.

Fig. 7 (left) shows the CDF of s_t for all torrents observed in our traces. Considering only complete torrents, we observe similar results across all communities: most torrents serve virtually all the requests they receive (s_t is larger than 0.99 for 97% of the torrents in bitsoup and in etree, and for 60% of the torrents in alluvion). Nevertheless, in a small fraction of the torrents most requests fail.
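The two success fractions can be computed from per-torrent counts of served and failed requests, as in the following sketch. The input layout, the function name `success_fractions` and the numbers are hypothetical; the example only illustrates that the mean of the per-torrent values can differ considerably from the community-wide fraction when failures sit in torrents with few requests.

```python
def success_fractions(torrents):
    """Compute the per-torrent success fraction and the community-wide one.

    `torrents` maps a torrent id to a (served, failed) pair of request
    counts (assumed layout). Torrents without requests are skipped.
    Returns (per_torrent, community).
    """
    per_torrent = {}
    served_total = 0
    requests_total = 0
    for torrent_id, (served, failed) in torrents.items():
        requests = served + failed
        if requests == 0:
            continue
        per_torrent[torrent_id] = served / requests
        served_total += served
        requests_total += requests
    community = served_total / requests_total if requests_total else float("nan")
    return per_torrent, community


# Hypothetical community: one popular torrent and one small, poorly served one.
counts = {"popular": (990, 10), "small": (2, 8)}
s_t, s_c = success_fractions(counts)
mean_s_t = sum(s_t.values()) / len(s_t)
print(s_t, s_c, mean_s_t)
```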
However, measuring the proportion of served requests at the torrent level does not give an accurate picture of the proportion of requests which are served in a BitTorrent community over a period of time. In fact, a more in-depth analysis of the failed requests reveals that the majority of the torrents which have a high proportion of failed requests are those that receive fewer than 20 requests. These torrents are a minority and serve a small proportion of users.

A complementary perspective is then to analyze s_c. Fig. 7 (right) shows the 95% confidence intervals of s_c for the three communities. We observe that the overall proportion of served requests in the three BitTorrent communities studied is high: considering all requests seen, s_c is not statistically different from 1 for bitsoup and etree, and is larger than 0.98 for alluvion. Considering samples of complete torrents, however, s_c is significantly lower in alluvion when compared to bitsoup and etree.

One interesting perspective is that our results do not allow a statistical distinction between s_c for bitsoup and etree. This is an indication that the sharing-ratio enforcement (SRE) is not necessary for etree to achieve a rate of served requests which is equivalent to that of bitsoup. On the other hand, it is not possible to infer that the SRE is ineffective in improving s_c, as bitsoup serves significantly more requests than alluvion. However, the similar levels of contributions in etree and alluvion observed in Section 5, in conjunction with the high s_c of etree, suggest that some peculiarity of alluvion is the cause of its lower service rates (see footnote 2). If this conjecture holds, it would imply that although the SRE might lead to more seeding, it is not necessary for communities similar to etree or alluvion to achieve a high quality of service in the metric we consider.
The results in this section also provide insight into the quality of service provided by commons-based content distribution communities like those supported by BitTorrent. These communities are loosely organized as decentralized peer-production systems [6] coordinated through loose social relationships and still manage to serve all or nearly all requests they receive. Our results provide quantitative evidence of the effectiveness of the commons-based approach as a viable alternative to its market counterparts in content distribution.

Our interpretation of the service provided by BitTorrent communities stands in contrast with those of Guo et al. [16] and Piatek et al. [23]. Guo et al. calculated the average value of s_t in alluvion and interpreted its value (0.9) as a sign of an overall unsatisfactory quality of service in torrents. We observed that the distribution of s_t is considerably skewed, which renders its mean a limited assessment of typical torrent behavior. Indeed, 60% of torrents in alluvion serve more than 99% of their requests. Furthermore, our results show that service quality in alluvion cannot be taken as the general quality of service in BitTorrent communities.

Piatek et al. reported that BitTorrent provides a poor service to its users because 25% of 55,000 torrents observed during their measurements were unavailable. The difference between Piatek et al.'s conclusion and ours is due to different definitions of availability: we count service as unavailable only when a peer tries to download a file and fails, while Piatek et al. do not relate availability and demand.

Fig. 7. Proportion of requests served in a torrent (left) and in the community as a whole (right). Arrows indicate the 95% confidence intervals. [Left: P[X<x] versus s_t for alluvion, bitsoup and etree. Right: s_c for all requests and for the τ30 and τ8 samples of each community.]

Footnote 2: It is worth noting that, after removing the first month of the alluvion trace, we observe a proportion of served requests comparable to bitsoup and etree. However, further investigation of the causes of such behavior is left as future work.

6.3. Is there resource contention at the torrent level?

We now turn to investigate resource contention at the torrent level, examining the typical regime of operation in BitTorrent communities. Resource contention exposes mismatches between resource demand and supply which affect the functioning of the system. When resource demand is much larger than supply in a torrent, download performance falls short of what consumers' download bandwidth allows for. If supply is much larger than demand, providers' resources are underutilized. Also, demand and supply play a role in BitTorrent incentives: prioritization is only relevant when resources are scarce and cannot serve the demand of all consumers.

In summary, our analysis finds that resource contention is similar across communities: most torrents operate under some resource contention, while one quarter of torrents has no contention.

For this investigation, we assume that BitTorrent's tit-for-tat mechanism works efficiently and leechers that provide more bandwidth are prioritized in the torrent (please refer to Legout et al. [20] for experimental evidence).
This implies that when there is resource contention in a torrent, there is a positive correlation between the upload and download speed of leechers. To test the existence of resource contention we thus use the Kendall correlation coefficient to measure the degree of correlation between the rankings of upload and download speed of leechers. The strength of this correlation is directly related to the level of resource contention in the torrent.

Fig. 8 presents Kendall's correlation coefficient between upload and download speeds of peers in all torrents of bitsoup and alluvion that had at least five peers. The distribution of how these measures are correlated in the two communities is very similar. For most torrents, there is at least a mild (0.3) correlation between download and upload speeds, indicating contention, but only for a small proportion of torrents is the correlation strong (≥ 0.6). Moreover, in one fourth of the torrents there is enough bandwidth for all peers to receive the service they demand irrespective of their contributions.

The absence of resource contention in one fourth of all torrents is particularly relevant to the design of incentive mechanisms, as mechanisms based on prioritization are rendered irrelevant for these torrents. On the other hand, the existence of some degree of correlation between upload and download speeds in most torrents suggests that contribution levels are not sufficient to meet the entire demand in the majority of torrents. Assuming most BitTorrent users have asymmetric Internet connections, leechers' demand can only be met if there are seeders in a torrent. Therefore, our results suggest that (i) seeder service is not enough to compensate for the asymmetry in the Internet connections of leechers; and (ii) communities could benefit from higher levels of contribution of upload bandwidth or seeding time.

We also note that the levels of resource contention do not change significantly across communities.
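The per-torrent rank correlation used as a contention indicator can be illustrated with a small pure-Python version of Kendall's coefficient. This sketch computes the tau-a variant, which applies no correction for ties; which variant the original analysis used is not stated, so that choice is an assumption, as are the function names and the sample speeds. The labels reuse the 0.3 and 0.6 levels mentioned in the text.

```python
def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant, and no tie
    correction is applied (an assumption about the variant).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least two points")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def contention_label(tau):
    """Label a coefficient using the 0.3 and 0.6 levels quoted in the text."""
    if tau >= 0.6:
        return "strong"
    if tau >= 0.3:
        return "mild"
    return "weak or none"


# Hypothetical leechers in one torrent: upload and download speeds in KB/s.
upload = [10, 20, 30, 40, 50]
download = [15, 25, 20, 60, 80]
tau = kendall_tau_a(upload, download)
print(tau, contention_label(tau))
```

Applying the function to every torrent with at least five peers and plotting the empirical CDF of the coefficients gives a curve comparable in form to the one the analysis reports.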
This is evidence that the higher seeding times seen in bitsoup do not dramatically change the relation between demand and supply in this community.

Lastly, the range of resource contention levels found in bitsoup and alluvion agrees with the observation by Locher et al. [22] that sometimes download speed in a torrent is related to upload bandwidth contributed, while sometimes it is not. Our data, however, quantifies this phenomenon. Izal et al. [19] observed a positive correlation between leecher upload and download speeds in a highly popular torrent. Our observations indicate how this correlation varies in large, heterogeneous communities.

Fig. 8. CDF of the Kendall correlation coefficient between upload and download bandwidths of the peers in each torrent in τ30. [Plot: P[X<x] versus correlation between upload and download bandwidth of peers in a torrent; series: alluvion, bitsoup.]

7. Final remarks

The characterization of a computational system must consider four aspects: the system's design, its implementation, the resources on which it runs, and the workload it serves. This work focuses on the latter two aspects in the context of BitTorrent commons-based content-sharing communities. In particular, it characterizes resource demand, resource supply and their relationship in three BitTorrent communities. Our results have broad impact on the design of BitTorrent extensions, on the design of complementary mechanisms and on the study of this system.

The results related to resource demand (i) point to the design of cache mechanisms for BitTorrent that leverage the peculiar popularity distributions identified in this study; and (ii) provide an accurate model for reasoning about and synthesizing the popularity of torrents over time. In particular, the results strongly suggest that future researchers should consider the model introduced in this study, as the commonly used Poisson process is not accurate for current BitTorrent usage.

The characterization of resource supply motivates communities to identify and nurture their heavy contributors, and motivates the development of community resource allocation mechanisms based on simple information. The investigation also identifies that the users that contribute the most in the system are those that provide more bandwidth, and reveals some redundancy in the set of users that provide most of the resources in the community, providing insight into the robustness of these communities.

The analysis of the relation between resource demand and supply provides a novel picture of content-sharing via BitTorrent, where users that provide most of the resources are not altruistic. Instead, they generate a proportional demand. Our results also quantify one dimension of the quality of service achieved by BitTorrent communities and suggest that service should be improved mostly on small torrents. Additionally, the investigation of resource contention presents the typical regime of operation of the system with respect to the level of contention, which is valuable information for designers of resource allocation mechanisms.

Finally, this study contributes to the methodology for experimental studies of BitTorrent content-sharing communities, summing up good practices relevant for BitTorrent data analysis to assess information loss due to sampling and to avoid biased estimations.

Our results suggest a number of avenues for future work. Besides the caching and resource allocation investigations mentioned above, our results motivate further study of the different characteristics of torrent popularity that result in the fast drop of peer joins over time; simulation studies that consider the effect of the long tail of peer join rates; and a further investigation of the factors that influence the request success rate in different communities.
Moreover, future work should extend the breadth of this characterization. Although our characterization used traces of up to 68 days from three communities, it provides only a limited view of current BitTorrent usage. In particular, future studies should focus on torrents that survive over extended periods of time, use accurate user identification, and consider other metrics to gauge the quality of service provided in similar communities. Characterizing more communities of different sizes and potentially different user habits is also still necessary if we are to better understand how the human factor drives content distribution on the Internet and to design mechanisms that better serve humans.

Acknowledgements

The authors would like to thank the UMass Trace Repository for making the alluvion trace available and Jaindson Santana and Flavio Santos for valuable help in processing the traces. Francisco Brasileiro acknowledges the support received from CNPq/Brazil (grant 309033/2007-1). Elizeu Santos-Neto is partially supported by the British Columbia Innovation Council/Canada.

References

[1] E. Adar, B.A. Huberman, Free riding on Gnutella, First Monday 5 (2000) 10.
[2] K. Anagnostakis, F. Harmantzis, S. Ioannidis, M. Zghaibeh, On the impact of practical P2P incentive mechanisms on user behavior, Working Paper 06-14, NET Institute, 2006.
[3] N. Andrade, M. Mowbray, A. Lima, G. Wagner, M. Ripeanu, Influences on cooperation in BitTorrent communities, in: Proceedings of the 2005 ACM SIGCOMM Workshop on Economics of Peer-to-Peer Systems, New York, NY, USA, 2005, pp. 111–115.
[4] N. Andrade, E. Santos-Neto, F. Brasileiro, M. Ripeanu, Methodological notes on studying BitTorrent through tracker snapshots, Technical Report, Networked Systems Laboratory, 2008. <http://>.
[5] A. Bellissimo, B.N. Levine, P. Shenoy, Exploring the use of BitTorrent as the basis for a large trace repository, Technical Report 04-41, University of Massachusetts, 2004.
[6] Y. Benkler, Sharing nicely: on shareable goods and the emergence of sharing as a modality of economic production, The Yale Law Journal 114 (2004) 273–358.
[7] A. Bharambe, C. Herley, V. Padmanabhan, Analyzing and improving a BitTorrent network’s performance mechanisms, in: Proceedings of the INFOCOM, 2006.
[8] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, J. Wiener, Graph structure in the web, Computer Networks 33 (1) (2000) 309–320.
[9] K. Burnham, D. Anderson, Multimodel inference: understanding AIC and BIC in model selection, Sociological Methods and Research 33 (2) (2004) 261–304.
[10] CacheLogic, P2p in 2005, 2005. < pages/research/p2p2005.php>.
[11] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, S. Moon, I tube, you tube, everybody tubes: analyzing the world’s largest user generated content video system, in: Proceedings of the Seventh ACM SIGCOMM Conference on Internet Measurement, San Diego, CA, USA, ACM, 2007, pp. 1–14.
[12] A.L. Chow, L. Golubchik, V. Misra, Improving BitTorrent: a simple approach, in: Proceedings of the IPTPS, 2008.
[13] B. Cohen, Incentives build robustness in BitTorrent, in: Proceedings of the Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, USA, June 2003.
[14] S. Glassman, A caching relay for the world wide web, Computer Networks and ISDN Systems 27 (2) (1994) 165–173.
[15] K.P. Gummadi, R.J. Dunn, S. Saroiu, S.D. Gribble, H.M. Levy, J. Zahorjan, Measurement, modeling, and analysis of a peer-to-peer file-sharing workload, in: SOSP’03: Nineteenth ACM Symposium on Operating Systems Principles, 2003, pp. 314–329.
[16] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, X. Zhang, Measurements, analysis, and modeling of BitTorrent-like systems, in: Proceedings of the ACM SIGCOMM/USENIX IMC, October 2005, pp. 19–21.
[17] S.B. Handurukande, A.-M. Kermarrec, F.L. Fessant, L. Massoulié, S. Patarin, Peer sharing behaviour in the eDonkey network, and implications for the design of server-less file sharing systems, in: Proceedings of the EuroSys’06, New York, NY, 2006, pp. 359–371.
[18] D. Hughes, G. Coulson, J. Walkerdine, Freeriding on Gnutella revisited: the Bell Tolls, IEEE Distributed Systems Online 6 (2005) 6.
[19] M. Izal, G. Urvoy-Keller, E.W. Biersack, P. Felber, A.A. Hamra, L. Garcés-Erice, Dissecting BitTorrent: five months in a Torrent’s lifetime, in: Proceedings of the Passive and Active Measurements, Antibes Juan-les-Pins, France, April 2004.
[20] A. Legout, N. Liogkas, E. Kohler, L. Zhang, Clustering and sharing incentives in BitTorrent systems, in: Proceedings of SIGMETRICS, 2007, pp. 301–312.
[21] N. Liogkas, R. Nelson, E. Kohler, L. Zhang, Exploiting BitTorrent for fun (but not profit), in: Proceedings of the IPTPS, February 2006.
[22] T. Locher, P. Moor, S. Schmid, R. Wattenhofer, Free riding in BitTorrent is cheap, in: Proceedings of the HotNets.
[23] M. Piatek, T. Isdal, A. Krishnamurthy, T. Anderson, One hop reputations for peer to peer file sharing workloads, in: Proceedings of the NSDI, 2008.
[24] F.L. Piccolo, G. Neglia, G. Bianchi, The effect of heterogeneous link capacities in BitTorrent-like file sharing systems, in: Proceedings of the Hot-P2p, Los Alamitos, CA, USA, IEEE Computer Society, 2004, pp. 40–47.
[25] J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips, Measurement study of the BitTorrent peer-to-peer file-sharing system, Technical Report PDS-2004-003, Delft U. Technology, 2004.
[26] D. Qiu, R. Srikant, Modeling and performance analysis of BitTorrent-like peer-to-peer networks, in: Proceedings of the SIGCOMM, August 2004, pp. 367–378.
[27] M. Ripeanu, M. Mowbray, N. Andrade, A. Lima, Gifting technologies: a BitTorrent case study, First Monday 11 (2006) 11.
[28] D. Roselli, J.R. Lorch, T.E. Anderson, A comparison of file system workloads, in: Proceedings of the USENIX Annual Technical Conference, Berkeley, CA, USA, 2000, p. 4.
[29] O. Saleh, M. Hefeeda, Modeling and caching of peer-to-peer traffic, in: Proceedings of the ICNP, 2006, pp. 249–258.
[30] S. Sinha, R.K. Pan, Econophysics and Sociophysics: Trends and Perspectives, Wiley–VCH, 2006, pp. 417–447 (ch. How a hit is born: the emergence of popularity from the dynamics of collective choice).
[31] D. Stutzbach, R. Rejaie, Understanding churn in peer-to-peer networks, in: IMC’06: Proceedings of the Sixth ACM SIGCOMM on Internet Measurement, New York, NY, USA, ACM Press, 2006, pp. 189–202.
[32] D. Stutzbach, D. Zappala, R. Rejaie, The scalability of swarming peer-to-peer content delivery, in: Proceedings of the NETWORKING, 2005, pp. 15–26.
[33] A. Wierzbicki, N. Leibowitz, M. Ripeanu, R. Wozniak, Cache replacement policies for peer-to-peer file-sharing protocols, European Transactions on Telecommunications 15 (2004) 6.
[34] D.M. Wilkinson, Strong regularities in online peer production, in: EC’08: Proceedings of the Ninth ACM Conference on Electronic Commerce, New York, NY, USA, ACM, 2008, pp. 302–309.
[35] X. Yang, G. de Veciana, Service capacity of peer to peer networks, in: Proceedings of the INFOCOM, 2004.

Nazareno Andrade is a Ph.D. student at the Universidade Federal de Campina Grande, Brazil. He received a B.Tech. degree in Telematics from the Centro Federal de Educação Tecnológica da Paraíba, Brazil, in 2001 and an M.Sc. in Informatics from the Universidade Federal de Campina Grande, Brazil, in 2003.
His research interests include peer-to-peer systems, grid computing and sharing.

Elizeu Santos-Neto received a B.S. and an M.S. in Computer Science from the Universidade Federal de Alagoas and the Universidade Federal de Campina Grande, respectively. He also worked as an Assistant Researcher at the OurGrid project and collaborated with the Virtual Workspaces project, where he investigated topics in Grid Computing focused on distributed scheduling, resource allocation and virtualization technologies. Currently, he is a Ph.D. candidate at the University of British Columbia. His research interests relate to the characterization of, and mechanism design for, large-scale distributed systems.

Francisco Brasileiro is a Professor at the Universidade Federal de Campina Grande, Brazil. He received a B.S. degree in Computer Science from the Universidade Federal da Paraíba, Brazil, in 1988, an M.Sc. degree from the same University in 1989, and a Ph.D. degree in Computer Science from the University of Newcastle upon Tyne, UK, in 1995. His research interests include dependability in distributed systems, grid computing and distributed algorithms and protocols. He is a member of the Brazilian Computer Society, the ACM, and the IEEE Computer Society.

Matei Ripeanu received his Ph.D. degree in Computer Science from The University of Chicago in 2005. After a brief visiting period with Argonne National Laboratory, he joined the Electrical and Computer Engineering Department of the University of British Columbia as an Assistant Professor. He is broadly interested in distributed systems with a focus on self-organization and decentralized control in large-scale grid and peer-to-peer systems. He has published in major academic conferences on large-scale grid and peer-to-peer system characterization, on techniques to exploit the emergent characteristics of these systems, and on supporting scientific applications to run on these platforms.

N. Andrade et al. / Computer Networks 53 (2009) 515–527