Resource demand and supply in BitTorrent content-sharing communities
Nazareno Andrade a,b,*, Elizeu Santos-Neto b, Francisco Brasileiro a, Matei Ripeanu b
Universidade Federal de Campina Grande, Laboratorio de Sistemas Distribuidos, Av. Aprigio Veloso, 882, Bloco-CO, 58109970 Campina Grande, PB, Brazil
University of British Columbia, Vancouver, BC, Canada
Article info
Received 21 April 2008
Received in revised form 9 September 2008
Accepted 24 September 2008
Available online 21 November 2008
Abstract
BitTorrent is a widely popular peer-to-peer content distribution protocol. Unveiling pat-
terns of resource demand and supply in its usage is paramount to inform operators and
designers of BitTorrent and of future content distribution systems. This study examines
three BitTorrent content-sharing communities regarding resource demand and supply.
The resulting characterization is signiﬁcantly broader and deeper than previous BitTor-
rent investigations: it compares multiple BitTorrent communities and investigates
aspects that have not been characterized before, such as aggregate user behavior and
resource contention. The main ﬁndings are three-fold: (i) resource demand – a more
accurate model for the peer arrival rate over time is introduced, contributing to workload
synthesis and analysis; additionally, torrent popularity distributions are found to
be non-heavy-tailed, which has implications for the design of BitTorrent caching mechanisms;
(ii) resource supply – a small set of users contributes most of the resources in
the communities, but the set of heavy contributors changes over time and is typically
not responsible for most resources used in the distribution of an individual file; these
results imply that some level of robustness can be expected in BitTorrent communities and
can direct resource allocation efforts; (iii) relation between resource demand and supply –
users that provide more resources are also those that demand more from the communities; also, the
distribution of a ﬁle usually experiences resource contention, although the communities
achieve high rates of served requests.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Four aspects must be analyzed to understand a compu-
tational system and improve its performance: its design, its
implementation, the resources on which it runs, and the
workload it serves. The ﬁrst two aspects directly determine
the efﬁciency of the system, while the latter two bound sys-
tem performance and the efﬁcacy of resource allocation
mechanisms. Moreover, while the design and implementa-
tion of a system can be analyzed in controlled conditions, a
characterization of typical workload and resource availabil-
ity normally requires monitoring in production settings.
This study focuses on advancing the characterization of
workload and resource availability of commons-based con-
tent distribution based on BitTorrent, a widely popular,
peer-to-peer content distribution protocol. Our analysis
of the data collected from three BitTorrent communities
leads to the understanding of the resources on which
peer-to-peer content distribution mechanisms typically
run and the workload they serve.
We interpret the resources on which BitTorrent runs as
its resource supply and the workload it serves as its resource
demand. From this perspective, this study investigates the impact
of resource supply and demand on BitTorrent’s performance,
on its resource allocation mechanisms, and on the
overhead it imposes on the underlying network infrastruc-
ture. The resulting analysis is relevant: (i) for users, as it
* Corresponding author. Address: Laboratório de Sistemas Distribuídos,
Av. Aprígio Veloso, 882, Bloco CO, CEP 58429-900, Campina-PB, Brazil.
Tel.: +55 83 33101365.
E-mail addresses: firstname.lastname@example.org, email@example.com
(N. Andrade), firstname.lastname@example.org (F. Brasileiro).
Computer Networks 53 (2009) 515–527
evaluates the quality of service currently available in
commons-based content distribution communities; (ii)
for community operators, since it characterizes the re-
sources on which the community depends and, in particu-
lar, the effect of an increasingly popular incentive
mechanism called sharing-ratio enforcement; (iii) for devel-
opers of content distribution technologies, as it documents
usage patterns which affect resource allocation mecha-
nisms; and (iv) for Internet infrastructure operators, as de-
mand and supply patterns deﬁne the load content
distribution poses on the network and how effective it is
to cache content to reduce operational costs.
The traces collected and characterized in this work al-
low us to draw a clearer picture of BitTorrent communities
than previous studies [19,25,5,16,31,3,27]. We are able,
ﬁrst, to compare system behavior across different commu-
nities, and second, to accurately model user behavior, as
some of our traces allow precise user identiﬁcation across
all activities a user may engage in a community. Accurately
tracing user behavior enables us to characterize the system
at a community level more precisely than previous studies
that focused on individual ﬁles or used tentative user iden-
tiﬁcation. Additionally, we discuss limitations of current
practices used by previous BitTorrent measurement stud-
ies, and document solutions to limit the shortcomings of
In summary, this paper extends previous characteriza-
tion studies in terms of breadth, as it compares system
behavior in several large communities (the largest having
more than 80,000 active users performing 1.7 million
downloads during our trace); depth, as it investigates novel
aspects of demand and supply, such as user behavior
across torrents and resource contention; and accuracy, as
it improves the methodology used by previous studies.
At a high level, our study pictures BitTorrent communi-
ties as content distribution systems which generally expe-
rience resource contention, but often operate on an
abundance of resources, and which rely on contributions
from a minority of users. This minority, however, is not
composed of altruistic participants: users that contribute
more to the communities are also those that request more.
Furthermore, the set of major contributors is not static:
heavy contributors typically have this status for a limited
time. Finally, communities successfully serve the vast
majority of the received requests.
In more detail, this study ﬁnds that, from the resource
demand perspective, (i) ﬁle popularity in BitTorrent com-
munities deviates signiﬁcantly from the long-tailed item
popularity distribution on the Web, with direct implica-
tions on the design of caching mechanisms for BitTorrent
traffic; and (ii) the request arrival rate for a file is not comprehensively
modeled by previous proposals, which leads
us to provide a more accurate model for the arrival rate
From the resource supply perspective, our main ﬁnd-
ings are: (i) a few users contribute a majority of resources
at the community level, yet, at the individual data item le-
vel, contributions are considerably less concentrated; (ii)
peers which contribute more to the system are those that
devote more bandwidth to the system, and not those that
devote more time to distribute a ﬁle, and (iii) sharing-ratio
enforcement, a popular incentive mechanism deployed in
BitTorrent communities, leads to users investing more
time contributing to the community, but not to higher
Investigating the relationship between supply and demand
shows that: (i) in the one community for which we can gauge the
correlation between users’ demand and contribution, users
that contribute more to the community are also those that
consume more from it, an observation that denotes a degree
of equity in this community; (ii) in all communities
studied, a high proportion of the requests is successfully
served, which is evidence that commons-based content
distribution provides a good level of quality of service
and that traditional content providers could reduce data
distribution costs if they are able to leverage users’ contri-
butions at similar levels; and (iii) resource contention var-
ies signiﬁcantly even within a single community: for three
quarters of the ﬁles distributed, there is at least a mild con-
tention for resources and the provision of more resources
could improve quality of service; for the remaining quar-
ter, there are enough resources available to meet the
demand of all peers indistinctly, rendering prioritization-
based incentive mechanisms irrelevant.
The rest of this article is organized as follows. The next
section presents an overview of BitTorrent together with
related work. Section 3 details the communities studied
and our data collection and analysis method. The charac-
terizations of resource demand, supply and the relation be-
tween them are presented in Sections 4–6, respectively.
The last section brings our conclusions and ﬁnal remarks.
The main goal of BitTorrent is to enable scalable content
distribution. To this end, the load of distributing a ﬁle is
shared between the content publisher and those who
download it: the peers downloading and those which have
already downloaded the ﬁle supply bandwidth, the parts of
the ﬁle they already have, and content availability. This
scheme is currently widely popular: studies estimate that
about 30% of Internet trafﬁc was due to BitTorrent in
In BitTorrent parlance, a torrent is the group formed by
all peers taking part in the distribution of a ﬁle. To down-
load a ﬁle using BitTorrent, a user must join the torrent
that distributes it by contacting its tracker, the component
of the system that enables peer discovery and data loca-
tion. Peers that have an incomplete copy of the ﬁle are
called leechers, while peers that have ﬁnished downloading
and still participate in the torrent are called seeders. Lee-
chers both upload and download pieces of the ﬁle, while
seeders only upload them. Both leechers and seeders re-
port their progress periodically to the tracker. BitTorrent
has a built-in incentive mechanism through which each
leecher prioritizes the leechers that provide it the best re-
cent download speed. Seeders do not share the same built-
in incentive mechanism, as they only upload.
Note that the content discovery and access control
mechanisms are external to the BitTorrent protocol, which
is focused on data transfer. These functionalities are usu-
ally provided through a web site. This solution leads to
segmented BitTorrent communities, centered around the
different sites that enable content location.
Studies based on modeling [26,35,16], simulation [7,32]
and experiments with BitTorrent software  established,
under controlled conditions, that BitTorrent is an efﬁcient
and scalable content distribution protocol. Although these
studies based on controlled scenarios help understand
BitTorrent behavior, a comprehensive characterization of
BitTorrent in production scenarios is necessary to comple-
Several studies of real world deployments provide valu-
able information on this perspective [19,25,5,16,31,3,
27,23]. However, these studies suffer from four shortcom-
ings which motivate our work: (i) they are unable to accu-
rately analyze aggregate user behavior at a community
level – due to inaccurate user identiﬁcation in the collected
data, (ii) their study of the relationship between resource
demand and supply is limited, (iii) they are restricted in
scope, as they analyze either a few torrents [19,31], a single
community [5,16,2] or a single snapshot of multiple com-
munities [3,27], and (iv) they have methodological limita-
tions related to the assessment of information loss or bias
in the sampling methods used.
This work addresses these issues by (a) obtaining and
analyzing a trace from a community which provides strong
user identiﬁcation, (b) broadening the scope of the charac-
terization to three different communities with more than
10,000 torrents and one million downloads, (c) discussing
in depth the implications of the sampling methods used,
and (d) analyzing the relationship between resource de-
mand and supply. The traces which address points (a)
and (b) and our approach to address point (c) are presented
in Section 3. Our subsequent BitTorrent characterization
addresses point (d). Throughout the rest of the document,
we compare in detail the results of our study with related
3. The data sets
This section presents the terminology used in the rest of
the paper, the data collection method, and the three Bit-
Torrent communities studied. Analyzing multiple commu-
nities is necessary as user behavior tends to vary across
communities [3,27]. Although studying the three selected
communities does not guarantee a deﬁnitive view over
user behavior, we claim that analyzing multiple communi-
ties and a larger set of torrents does contribute towards a
For the rest of this document, we use the following terminology:
we differentiate between users and peers. A user
is a participant in a BitTorrent community, who is observed
as a peer in each torrent in which she participates. This distinction
is relevant because for some of the communities
studied, it is only possible to observe accurately peer
A peer joins a torrent the ﬁrst time it participates in it.
Each peer might have several sessions in the same torrent,
as it may go ofﬂine and come back online, and it leaves a
torrent when it departs from the torrent and does not
come back. The time a peer spends online after it ﬁnishes
the download and before it leaves the torrent is the peer’s
seeding time. The torrent start is determined by the ﬁrst
peer join event in that torrent. The torrent end is the time
when the last peer leaves the torrent. The lifetime of the
torrent is the period between its start and its end, and a
torrent is complete if its start and end happen within our
We consider a two-level view of a BitTorrent commu-
nity: the community level view characterizes the behavior
of users across all torrents they participate in and aggre-
gates metrics over all torrents in the community. The tor-
rent level is concerned with peer behavior in each torrent,
without aggregating this behavior to observe users. The
community level and torrent level views are complementary:
the former observes
users across different torrents, while the latter observes
primarily individual torrents. Also, this distinction is necessary in
our data analysis, as for some of the communities studied,
it is not possible to accurately track user behavior. In these
communities, we focus on analyzing resource demand and
supply at the torrent level.
3.2. The communities
The three BitTorrent communities studied are etree, bitsoup
and alluvion. etree (http://bt.etree.org) is a community
devoted to sharing recordings of live performances for
non-commercial purposes; alluvion (http://www.alluvion.org)
is a community hosting user-generated media; finally,
bitsoup (http://www.bitsoup.org) is a community of
users that share all kinds of content.
Two features distinguish bitsoup from the other two
communities. First, it requires users to register with the
community website and tracks user behavior across tor-
rents. Second, it uses sharing-ratio enforcement (SRE) in
addition to BitTorrent’s built-in incentive mechanism to
boost resource contribution. SRE, also used by other com-
munities (e.g. http://www.nhltorrents.co.uk), works by
keeping a record of users’ resource consumption and con-
tribution across different torrents and penalizing the users
which do not contribute a minimum proportion of their
consumption across all torrents where they participate.
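As an illustration of the bookkeeping that sharing-ratio enforcement implies, the following minimal sketch keeps per-user totals across torrents and flags users whose ratio falls below a threshold. The names (UserAccount, record_transfer, is_penalized) and the threshold value min_ratio are hypothetical; the actual policy and penalties used by bitsoup are not specified here.

```python
from dataclasses import dataclass


@dataclass
class UserAccount:
    """Per-user totals aggregated across all torrents, in bytes."""
    uploaded: int = 0
    downloaded: int = 0


def record_transfer(accounts, user, uploaded, downloaded):
    """Add the amounts a user transferred in one torrent to its totals."""
    account = accounts.setdefault(user, UserAccount())
    account.uploaded += uploaded
    account.downloaded += downloaded


def sharing_ratio(account):
    """Return uploaded/downloaded, or infinity if nothing was downloaded."""
    if account.downloaded == 0:
        return float("inf")
    return account.uploaded / account.downloaded


def is_penalized(account, min_ratio=0.5):
    """True if the user contributed less than min_ratio of its consumption.

    min_ratio is a hypothetical threshold, not a value taken from bitsoup.
    """
    return sharing_ratio(account) < min_ratio
```

The ratio is computed over all torrents a user participates in, which is why this kind of mechanism depends on the strong user identification that bitsoup provides.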
3.3. Data collection
This study uses a passive method to collect BitTorrent
traces. Data collection is done via crawling report pages
provided by the trackers of each community, instead of
deploying software on client machines. These reports con-
tain detailed information, per torrent, about all peers cur-
rently active in the system, such as peer’s downloaded
and uploaded amounts, for how long the peer is online
and whether the peer is a seeder.
Each crawling consists of thousands of HTTP requests to
the community’s web server. Thus, the frequency of data
collection must be moderate to minimize the crawling
overhead. The crawler executed every hour (Section 3.4
discusses the limitations of this sampling frequency in this
study). We experimented with snapshots every 30 min, but
the resulting load was seen as too high by communities’
To collect data from bitsoup and etree, we implemented
our own crawlers, while the alluvion data is available at the
UMass Trace Repository (http://traces.cs.umass.edu/). It is
worth noting that although the data from alluvion has been
analyzed before [5,16], we perform our analysis with an
improved methodology and analyze new dimensions of it.
Table 1 presents, for each community studied, the dura-
tion of the trace, the total number of torrents observed in
each trace, the average number of torrents alive at any
point in time, the total number of peers seen in the trace
and the average number of peers seen at any point in time.
The bitsoup community is considerably larger than the
other two studied. Nevertheless, as we show throughout
the rest of the paper, this does not cause major differences
in the properties of user behavior we consider.
3.4. Reconstructing torrent dynamics
Our traces consist of hourly snapshots of the state of
peers and torrents in a community. Before analysis, it is
necessary to reconstruct peer and torrent behavior over
time from the snapshots. This reconstruction process im-
plies three challenges, which we discuss next. Please refer
to our technical report  for further details about the
methods presented here.
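To make the reconstruction step concrete, the sketch below derives sessions, join times and leave times from a sequence of snapshots. It is only an illustration under simplifying assumptions: each snapshot is reduced to the set of peer identifiers seen in one torrent, peers are assumed to be identified reliably, and a peer missing from a snapshot is assumed to have been offline. The function names and the max_gap parameter are hypothetical and do not reproduce the heuristics of the technical report.

```python
def reconstruct_sessions(snapshots, max_gap=1):
    """Group the snapshots in which each peer appears into sessions.

    snapshots: list of sets of peer ids, ordered by sampling time.
    max_gap: largest index difference still counted as the same session
             (1 means the peer must appear in consecutive snapshots).
    Returns {peer_id: [(first_index, last_index), ...]}.
    """
    sessions = {}
    for index, present in enumerate(snapshots):
        for peer in present:
            runs = sessions.setdefault(peer, [])
            if runs and index - runs[-1][1] <= max_gap:
                # Peer was seen recently enough: extend the current session.
                runs[-1] = (runs[-1][0], index)
            else:
                # First sighting, or the peer came back after being offline.
                runs.append((index, index))
    return sessions


def join_and_leave(sessions):
    """Map each peer to (index of first sighting, index of last sighting)."""
    return {peer: (runs[0][0], runs[-1][1]) for peer, runs in sessions.items()}
```

With hourly snapshots, indices translate directly into hours since the start of the trace, so join and leave times are known only to within one sampling period.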
The ﬁrst challenge in analyzing the traces is the use of
imprecise identiﬁcation by some trackers, an issue dis-
cussed by previous studies [5,16,31,19]. This study has
the advantage that one of the communities studied, bit-
soup, uses unique logins to track user behavior. This allows
us to estimate precisely the distribution of resource contri-
bution and consumption by users at the community level.
User identiﬁcation is imprecise in alluvion and etree, allow-
ing only heuristic-based identiﬁcation of peers at a torrent
level. For these two communities, we used heuristics simi-
lar to those reported in previous studies [5,16,19] to track
peer behavior given the imprecise identiﬁcation found in
the traces. For the sake of reproducibility, these heuristics
are detailed in our technical report.
A second difﬁculty arises from the crawling frequency.
As snapshots are taken periodically, information is lost
if the rate with which relevant events occur is higher than
the sampling period. To address this issue, we estimate the
information loss and bound it by studying only torrents in
which enough observations of the relevant events are
available. For that, we determine the likelihood of a peer
to join a torrent, download a ﬁle and leave the torrent be-
tween two snapshots. This likelihood is a function of the
time peers must stay in a torrent to download the ﬁle.
The time necessary to download the ﬁle is derived from
the size of the ﬁle distributed in the torrent and the distri-
bution of peers’ download bandwidth. Using this result, we
have estimated the amount of information loss for torrents
distributing ﬁles of different sizes and found that by ana-
lyzing only torrents that distribute ﬁles larger than
100 MB, it is possible to observe at least 90% of the peers
in the communities studied. For the remainder of this
study, therefore, we consider only these torrents in our
The third complication in analyzing the data sets results
from the limited trace duration. For some analysis, such as
the characterization of the request arrival process in tor-
rents, it is necessary to examine a sample of complete tor-
rents. Moreover, it is desirable that this sample reﬂects the
overall population of torrents in the community. However,
because data from each community is collected for a lim-
ited period (up to 68 days for bitsoup), one must take care
when sampling the complete torrents so as to produce an
unbiased sample. The reason is that considering only com-
plete torrents will admit proportionally more short tor-
rents than exist in the torrent population, as these have a
higher probability of occurring in the trace than longer
To avoid bias when studying complete torrents, we ap-
ply the create-based method proposed by Roselli et al. ,
which allows obtaining an unbiased sample of torrents
with a maximum duration s. We obtained samples of torrents
for s = 8 days from the three communities and samples
of torrents for s = 30 days for alluvion and bitsoup, the
two communities for which we have longer traces. In the
rest of this paper these samples are referred to as s8 and
Table 2 details the samples we consider given our
4. Characterizing resource demand
The ﬁrst part of this characterization focuses on the de-
mand generated by members of a BitTorrent community.
Understanding usage patterns is paramount to produce
Table 1. Characteristics of the traces used.
Trace      Duration                               Torrents (total)   Torrents (average)   Peers (total)   Peers (average)
etree      10 days during March 2005              1589               835                  66,588          4905
alluvion   50 days during October–December 2003   1528               278                  227,096         7312
bitsoup    68 days during April–July 2007         13,741             6633                 1,694,243       145,462
Table 2. Characteristics of torrent samples considered.
Sample     Torrents (all)   Torrents (s8)   Torrents (s30)   Peers (all)   Peers (s8)   Peers (s30)
alluvion   1247             271             355              187,916       12,291       43,930
bitsoup    10,463           416             1123             1,351,806     8400         54,889
etree      284              124             –                11,788        1764         –
optimized designs of content distribution mechanisms. We
consider torrent joins (i.e., ﬁle requests) as indirectly
expressing user demand and focus on the following two
questions: (i) what is the distribution of torrent popularity,
as expressed by total number of torrent joins each torrent
receives; and (ii) what is the evolution of the rate of torrent
joins over time.
4.1. What is the content popularity distribution?
The popularity of a torrent (and implicitly of a content
item) is the number of torrent joins received during a time
interval. From our data, we can see both the popularity of
all torrents during the duration of the traces and the pop-
ularity of the complete torrents sampled. The former
shows how the interest of users is distributed over avail-
able content in a period, while the latter is concerned with
the total number of users which will join a torrent. Regard-
less of perspective, our main ﬁnding is the same: content
popularity in BitTorrent communities is not heavy-tailed.
Fig. 1 shows the popularity of all content during the entire
period of our measurements (left) and restricted to
complete torrents in s30 (right). The popularity distribution
of torrents in the s8 sample has the same characteristics.
For all samples, the curves have similar shapes and
clearly deviate from a Zipf distribution, which is commonly
used when modeling popularity.
For all our samples, a Lognormal or a Weibull distribution
fits the empirical data well.¹ These distributions are
distinct both from those found in peer-to-peer file-sharing
[15,29] and video streaming, but are similar to the
observed distribution of user activity across topics in four
online peer production systems observed by Wilkinson
and to the popularity of films as measured by their
box office revenues.
A distinguishing feature of the distributions we observe
is that they are not heavy-tailed. The absence of a heavy
tail has major implications for the design of caching
mechanisms. On the one hand, for small cache sizes, these
popularity distributions lead to caches that are less effective
(i.e., lower hit ratio) than for heavy-tailed distributions.
On the other hand, for large cache sizes, the cache
effectiveness can be much higher than for heavy-tailed distributions,
since the percentage of unpopular items is
much lower in the trace we observe.
This stands in contrast with the design of caches for
Web pages, whose popularity distribution is heavy-tailed
[14,8] and suggests that caching mechanisms designed
for heavy-tailed distributions observed in peer-to-peer
ﬁle-sharing (e.g. [33,29]) should be revisited before being
applied to BitTorrent trafﬁc. We note that Belissimo et al.
documented that the popularity distribution of ﬁles devi-
ates from a Zipf distribution in the alluvion data we con-
sider . Our results expand this observation through a
wider sample that includes three different content-sharing
communities and suggest distributions which ﬁt the data.
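The link between the shape of the popularity distribution and cache effectiveness can be explored with a simple calculation. The sketch below computes the hit ratio of an idealized static cache that holds the most requested items, given the number of requests per item. It is an assumption-laden illustration: real caches use replacement policies and see requests over time, and the helper zipf_counts as well as all parameter values are hypothetical.

```python
def ideal_hit_ratio(request_counts, cache_size):
    """Fraction of requests served by a cache holding the cache_size most
    requested items (an idealized, static cache)."""
    total = sum(request_counts)
    if total == 0:
        return 0.0
    top = sorted(request_counts, reverse=True)[:cache_size]
    return sum(top) / total


def zipf_counts(n_items, top_count, alpha=1.0):
    """Request counts following a Zipf law: count(rank) = top_count / rank**alpha."""
    return [top_count / rank ** alpha for rank in range(1, n_items + 1)]


def hit_ratio_curve(request_counts, cache_sizes):
    """Hit ratio for each cache size, for comparing popularity distributions."""
    return [(size, ideal_hit_ratio(request_counts, size)) for size in cache_sizes]
```

Feeding hit_ratio_curve with the per-torrent join counts of a trace and with zipf_counts of the same length gives a direct way to compare, for each cache size, an observed popularity distribution against a Zipf reference.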
4.2. How are torrent joins distributed over time?
A second dimension of user demand is revealed by the
distribution of joins over the torrents’ lifetimes. Our char-
acterization (i) reproduces previous results which show
that the join rate for a torrent drops rapidly after its start
and (ii) proposes a model for the evolution of join rate over
time that is more accurate than current state-of-the-art.
Previous studies report that the request arrival rate for a
torrent decreases rapidly over time after its start [5,25,16].
Guo et al.  report, based on the same alluvion trace we
use, that the request arrival rate for a torrent decreases
exponentially over time. We revisit their ﬁndings using
all three traces we have collected and beneﬁt from the in-
creased accuracy offered by the bitsoup trace which con-
tains accurate peer identiﬁcation.
Examining the curve of peer joins per day, we observe
that although an exponential function $\lambda(t) = a e^{-t/b}$ is able
to accurately model a number of torrents, it fails to account
for a longer tail of joins that appears in a large number of
torrents. This effect is particularly perceivable in bitsoup
We find that a function of the form $\lambda'(t) = a_0 / (1 + bt)$,
for $t \in \mathbb{N}$, better models a larger proportion of all torrents
in bitsoup while modeling torrents in alluvion and etree
similarly to the exponential function. As in the exponential
model, $a_0$ represents the initial peer join rate and b is a factor
that influences how fast this rate drops with time. Unlike
in $\lambda(t)$, however, in $\lambda'(t)$ the arrival rate
decreases more slowly and at different rates during the lifetime
of the torrent. The difference between the two models is further
illustrated in Fig. 2.
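A small numerical sketch helps to see the difference in the tails of the two models: the exponential rate a * exp(-t / b) and the rate of the form a0 / (1 + b * t). The function names and the parameter values mentioned below are hypothetical and chosen only for illustration; they are not fitted to any of the traces.

```python
import math


def exponential_rate(t, a, b):
    """Join rate at day t under the exponential model a * exp(-t / b)."""
    return a * math.exp(-t / b)


def hyperbolic_rate(t, a0, b):
    """Join rate at day t under the model a0 / (1 + b * t)."""
    return a0 / (1.0 + b * t)


def compare_models(days, a, b_exp, a0, b_hyp):
    """Return (day, exponential rate, hyperbolic rate) for each day."""
    return [
        (t, exponential_rate(t, a, b_exp), hyperbolic_rate(t, a0, b_hyp))
        for t in range(days)
    ]
```

For instance, with the same initial rate of 100 joins per day, b = 3 in the exponential model and b = 0.5 in the second model, the second model still predicts several joins per day after 20 days, while the exponential rate has dropped below one join per day. Note that b plays a different role in each model, so the two values are not comparable.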
Fig. 1. Popularity rank of all torrents in the traces (left) and of torrents in the s30 sample (right). Axis label in both panels: Torrent popularity rank.
¹ We used QQ-plots to compare empirical and theoretical CDFs as visual tests of goodness of fit.
A comparison between the $\lambda(t)$ and $\lambda'(t)$ models can be
made through the difference in their Akaike’s Information
Criterion (AIC). This criterion quantiﬁes the ﬁt of an esti-
mated statistical model and can be used to compare how
well two models ﬁt a dataset. Depending on the difference
between the AIC for the two models, it is possible to assess
their relative merits . This comparison can result in con-
sidering both models to be adequate, or in evidence in fa-
vor of the use of one of them. We take a conservative
approach and consider that a model can be used unless
there is essentially no support for it in comparison with
the competing model. It is then possible to verify how of-
ten each model can be used to model the torrents in our
traces, considering that it can be replaced by the compet-
Table 3 summarizes the comparison for the torrents in
our traces that lasted for at least 5 days and had a mini-
mum of 10 peers. A small fraction (5–10%) of the torrents
in each trace ﬁts neither of our two models and was not in-
cluded in the table. For the remainder, no model is the
most adequate for all torrents and models have a similar
coverage for torrents in etree and alluvion. Nevertheless,
the $\lambda'(t)$ model fits the bitsoup trace considerably better,
particularly for the most popular torrents.
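The AIC-based comparison can be sketched as follows. The sketch assumes that both models are fitted by least squares with Gaussian errors, in which case the AIC can be computed, up to an additive constant, as n ln(RSS/n) + 2k. It also assumes a cutoff of 10 on the AIC difference as the point where a model has essentially no support, a commonly used rule of thumb; the exact fitting procedure and threshold used in this study are not stated here, and all names are hypothetical.

```python
import math


def aic_least_squares(observed, predicted, n_params):
    """AIC of a least-squares fit, up to an additive constant.

    Assumes Gaussian errors and a non-zero residual sum of squares.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params


def usable_models(aic_by_model, max_delta=10.0):
    """Names of the models whose AIC is within max_delta of the best one.

    max_delta=10 is an assumed rule-of-thumb cutoff, not a value from the paper.
    """
    best = min(aic_by_model.values())
    return {name for name, aic in aic_by_model.items() if aic - best <= max_delta}
```

Under this conservative rule both models can be usable for the same torrent, which is why the percentages reported for the two models in one sample can add up to more than 100%.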
One possible explanation for the difference in model
adequacy is the scale of the bitsoup community. Bitsoup is
signiﬁcantly larger than the other two communities, which
might result in peers joining torrents for longer periods. It
is also possible that the heuristic peer identiﬁcation em-
ployed in the analysis of etree and alluvion inﬂuences the
peer joins observed in these communities, but our data
does not allow an evaluation of this potential inﬂuence.
Regardless of the cause of the difference, however, our results
support the $\lambda'(t)$ model as a valuable tool both when
reasoning about torrents and when synthesizing workloads.
When reasoning, it offers a complement to the
exponential model, better explaining a number of torrents
and accounting for the phenomenon of longer tails in peer
join rates. For workload synthesis, it offers an accurate
representation of how the popularity of a torrent decreases
over time. Moreover, the $\lambda'(t)$ model is particularly
suitable for modeling highly popular torrents, a
type of torrent that is often of interest in performance
Finally, the observed model has impact on simulation
studies. The decrease in peer join rates over time implies
that a pure Poisson process does not model the peer join
process well. Several studies (e.g. [21,26,24,12]) have re-
lied on this model and our result strengthens the need
for reconsidering them with more accurate models.
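For workload synthesis, one possible use of a time-varying join rate is to draw the number of joins on each day from a Poisson distribution whose mean follows the rate model, that is, a non-homogeneous Poisson process discretized by day. The sketch below does this for a rate of the form a0 / (1 + b * day). It is a minimal illustration: the daily discretization, the parameter values and the function names are assumptions, and the sampler is only suitable for moderate daily means.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method.

    Adequate for small and moderate means; for very large means exp(-mean)
    underflows and a different sampler should be used.
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def synthesize_daily_joins(a0, b, days, seed=0):
    """Number of peer joins per day, with mean a0 / (1 + b * day) on each day."""
    rng = random.Random(seed)
    return [poisson_sample(a0 / (1.0 + b * day), rng) for day in range(days)]
```

A homogeneous Poisson process corresponds to the special case b = 0, in which the expected number of joins stays at a0 for every day of the torrent lifetime.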
5. Characterizing resource supply
In a BitTorrent community, resources are contributed
by users. Their actions determine resource supply as users
(i) configure the maximum amount of bandwidth their client
can use for upload, (ii) determine the seeding time of
their clients by controlling how long the client stays
online after it has finished a download, and (iii) decide to
permanently leave torrents they are seeding, thus ceasing to
contribute to them.
This user behavior inﬂuences three aspects of the sys-
tem: (a) throughput, (b) content durability, and (c) upload
volume. Users contribute to system throughput by provid-
ing upload bandwidth. Content durability is inﬂuenced by
users’ seeding time. Finally, the upload volume is the result
of both providing upload bandwidth and spending time online
when there is demand for service.
The remainder of this section evaluates the contribution
users make to each of the aforementioned aspects of the
system (Sections 5.1 and 5.2); investigates whether it is
the upload bandwidth or the seeding time that better
determines the upload volume at both community and tor-
rent levels (Section 5.3); and ﬁnally investigates the
dynamics of the population of contributors in the system
Fig. 2. Fitting of $\lambda(t)$ and $\lambda'(t)$ for three example torrents. The two torrents on the left are from bitsoup, while the rightmost one is from alluvion. Axis label in all panels: Torrent age (days).
Table 3. Comparison of the percentage of torrents in which each model was
equivalent or better than the competing model. The percentage in the $\lambda(t)$
column states the fraction of all torrents in which the $\lambda(t)$ model was as
good as or better than the $\lambda'(t)$ model.
Sample                # torrents   $\lambda(t)$ (%)   $\lambda'(t)$ (%)
etree s8              27           100                100
alluvion s30          194          67                 65
bitsoup s30           858          40                 79
  joins < 50          406          45                 82
  50 ≤ joins ≤ 150    430          37                 75
  joins ≥ 150         22           9                  91
5.1. How are user contributions distributed?
Our analysis of user contribution shows that, at the
community level, contributions of all types are concentrated
on a small portion of contributors, yet, at the torrent level,
contributions are typically less concentrated.
The ﬁrst part of this analysis investigates the distribu-
tion of user contribution at the community level. Such
analysis is only possible in the bitsoup trace, since it is
the only trace with accurate user identiﬁcation. Fig. 3
shows that contributions are considerably concentrated
in this trace: the top 20% contributors provide around
80% of the contribution for all the types of contributions
considered. We note that although a minority of all users
provides a significant share of resources, this share is much
lower than that reported in previous studies of Gnutella
[1,18] and eDonkey , two other peer-to-peer file-sharing systems.
However, a high concentration at the community level
does not imply that contribution is also concentrated at
the torrent level. The investigation of this aspect uses the
samples of complete torrents from all communities. We fo-
cus on the top 20% contributors in each individual torrent
and measure the amount they contribute. Fig. 4 shows the
CDF of the proportion of contribution generated by peers in
the top 20% set: a proportion y of all torrents has a fraction x or
less of its resources contributed by the peers in its
top 20% set.
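The torrent-level measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `top_share` and `empirical_cdf`, the rule of rounding the top 20% up to at least one peer, and the sample upload volumes are all hypothetical.

```python
import math

def top_share(contributions, top_fraction=0.2):
    """Fraction of a torrent's total contribution made by its top peers.

    Assumption: the "top 20% set" is rounded up to a whole number of
    peers and always contains at least one peer.
    """
    total = sum(contributions)
    if total == 0:
        return 0.0
    # round() guards against float noise such as 0.2 * 15 = 3.0000000000000004
    k = max(1, math.ceil(round(top_fraction * len(contributions), 9)))
    top = sorted(contributions, reverse=True)[:k]
    return sum(top) / total

def empirical_cdf(values):
    """Sorted (x, y) points, where y is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

# Hypothetical upload volumes (MB) per peer for three made-up torrents.
torrents = {
    "torrent_a": [500, 20, 10, 10, 5, 5, 0, 0, 0, 0],
    "torrent_b": [60, 55, 50, 50, 45, 40, 40, 35, 30, 30],
    "torrent_c": [100, 90, 10, 0, 0],
}
shares = {name: top_share(volumes) for name, volumes in torrents.items()}
cdf = empirical_cdf(shares.values())
```

Each point (x, y) in `cdf` reads as in Fig. 4: a proportion y of the torrents has x or less of its resources contributed by its top 20% set.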
The concentration of contribution is less pronounced at
the torrent level than at the community level in all samples
of complete torrents. At the community level, the top 20%
of contributors are responsible for 80% or more of the total
seeding time, upload bandwidth and upload volume. At the
torrent level, the top 20% of contributors are responsible
for a similar proportion of all contribution only in a
small number of torrents. In particular, when considering
the upload volume, the top 20% contributors only rarely
contribute the majority of the uploaded volume. Upload
bandwidth and seeding time are usually more concentrated,
yet they are not as concentrated as at the community level.
Focusing the analysis on the upload volume metric, it is
possible to gain further insight into how resources are
provided: the considerable concentration of upload volume
contributions at the community level and the milder
concentration at the torrent level together suggest that top
contributing users do not achieve this status as a result
of massive contributions in a small number of torrents.
Instead, these users contribute in a large number of torrents.
On the one hand, understanding that a small proportion
of users are responsible for most of the data transferred
motivates communities to highly value these users,
rewarding them so as to maintain their participation. On
the other hand, the relative balance in resource contribu-
tion at the torrent level can be useful regarding resource
allocation in the community. For example, this information
is useful when deciding, either centrally or by collective ac-
tion, in which torrents each user should seed. The lack of
concentration in contributions at the torrent level implies
that the number of peers contributing in a torrent is a rea-
sonable indicator for the level of contribution to be ex-
pected in that torrent. Our results suggest resource
allocation at the community level can use this simple
and cheap-to-obtain information to decide where to direct
Furthermore, the lack of an accentuated concentration
in resource contribution at the torrent level improves the
overall robustness of torrents. The more the contribution
is concentrated in a torrent, the more the service level in
that torrent depends on the individual behavior of a few
peers. A more equitable distribution of contributions leads to
torrents that are more robust to individual peer failure or
departure.
Fig. 3. Community level concentration of contributions in bitsoup (s30). (x-axis: cumulative proportion of users.)
Fig. 4. Torrent level concentration of contributions. (x-axis of each panel: proportion of contribution by top 20% contributors.)
5.2. Do contribution levels vary across communities?
To examine regularities and peculiarities in user behav-
ior, it is necessary to compare the three types of peer con-
tributions across the communities. Recall that bitsoup
operates a sharing-ratio enforcement mechanism (SRE)
seeking to boost resource contributions. Our investigation
reveals that users seed for longer in the community that uses
this enforcement mechanism, although bandwidth contribu-
tion is similar across all communities.
The comparison of the three communities focuses on
contributions at the torrent level (a community level com-
parison among the three communities is not possible since
the upload volume is directly related to the size of the
community). Upload bandwidth and seeding time are com-
parable if they do not correlate with torrent characteristics.
Otherwise, observed differences in contributions might re-
sult from torrents’ peculiarities.
A correlation analysis shows that the upload bandwidth
allocated by a peer is unrelated to the characteristics of the
ﬁle being downloaded. Additionally, seeding time is not
correlated with the size of the ﬁle or the time the peer
takes to download it. This indicates that it is reasonable
to analyze peers’ upload bandwidth and seeding time
regardless of the torrent they participate in and to compare
them across communities.
Fig. 5 shows the distribution of seeding time and upload
bandwidth contributed by peers. Regarding the upload
bandwidth, in all communities, only about 5% of peers
contribute large amounts of bandwidth (over 100 KB/s),
20–30% of peers do not contribute, and 40% contribute
between 5 and 50 KB/s. Considering the seeding time,
however, it is clear that a considerably larger proportion of
users seed for longer in bitsoup than in etree and alluvion.
We attribute this difference to the SRE mechanism em-
ployed in bitsoup, which encourages users to contribute
more. This conjecture is further supported by observations
made on a different set of communities by Andrade et al.
 and Ripeanu et al. .
Taken together, the observations that bandwidth levels
are similar across all communities while seeding is higher
in the presence of SRE bring a new perspective on the
effects of this mechanism. Our analysis unveils that users
typically try to increase their contribution levels by seed-
ing for longer and not by providing more bandwidth to
the system. This observation is of particular importance
for designers and operators of incentive mechanisms for
BitTorrent, as it provides evidence of which resource users
will invest as a response to incentives to increase their up-
Finally, regardless of the difference in seeding behavior
across communities, the seeding time distribution is con-
siderably skewed in all of them. This regularity supports
the conjecture of Powelse et al.  that the behavior of
individual users is more relevant to determine the longev-
ity of a torrent than the number of seeders online at an in-
stant: a ﬁle is likely to be available for longer if there is a
long-standing seeder than if there are several short-lived
ones at a given moment.
5.3. What determines the upload volume: seeding time or bandwidth?
This section investigates which factor best explains the
upload volume of a user at the community level and of a
peer at the torrent level: the upload bandwidth or the
seeding time. The results suggest that the
bandwidth is a better predictor for the upload volume at both
levels, as opposed to the seeding time.
The study at the community level can only be done with
the bitsoup trace. At the torrent level, we investigate the s30
samples of bitsoup and alluvion. We analyze the correlations
between three variables: (i) the volume uploaded,
normalized by the amount downloaded; (ii) the estimated upload
bandwidth; and (iii) the time spent online in each torrent.
The normalization is used to avoid the effect of file size on
the measurements: peers are likely to leech for longer on
larger files and, because of that, to upload more data. Also,
the p-values for all results we report in this section are
lower than $2.2 \times 10^{-16}$.
Fig. 5. CDFs of seeding time and upload bandwidth provided by peers. Omitted samples behave similarly. (x-axes: seeding time in hours; peer bandwidth in KB/s.)
A regression analysis shows that, at the torrent level,
the amount contributed by each peer on a torrent has a
higher correlation with its bandwidth ($R^2 = 0.50$ in bitsoup
and $R^2 = 0.39$ in alluvion) than with the time online
($R^2 = 0.03$ in bitsoup and $R^2 = 0.008$ in alluvion), after log
transformations on the variables.
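As an illustration of the kind of comparison reported above, the sketch below fits a least-squares line to log-transformed variables and returns its coefficient of determination. It is only a sketch under assumptions: the helper name `r_squared_loglog` and the sample values are hypothetical, and dropping non-positive observations before taking logarithms is our assumption, since the text does not state how zero values were handled.

```python
import math

def r_squared_loglog(xs, ys):
    """R^2 of a least-squares line fitted to (log x, log y).

    For a simple linear regression, R^2 equals the squared Pearson
    correlation, which is what is computed here.
    Assumption: pairs with a non-positive value are dropped.
    """
    pairs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive observations")
    mean_x = sum(lx for lx, _ in pairs) / n
    mean_y = sum(ly for _, ly in pairs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in pairs)
    syy = sum((ly - mean_y) ** 2 for _, ly in pairs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains nothing
    return (sxy * sxy) / (sxx * syy)

# Hypothetical per-peer observations in one torrent.
normalized_volume = [1.0, 2.0, 4.0, 8.0]   # uploaded / downloaded
bandwidth_kbps = [10.0, 20.0, 40.0, 80.0]  # estimated upload bandwidth
hours_online = [5.0, 1.0, 7.0, 2.0]        # time spent online

r2_bandwidth = r_squared_loglog(bandwidth_kbps, normalized_volume)
r2_time = r_squared_loglog(hours_online, normalized_volume)
```

In this made-up sample the normalized volume is exactly proportional to bandwidth, so `r2_bandwidth` is 1 up to rounding, while `r2_time` is much lower.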
At the community level, a similar analysis indicates an
even stronger correlation between the user's upload volume
and its estimated bandwidth ($R^2 = 0.50$), compared
to the correlation between the user's upload volume and
the time she spent online ($R^2$
Therefore, the heavy contributors at the community and
torrent levels are those that own and make available more
bandwidth, as opposed to those that spend longer periods
seeding. Together with our analysis of seeding time in
Section 5.2, this result shows that although the main response
of peers to the sharing-ratio enforcement incentives is
to try to contribute more by seeding for longer, the most
effective strategy for contributing more is actually lifting
bandwidth limitations, if these are used.
5.4. How stable is the population of heavy contributors?
The results in Section 5.1 show that, at the community
level, a few users provide most resources (20% of the users
provide 80% of the uploaded volume). In this section, we
refer to these users as heavy contributors and analyze
whether the set of heavy contributors in the community
is stable or changes over time. As it focuses on the commu-
nity level, this analysis uses only the bitsoup trace.
The analysis is performed as follows: the entire trace is
divided into several non-overlapping successive time win-
dows W1; W2; . . . of length t. For each window Wi, users are
ranked according to the total amount that they uploaded
within that window. A user is a heavy contributor in that
window if it is among the top 20% contributors. To evaluate
the changes in the set of heavy contributors, we use a win-
dow length of a week to account for seasonality within dai-
ly and weekly periods.
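The windowing procedure above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the record layout `(timestamp, user, bytes)`, the function name, and the sample data are hypothetical, and the sketch assumes that users are ranked only among those who uploaded something in the window and that ties are broken arbitrarily.

```python
import math
from collections import defaultdict

WEEK = 7 * 24 * 3600  # window length t, in seconds

def heavy_contributors_per_window(uploads, window=WEEK, top_fraction=0.2):
    """Map each window index to its set of heavy contributors.

    uploads: iterable of (timestamp_seconds, user_id, bytes_uploaded).
    Assumption: only users active in a window are ranked in that window.
    """
    per_window = defaultdict(lambda: defaultdict(int))
    for timestamp, user, nbytes in uploads:
        per_window[int(timestamp // window)][user] += nbytes

    heavy = {}
    for index, totals in sorted(per_window.items()):
        ranked = sorted(totals, key=totals.get, reverse=True)
        k = max(1, math.ceil(round(top_fraction * len(ranked), 9)))
        heavy[index] = set(ranked[:k])
    return heavy

# Hypothetical trace: five users observed over two consecutive weeks.
trace = [
    (0, "a", 100), (10, "b", 50), (20, "c", 10), (30, "d", 5), (40, "e", 1),
    (WEEK + 5, "a", 10), (WEEK + 6, "b", 200), (WEEK + 7, "c", 1),
    (WEEK + 8, "d", 1), (WEEK + 9, "e", 1),
]
heavy = heavy_contributors_per_window(trace)
ever_heavy = set.union(*heavy.values())           # heavy at least once
always_heavy = set.intersection(*heavy.values())  # heavy in every window
```

The sizes of `ever_heavy` and `always_heavy`, relative to the user population, correspond to the two proportions discussed in this section.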
The results show that 30% of the user population belongs
to the set of heavy contributors at least once during the trace
duration. Furthermore, only approximately 1.8% of users
maintain the status of heavy contributors over the entire
trace duration. This shows that there is some degree of
churn in the heavy contributors' set. However, although
our traces allow this assertion, their limited duration does
not allow for a precise evaluation of this churn.
Nevertheless, the observation that the set of heavy
contributors is not static over time suggests BitTorrent
communities are more robust than communities that depend
on a static set of users to provide most of their resources.
Although the performance of the system does rely on a
small number of users during periods of time, the results
indicate that some level of renewal occurs, as heavy con-
tributors are replaced over extended periods. This means
failure or departure of a heavy contributor has less impact
on the system than in the case of a static set of high
6. Relating resource demand and supply
Besides studying resource demand and supply sepa-
rately, understanding their relation unveils important data
about system functioning. In particular, this section an-
swers the following questions: (i) are peers’ demand and
supply correlated? (Section 6.1); (ii) does resource supply
meet the observed demand? (Section 6.2); and (iii) is there
resource contention at the torrent level? (Section 6.3).
6.1. Are heavy contributors heavy consumers too?
Our analysis of resource supply shows that, at the
community level, a minority of users is responsible for most
resources contributed. This picture naturally leads to the
question of whether this set of users acts as servers to a
majority that behaves mostly as consumers. The analysis
in this section shows that this is not the case in bitsoup,
the community where we can evaluate user behavior.
Instead, users who are heavy contributors are also heavy
consumers.
Fig. 6 shows a scatter plot of uploaded and downloaded
volumes of users in bitsoup. The color of each point
represents the number of torrents a user participates in: fewer
torrents yield darker points. The logarithms of upload and
download volumes are linearly correlated, with a Pearson
correlation coefﬁcient of 0.77, revealing that users that
contribute more are also those that consume more from
the community. Moreover, the color gradient shows that
users who are the heavy contributors and heavy consum-
ers are those that participate in more torrents. Finally,
the correlation between upload and download volumes is
consistent for peers irrespective of their activity level.
These observations portray bitsoup as an equitable shar-
ing system. Our traces do not allow us to determine if this
is a result of BitTorrent’s built-in incentive mechanism or
of the SRE employed in this community, but our results
underpin the scalability of communities adopting bitsoup’s
model. If contributions are proportional to consumption,
then resource contention levels and, thus, service provision
levels, are not affected by the scale of the community. In
the absence of a similar correlation, growth in the popula-
tion of users could lead to increasing levels of contention
for the available resources.
Fig. 6. Churn in bitsoup.
From a different perspective, our results show that
there is virtually no free-riding in bitsoup: the norm is that
users do not consume signiﬁcantly more than their fair
share of the community’s resources. This is different from
user behavior reported in studies of Gnutella and eDonkey,
where high levels of free-riding were documented
[1,18,17]. Nevertheless, this comparison should be taken
with caution, as analyses of Gnutella and eDonkey are
based on the assumption that all users consume from the
system while it is observed that only a few contribute.
Our analysis provides reference results for a future, fair
comparison that tracks user consumption in file-sharing
networks and considers as free-riders only the users that
consume more than their fair share of the system's resources.
6.2. Does the resource supply meet the observed demand?
This section concentrates on understanding whether
resource supply is adequate to ensure the system's liveness by
investigating the proportion of requests that are successfully
served. Overall, our evaluation shows that in all three
communities, the great majority of requests is successfully served.
In the analyzed traces, failed download requests are ob-
served when a peer is left as a leecher in a torrent and no
seeder joins this torrent after that. We use the notation st
and sc for the fraction of requests that succeed in a torrent
and in a community, respectively.
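The two success fractions defined above can be computed as in the following sketch, where they are written `s_t` and `s_c`. The input layout (one `(torrent_id, served)` pair per request), the function name, and the sample numbers are hypothetical; deciding whether a given request counts as served is assumed to have been done beforehand, following the definition of a failed request given in this section.

```python
from collections import defaultdict

def success_fractions(requests):
    """Return (s_t, s_c) for an iterable of (torrent_id, served) pairs.

    s_t maps each torrent to the fraction of its requests that succeed;
    s_c is the fraction of all requests in the community that succeed.
    """
    served = defaultdict(int)
    total = defaultdict(int)
    for torrent, ok in requests:
        total[torrent] += 1
        if ok:
            served[torrent] += 1
    s_t = {torrent: served[torrent] / total[torrent] for torrent in total}
    n_requests = sum(total.values())
    s_c = sum(served.values()) / n_requests if n_requests else float("nan")
    return s_t, s_c

# Hypothetical community: one popular torrent and one very small torrent.
requests = (
    [("popular", True)] * 995 + [("popular", False)] * 5
    + [("small", True)] * 1 + [("small", False)] * 3
)
s_t, s_c = success_fractions(requests)
mean_s_t = sum(s_t.values()) / len(s_t)
```

In this made-up example the mean of `s_t` over torrents is about 0.62, whereas `s_c` is above 0.99: an average over torrents gives small torrents the same weight as popular ones, which is why the two perspectives can differ markedly.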
Fig. 7 (left) shows the CDF of st for all torrents observed
in our traces. Considering only complete torrents, we ob-
serve similar results across all communities: most torrents
serve virtually all the requests they receive (st is larger
than 0.99 for 97% of the torrents in bitsoup and in etree,
and for 60% of the torrents in alluvion). Nevertheless, a
small fraction of the torrents have most of their requests
However, measuring the proportion of served requests
at the torrent level does not give an accurate picture of
the proportion of requests which are served in a BitTorrent
community over a period of time. In fact, a more in-depth
analysis of the failed requests reveals that the majority of
the torrents which have a high proportion of failed
requests are those that receive fewer than 20 requests. These
torrents are a minority and serve a small proportion of
A complementary perspective is then to analyze sc.
Fig. 7 (right) shows the 95% conﬁdence intervals of sc for
the three communities. We observe that the overall pro-
portion of served requests in the three BitTorrent commu-
nities studied is high: considering all requests seen, sc is
not statistically different from 1 for bitsoup and etree, and
is larger than 0.98 for alluvion. Considering samples of
complete torrents, however, sc is signiﬁcantly lower in allu-
vion when compared to bitsoup and etree.
One interesting perspective is that our results do not al-
low a statistical distinction between sc for bitsoup and
etree. This is an indication that the sharing-ratio enforce-
ment (SRE) is not necessary for etree to achieve a rate of
served requests which is equivalent to bitsoup. On the
other hand, it is not possible to infer the ineffectiveness
of the SRE to improve sc, as bitsoup serves signiﬁcantly
more requests than alluvion. However, the similar levels
of contributions in etree and alluvion observed in Section
5 in conjunction with the high sc of etree suggest that some
peculiarity of alluvion is the cause of its lower service
If this conjecture holds, it would imply that
although the SRE might lead to more seeding, it is not nec-
essary for communities similar to etree or alluvion to
achieve a high quality of service in the metric we consider.
The results in this section also provide insight on the
quality of service provided by commons-based content dis-
tribution communities like those supported by BitTorrent.
These communities are loosely organized as decentralized
peer-production systems  coordinated through loose so-
cial relationships and still manage to serve all or nearly all
requests they receive. Our results provide quantitative evi-
dence of the effectiveness of the commons-based approach
as a viable alternative for its market counterparts in con-
Our interpretation of the service provided by BitTorrent
communities stands in contrast with those of Guo et al.
 and Piatek et al. . Guo et al. calculated the average
value of st in alluvion and interpreted its value (0.9) as a
sign of an overall unsatisfactory quality of service in
torrents. We observed that the distribution of st is
considerably skewed, which renders its mean a limited
assessment of typical torrent behavior. Indeed, 60% of
torrents in alluvion serve more than 99% of their requests.
Furthermore, our results show that service quality in alluvion
cannot be taken as the general quality of service in
BitTorrent communities.
Fig. 7. Proportion of requests served in a torrent (left) and in the community as a whole (right). Arrows indicate the 95% confidence intervals. (Right panel categories: alluvion, bitsoup, etree.)
Footnote: It is worth noting that, after removing the first month of the alluvion trace, we
observe a proportion of served requests comparable to bitsoup and etree.
However, further investigation of the causes of such behavior is left as
future work.
Piatek et al. reported that BitTorrent provides a poor
service to its users because 25% of 55,000 torrents ob-
served during their measurements were unavailable. The
difference between Piatek et al.’s conclusion and ours is
due to different definitions of availability: we count service
as unavailable only when a peer tries to download a
ﬁle and fails, while Piatek et al. do not relate availability
6.3. Is there resource contention at the torrent level?
We now turn to investigate resource contention at the
torrent level, examining what is the typical regime of oper-
ation in BitTorrent communities. Resource contention ex-
poses mismatches between resource demand and supply
which affect the functioning of the system. When resource
demand is much larger than supply in a torrent, download
performance falls short of what consumers’ download
bandwidth allows for. If supply is much larger than de-
mand, providers’ resources are underutilized. Also, de-
mand and supply play a role in BitTorrent incentives:
prioritization is only relevant when resources are scarce
and cannot serve the demand of all consumers.
In summary, our analysis finds that resource contention
is similar across communities: most torrents operate under
some resource contention, while one quarter of torrents has
no contention.
For this investigation, we assume that BitTorrent’s tit-
for-tat mechanism works efﬁciently and leechers that pro-
vide more bandwidth are prioritized in the torrent (please
refer to Legout et al.  for experimental evidence). This
implies that when there is resource contention in a torrent,
there is a positive correlation between the upload and
download speed of leechers. To test the existence of re-
source contention we thus use the Kendall correlation
coefﬁcient to measure the degree of correlation between
the rankings of upload and download speed of leechers.
The strength of this correlation is directly related to the le-
vel of resource contention in the torrent.
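The contention test described above can be sketched with a small, dependency-free implementation of the Kendall rank correlation. The text does not state which variant of the coefficient was used, so the tau-b variant (which corrects for ties) is an assumption here, as are the function name and the sample speeds.

```python
import math
from itertools import combinations

def kendall_tau_b(xs, ys):
    """Kendall tau-b rank correlation between two equal-length sequences.

    Compares every pair of observations (quadratic time), which is
    adequate for the number of peers found in a single torrent.
    """
    concordant = discordant = 0
    untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denominator = math.sqrt(untied_x * untied_y)
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator

# Hypothetical leecher speeds (KB/s) in two made-up torrents.
upload = [1, 2, 3, 4, 5]
download_contended = [15, 25, 35, 45, 55]  # same ranking as upload
download_mixed = [2, 1, 4, 3, 5]           # two swapped pairs

tau_contended = kendall_tau_b(upload, download_contended)
tau_mixed = kendall_tau_b(upload, download_mixed)
```

Under the tit-for-tat assumption stated above, a coefficient close to 1 would indicate strong contention in a torrent, while a value near 0 would indicate that download speed does not depend on the upload speed contributed.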
Fig. 8 presents the Kendall’s correlation coefﬁcient be-
tween upload and download speeds of peers in all torrents
of bitsoup and alluvion that had at least ﬁve peers. The dis-
tribution of how these measures are correlated in the two
communities is very similar. For most torrents, there is at
least a mild (0.3) correlation between download and
upload speeds, indicating contention, but only for a small
proportion of torrents is the correlation strong (≥0.6).
Moreover, in one fourth of the torrents there is enough
bandwidth for all peers to receive the service they demand
irrespective of their contributions.
The absence of resource contention in one fourth of all
torrents is particularly relevant to the design of incentive
mechanisms, as mechanisms based on prioritization are
rendered irrelevant for these torrents. On the other hand,
the existence of some degree of correlation between up-
load and download speeds in most torrents suggests that
contribution levels are not sufﬁcient to meet the entire de-
mand in the majority of torrents. Assuming most BitTor-
rent users have asymmetric Internet connections,
leechers’ demand can only be met if there are seeders in
a torrent. Therefore, our results suggest that (i) seeder
service is not enough to compensate for the asymmetry in the
Internet connections of leechers; and (ii) communities
could beneﬁt from higher levels of contribution of upload
bandwidth or seeding time.
We also note that the levels of resource contention do
not change signiﬁcantly across communities. This is evi-
dence that the higher seeding times seen in bitsoup do
not dramatically change the relation between demand
and supply in this community.
Lastly, the range of resource contention levels found in
bitsoup and alluvion agrees with the observation by Locher
et al.  that sometimes download speed in a torrent is
related to upload bandwidth contributed, while sometimes
it is not. Our data, however, quantifies this phenomenon.
Izal et al.  observed a positive correlation between
leecher upload and download speeds in a highly popular
torrent. Our observations indicate how this correlation varies
in large, heterogeneous communities.
7. Final remarks
The characterization of a computational system must
consider four aspects: the system’s design, its implementa-
tion, the resources on which it runs, and the workload it
serves. This work focuses on the latter two aspects in the
context of BitTorrent commons-based content-sharing
communities. In particular, it characterizes resource de-
mand, resource supply and their relationship in three Bit-
Torrent communities. Our results have broad impact on
the design of BitTorrent extensions, on the design of com-
plementary mechanisms and on the study of this system.
The results related to resource demand (i) point to the
design of cache mechanisms for BitTorrent that leverage
the peculiar popularity distributions identiﬁed in this
study; and (ii) provide an accurate model for reasoning
about and synthesizing the popularity of torrents over
time. In particular, the results strongly suggest that future
researchers should consider the model introduced in this
study, as the commonly referred to Poisson process is not
accurate for current BitTorrent usage.
Fig. 8. CDF of the Kendall correlation coefficient between upload and download bandwidths of the peers in each torrent in tau30. (x-axis: correlation between upload and download bandwidth of peers in torrent.)
The characterization of resource supply motivates
communities to identify and nurture their heavy contributors,
and motivates the development of community resource
allocation mechanisms based on simple information. The investigation
also identiﬁes that users that contribute the most in the
system are those that provide more bandwidth and reveals
some redundancy in the set of users that provide most of
the resources in the community, providing insight on the
robustness of these communities.
The analysis of the relation between resource demand
and supply provides a novel picture of content-sharing
via BitTorrent, where users that provide most of the re-
sources are not altruistic. Instead, they generate a propor-
tional demand. Our results also quantify one dimension of
the quality of service achieved by BitTorrent communities
and suggest that service should be improved mostly on
small torrents. Additionally, the investigation of resource
contention presents the typical regime of operation of
the system with respect to its level of contention,
which is valuable information for designers of
resource allocation mechanisms.
Finally, this study contributes to the methodology for
experimental studies of BitTorrent content-sharing com-
munities, summing up good practices relevant for BitTor-
rent data analysis to assess information loss due to
sampling and to avoid biased estimations.
Our results suggest a number of avenues for future
work. Besides the caching and resource allocation investi-
gations mentioned above, our results motivate further
study of the different characteristics of torrent popularity
that result in the fast drop of peer joins over time; simula-
tion studies that consider the effect of the long tail of peer
join rates; and a further investigation of the factors that
inﬂuence the request success rate in different
Moreover, future work should extend the breadth of
this characterization. Although our characterization used
traces of up to 68 days and three communities, it provides
a limited view of current BitTorrent usage. In particular, fu-
ture studies should focus on torrents that survive over ex-
tended periods of time, use accurate user identiﬁcation and
consider other metrics to gauge the quality of service pro-
vided in similar communities. Characterizing more com-
munities of different sizes and potentially different user
habits is also still necessary if we are to better understand
how the human factor drives content distribution on the
Internet and to design mechanisms that better serve
The authors would like to thank the UMass Trace repository
for making the alluvion trace available and Jaindson
Santana and Flavio Santos for valuable help in processing
the traces. Francisco Brasileiro thanks the support received
from CNPq/Brazil (grant 309033/2007-1). Elizeu Santos-
Neto is partially supported by the British Columbia Innova-
 E. Adar, B.A. Huberman, Free riding on Gnutella, First Monday 5
 K. Anagnostakis, F. Harmantzis, S. Ioannidis, M. Zghaibeh, On the
impact of practical P2P incentive mechanisms on user behavior,
Working Paper 06-14, NET Institute, 2006.
 N. Andrade, M. Mowbray, A. Lima, G. Wagner, M. Ripeanu, Inﬂuences
on cooperation in BitTorrent communities, in: Proceeding of the
2005 ACM SIGCOMM Workshop on Economics of Peer-to-Peer
Systems, New York, NY, USA, 2005, pp. 111–115.
 N. Andrade, E. Santos-Neto, F. Brasileiro, M. Ripeanu, Methodological
notes on studying BitTorrent through tracker snapshots, Technical
Report, Networked Systems Laboratory, 2008. <http://
 A. Bellissimo, B.N. Levine, P. Shenoy, Exploring the use of BitTorrent
as the basis for a large trace repository, Technical Report 04-41,
University of Massachusetts, 2004.
 Y. Benkler, Sharing nicely: on shareable goods and the emergence of
sharing as a modality of economic production, The Yale Law Journal
114 (2004) 273–358.
 A. Bharambe, C. Herley, V. Padmanabhan, Analyzing and improving a
BitTorrent network’s performance mechanisms, in: Proceedings of
the INFOCOMM, 2006.
 A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata,
A. Tomkins, J. Wiener, Graph structure in the web, Computer
Networks 33 (1) (2000) 309–320.
 K. Burnham, D. Anderson, Multimodel inference: understanding AIC
and BIC in model selection, Sociological Methods and Research 33 (2)
 CacheLogic, P2p in 2005, 2005. <http://www.cachelogic.com/home/
 M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, S. Moon, I tube, you tube,
everybody tubes: analyzing the world’s largest user generated
content video system, in: Proceedings of the Seventh ACM
SIGCOMM Conference on Internet Measurement, San Diego, CA,
USA, ACM, 2007, pp. 1–14.
 A.L. Chow, L. Golubchik, V. Misra, Improving BitTorrent: a simple
approach, in: Proceedings of the IPTPS, 2008.
 B. Cohen, Incentives build robustness in BitTorrent, in: Proceedings
of the Workshop on Economics of Peer-to-Peer Systems, Berkeley,
CA, USA, June 2003.
 S. Glassman, A caching relay for the world wide web, Computer
Networks and ISDN Systems 27 (2) (1994) 165–173.
 K.P. Gummadi, R.J. Dunn, S. Saroiu, S.D. Gribble, H.M. Levy, J.
Zahorjan, Measurement, modeling, and analysis of a peer-to-peer
ﬁle-sharing workload, in: SOSP’03: Nineteenth ACM Symposium on
Operating Systems Principles, 2003, pp. 314–329.
 L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, X. Zhang, Measurements,
analysis, and modeling of BitTorrent-like systems, in: Proceedings of
the ACM SIGCOMM/USENIX IMC, October 2005, pp. 19–21.
 S.B. Handurukande, A.-M. Kermarrec, F.L. Fessant, L. Massoulié, S.
Patarin, Peer sharing behaviour in the eDonkey network, and
implications for the design of server-less ﬁle sharing systems, in:
Proceedings of the EuroSys’06, New York, NY, 2006, pp. 359–371.
 D. Hughes, G. Coulson, J. Walkerdine, Freeriding on Gnutella
revisited: the Bell Tolls, IEEE Distributed Systems Online 6 (2005) 6.
 M. Izal, G. Urvoy-Keller, E.W. Biersack, P. Felber, A.A. Hamra, L.
Garcés-Erice, Dissecting BitTorrent: ﬁve months in a Torrent’s
lifetime, in: Proceedings of the Passive and Active Measurements,
Antibes Juan-les-Pins, France, April 2004.
 A. Legout, N. Liogkas, E. Kohler, L. Zhang, Clustering and sharing
incentives in BitTorrent systems, in: Proceedings of SIGMETRICS,
2007, pp. 301–312.
 N. Liogkas, R. Nelson, E. Kohler, L. Zhang, Exploiting BitTorrent for
fun (but not proﬁt), in: Proceedings of the IPTPS, February 2006.
 T. Locher, P. Moor, S. Schmid, R. Wattenhofer, Free riding in
BitTorrent is cheap, in: Proceedings of the HotNets.
 M. Piatek, T. Isdal, A. Krishnamurthy, T. Anderson, One hop
reputations for peer to peer ﬁle sharing workloads, in: Proceedings
of the NSDI, 2008.
 F.L. Piccolo, G. Neglia, G. Bianchi, The effect of heterogeneous link
capacities in BitTorrent-like ﬁle sharing systems, in: Proceedings of
the Hot-P2p, Los Alamitos, CA, USA, IEEE Computer Society, 2004, pp.
 J.A. Powelse, P. Garbacki, D.H.J. Epema, H.J. Sips, Measurement study
of the BitTorrent peer-to-peer ﬁle-sharing system, Technical Report
PDS-2004-003, Delft U. Technology, 2004.
526 N. Andrade et al. / Computer Networks 53 (2009) 515–527
 D. Qiu, R. Srikant, Modeling and performance analysis of BitTorrent-
like peer-to-peer networks, in: Proceedings of the SIGCOMM, August
2004, pp. 367–378.
 M. Ripeanu, M. Mowbray, N. Andrade, A. Lima, Gifting technologies:
a BitTorrent case study, First Monday 11 (2006) 11.
 D. Roselli, J.R. Lorch, T.E. Anderson, A comparison of ﬁle system
workloads, in: Proceedings of the USENIX Annual Technical
Conference, Berkeley, CA, USA, 2000, p. 4.
 O. Saleh, M. Hefeeda, Modeling and caching of peer-to-peer traffic,
in: Proceedings of the ICNP, 2006, pp. 249–258.
 S. Sinha, R.K. Pan, Econophysics and Sociophysics: Trends and
Perspectives, Wiley–VCH, 2006, pp. 417–447 (ch. How a hit is
born: the emergence of popularity from the dynamics of collective
 D. Stutzbach, R. Rejaie, Understanding churn in peer-to-peer
networks, in: IMC’06: Proceedings of the Sixth ACM SIGCOMM Conference on
Internet Measurement, New York, NY, USA, ACM Press, 2006, pp.
 D. Stutzbach, D. Zappala, R. Rejaie, The scalability of swarming peer-
to-peer content delivery, in: Proceedings of the NETWORKING, 2005,
 A. Wierzbicki, N. Leibowitz, M. Ripeanu, R. Wozniak, Cache
replacement policies for peer-to-peer file-sharing protocols,
European Transactions on Telecommunications 15 (2004) 6.
 D.M. Wilkinson, Strong regularities in online peer production, in:
EC’08: Proceedings of the Ninth ACM Conference on Electronic
Commerce, New York, NY, USA, ACM, 2008, pp. 302–309.
 X. Yang, G. de Veciana, Service capacity of peer to peer networks, in:
Proceedings of the INFOCOM, 2004.
Nazareno Andrade is a Ph.D. student at the
Universidade Federal de Campina Grande,
Brazil. He received a B.Tech. degree in Telematics
from the Centro Federal de Educação
Tecnológica da Paraíba, Brazil, in 2001 and an
M.Sc. in Informatics from the Universidade
Federal de Campina Grande, Brazil, in 2003.
His research interests include peer-to-peer
systems, grid computing and sharing.
Elizeu Santos-Neto received a B.S. and M.S. in
Computer Science from the Universidade
Federal de Alagoas and the Universidade
Federal de Campina Grande, respectively. He
also worked as an Assistant Researcher at the
OurGrid project (http://www.ourgrid.org) and
contributed to the Virtual Workspaces project
(http://workspace.globus.org), where he
investigated topics in Grid Computing focused
on distributed scheduling, resource allocation
and virtualization technologies. Currently, he
is a Ph.D. candidate at the University of British
Columbia. His research interests are the characterization of, and
mechanism design for, large-scale distributed systems.
Francisco Brasileiro is a Professor at the
Universidade Federal de Campina Grande,
Brazil. He received a B.S. degree in Computer
Science from the Universidade Federal da
Paraíba, Brazil, in 1988, an M.Sc. degree from
the same University in 1989, and a Ph.D.
degree in Computer Science from the University
of Newcastle upon Tyne, UK, in 1995.
His research interests include dependability
in distributed systems, grid computing and
distributed algorithms and protocols. He is a
member of the Brazilian Computer Society,
the ACM, and the IEEE Computer Society.
Matei Ripeanu received his Ph.D. degree in
Computer Science from The University of
Chicago in 2005. After a brief visiting period
with Argonne National Laboratory, he joined
the Electrical and Computer Engineering
Department of the University of British
Columbia as an Assistant Professor. He is
broadly interested in distributed systems with
a focus on self-organization and decentralized
control in large-scale grid and peer-to-peer
systems. He has published in major academic
conferences on large-scale grid and peer-to-peer
system characterization, on techniques to exploit the emergent
characteristics of these systems, and on supporting scientific applications
to run on these platforms.