Available online at www.sciencedirect.com
Computer Communications 31 (2008) 190–200
Analysis of hybrid P2P overlay network topology
Chao Xie a, Guihai Chen c, Art Vandenberg d, Yi Pan b,*
a Department of Computer Science, University of Wisconsin-Madison, Madison, WI, 53706-1685, USA
b Department of Computer Science, Georgia State University, Atlanta, GA 30302-3994, USA
c State Key Laboratory of Novel Software, Nanjing University, Nanjing 210093, China
d Department of Information Systems and Technology, Georgia State University, Atlanta, GA 30302-3968, USA
Available online 19 August 2007
Modeling peer-to-peer (P2P) networks is a challenge for P2P researchers. In this paper, we provide a detailed analysis of large-scale
hybrid P2P overlay network topology, using Gnutella as a case study. First, we re-examine the power-law distributions of the Gnutella
network discovered by previous researchers. Our results show that the current Gnutella network deviates from the earlier power-laws,
suggesting that the Gnutella network topology may have evolved a lot over time. Second, we identify important trends with regard to the
evolution of the Gnutella network between September 2005 and February 2006. Upon analyzing the limitations of the power-laws, we
provide a novel two-layered approach to study the topology of the Gnutella network. We divide the Gnutella network into two layers,
namely the mesh and the forest, to model the hybrid and highly dynamic architecture of the current Gnutella network. We give a detailed
analysis of the two-layered overlay and present six power-laws and one empirical law to characterize the topology. Using the two-layered
approach and laws proposed, realistic topologies can be generated and the realism of artiﬁcial topologies can be validated.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Peer-to-peer; Overlay network; Network topology; Power-law
1. Introduction

Modeling the topologies of peer-to-peer (P2P) networks is an important open problem. An accurate topological model can have significant influence on P2P research. First, we can gain detailed insight into the nature of the underlying system. Second, the model can enable detailed analysis of algorithms and facilitate design of more efficient protocols that take advantage of topology properties. Third, we can generate more accurate artificial topologies for simulation purposes. Furthermore, we can predict future trends and thereby address potential problems in advance.

Previous researchers tended to use power-laws to characterize the topology of P2P networks. Recent advances in P2P networks have resulted in hybrid architectures, represented by the success of Gnutella protocol 0.6 and Kazaa. In this paper, we provide a detailed analysis of large-scale hybrid P2P network topology, giving results concerning major topology properties and main distributions. In our study, we choose Gnutella as a case study, as it has a large user community and open architecture. Our work can be summarized by the following points.

First, we re-examine the power-law distributions of the Gnutella network discovered by previous researchers. Our results show that the current Gnutella network deviates from the earlier power-laws. This observation suggests that the Gnutella network topology may have evolved a lot over time.

This paper extends and supplants the earlier version of this paper presented at IEEE GLOBECOM'06.
Guihai Chen's work is supported by China NSF under Grant 60573131, China Jiangsu Provincial NSF under Grant BK2005208, China 973 projects under Grants 2006CB303000 and 2002CB312002, and Nokia Bridging the World Program. Yi Pan's work is supported in part by the National Science Foundation (NSF) under Grants ECS-0196569, ECS-0334813, and CCF-0514750. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF, China NSF or Nokia.
Corresponding authors. Tel.: +1 404 651 0649; fax: +1 404 463 9912.
E-mail addresses: firstname.lastname@example.org (C. Xie), email@example.com (G. Chen), firstname.lastname@example.org (A. Vandenberg), email@example.com (Y. Pan).
URLs: http://www.cs.wisc.edu/~cxie (C. Xie), http://www.cs.gsu.edu/ pan (Y. Pan).
0140-3664/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
Second, we identify important trends with regard to the evolution of the Gnutella network between September 2005 and February 2006.

As our primary contribution, we provide a novel two-layered approach to study the topology of the Gnutella network. Due to the limitations of the power-laws, we divide the Gnutella network into two layers, namely the mesh and the forest, to model the hybrid and highly dynamic architecture of the current Gnutella network. We give a detailed analysis of the two-layered overlay and present six power-laws and one empirical law to characterize the topology.

Finally, we focus on the generation of realistic topologies and the validation of artificial topologies using our approach and the proposed laws.

The rest of this paper is organized as follows. Section 2 presents background and previous work. In Section 3, we present our traces of the Gnutella network. In Section 4, we re-examine the power-law distributions discovered by previous researchers and identify the trends concerning the evolution of the Gnutella network. In Section 5, we analyze the limitations of the power-laws and introduce our new two-layered approach to study the topology of the Gnutella network. In Section 6, we analyze the topological properties of the mesh and present the power-laws concerning the mesh topology. In Section 7, we examine the topology properties of the forest and provide one empirical law concerning the tree size. In Section 8, we present two power-laws concerning the overlay network as a whole and discuss the practical uses of our approach and laws. Finally, Section 9 concludes our work.

2. Background and previous work

2.1. Gnutella Protocol

Gnutella protocol 0.4 employs a purely decentralized model. In this model, individual nodes, also called servents, are equal in terms of functionality. They not only perform server-side roles such as matching incoming queries against their local resources and responding with applicable results, but also offer client-side functions such as issuing queries and collecting search results. All servents are connected to each other randomly. Fig. 1 illustrates the topology of the Gnutella 0.4 network.

Fig. 1. Topology of the Gnutella 0.4 Network.

Gnutella protocol 0.6 employs a hybrid architecture combining centralized and decentralized models. Servents are categorized into leaves and ultrapeers. A leaf keeps only a small number of connections to ultrapeers. An ultrapeer maintains connections with other ultrapeers and acts as a proxy to the Gnutella network for the leaves connected to it. An ultrapeer only forwards a query to a leaf if it believes the leaf can answer it, and leaves never relay queries between ultrapeers. Fig. 2 illustrates the topology of the Gnutella 0.6 network. Protocol 0.6 is compatible with protocol 0.4, which implies that the current Gnutella network can contain some fraction of nodes that still follow the former protocol specification 0.4.

Fig. 2. Topology of the Gnutella 0.6 Network.

Power-laws have been found in numerous diverse fields spanning sociological, geological, natural and biological systems. Power-laws of the form $y \propto x^{a}$ enable a compact characterization of topologies through their exponents. Faloutsos et al. discovered four power-laws characterizing the topology of the Internet, while Magoni et al. found another four power-laws of the Internet.

In [2,7,11], several power-laws were found with regard to the topology of the Gnutella network. In 2002, Ripeanu et al. argued that the connection distribution of the more recent Gnutella network may follow a two-tier power-law distribution. P2P studies usually assume that these power-laws characterize the topology of P2P networks and use synthetically generated topologies following these power-laws [12–17].

3. Our Gnutella Network Traces and the crawler

We developed a crawler to collect topology information of the Gnutella network, taking advantage of the message communication mechanisms of both protocol 0.4 and protocol 0.6. The crawler is based on the Limewire open source client and performs a parallel breadth-first search on the network. It can discover more than 100,000 nodes in minutes.

We can build the graph of nodes by analyzing the collected data on the Gnutella network. We model two adjacent nodes that have at least one connection between each other by an edge. We treat the Gnutella network as a graph.

Table 1
Basic Statistics of the Gnutella Network
         Ours                  Previous work
Stat.    091505    021106     V34206    V57926    -         -
Time     09-2005   02-2006    09-2003   10-2003   11-2000   12-2000
Nodes    107,205   118,925    34,206    57,926    992       1,125
Edges    118,187   130,612    43,958    80,276    2,465     4,080
l        6.4       7.9        5.4       5.8       3.7       3.3
Diam.    22        24         16        15        9         8
k        2.20      2.20       2.57      2.72      4.97      7.25
In this paper, we provide two traces of the Gnutella network, namely the 091505 trace and the 021106 trace. Note that we studied the topology of the Gnutella network from September 2005 until February 2006, and all the traces we collected agree with the results given in this paper. In Table 1, we present some basic statistics about our traces and previous work [2,11]. In Table 1, l represents the average shortest distance and k represents the average degree.
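As a quick consistency check on Table 1, the average degree of an undirected graph equals twice the number of edges divided by the number of nodes. The sketch below assumes the traces are treated as undirected graphs (an assumption, since the text does not state the formula) and recomputes k for the two traces from the node and edge counts reported in Table 1. The function name is chosen for illustration only.

```python
# Recompute the average degree k = 2E / N from the counts reported in Table 1.
# Assumption: each trace is an undirected graph, so every edge adds 2 to the
# sum of all node degrees.

def average_degree(nodes: int, edges: int) -> float:
    """Average degree of an undirected graph with the given node and edge counts."""
    return 2 * edges / nodes

# (nodes, edges) for the two traces, copied from Table 1
traces = {
    "091505": (107_205, 118_187),
    "021106": (118_925, 130_612),
}

average_degrees = {name: average_degree(n, e) for name, (n, e) in traces.items()}

for name, k in average_degrees.items():
    print(f"{name}: k = {k:.2f}")
```

Both values round to the 2.20 listed in the table, which suggests that k was derived from the node and edge counts in this way.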
4. Current Gnutella network topology
In this section, we examine the power-laws of the Gnutella network described in the previous literature against our two traces. The goal is to find out whether the topology of the current Gnutella network still follows the early power-laws.
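The fits reported in this section are least-squares lines in log-log space, judged by the absolute value of the correlation coefficient (ACC). The following is a minimal sketch of that procedure, not the authors' code: the function names and the synthetic degree sequence are assumptions made for illustration. It fits a rank exponent (degree against rank) and a degree exponent (share of nodes with a larger degree against degree).

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus the correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / math.sqrt(sxx * syy)
    return slope, intercept, corr

def rank_exponent(degrees):
    """Fit log(degree) against log(rank); returns (rank exponent, ACC)."""
    ordered = sorted(degrees, reverse=True)
    log_rank = [math.log10(r) for r in range(1, len(ordered) + 1)]
    log_degree = [math.log10(d) for d in ordered]
    slope, _, corr = linear_fit(log_rank, log_degree)
    return slope, abs(corr)

def ccdf_exponent(degrees):
    """Fit log(CCDF) against log(degree); returns (degree exponent, ACC)."""
    n = len(degrees)
    points = []
    for d in sorted(set(degrees)):
        share_above = sum(1 for x in degrees if x > d) / n
        if share_above > 0:  # the largest degree has CCDF 0, which has no logarithm
            points.append((math.log10(d), math.log10(share_above)))
    slope, _, corr = linear_fit([p[0] for p in points], [p[1] for p in points])
    return slope, abs(corr)

# Hypothetical data: degrees that follow d = 1000 * rank^-0.8 exactly
# (left unrounded so that the rank fit recovers the exponent exactly).
synthetic = [1000 * r ** -0.8 for r in range(1, 501)]

R, acc = rank_exponent(synthetic)
D, acc_ccdf = ccdf_exponent(synthetic)
print(f"rank exponent R = {R:.3f}, ACC = {acc:.3f}")
print(f"degree exponent D = {D:.3f}, ACC = {acc_ccdf:.3f}")
```

On a real trace, the degree list would come from the crawled graph, and an ACC below 0.90 would indicate that the straight-line fit is not a valid description of the data.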
We use linear regression to fit a line to a set of two-dimensional points using the least-squares method. The validity of the approximation is quantified by the correlation coefficient, which ranges from -1.0 to 1.0. The absolute value of the correlation coefficient is denoted ACC. An ACC value of 1.0 indicates perfect linear correlation. In general, the ACC should be greater than 0.90 for the linear approximation to be considered valid.

4.1. Rank distribution

In this section, we study the degrees of the nodes.

Power-law of rank exponent R: The degree $d_v$ of a node v is proportional to the rank of the node $r_v$ to the power of a constant R:

$d_v \propto r_v^{R}$.

The rank $r_v$ of a node v is defined as its index in the order of decreasing degree.

Jovanovic found that the early Gnutella network followed the above power-law with a rank exponent of -0.98 and an ACC of 0.94. For our two traces, the rank exponent is -0.64268 and -0.60681 and the ACC is 0.92178 and 0.88120 in chronological order, as we see in Fig. 3. The low ACC values imply that this power-law is relatively weak in the 091505 graph and even invalid for the 021106 graph.

Fig. 3. Log–log plot of the degree $d_v$ versus the rank $r_v$ in the sequence of decreasing degree.

Compared with a pure power-law distribution, the two graphs deviate from the linear regression with similar patterns. On the one hand, the nodes with high rank have degrees that are too small. This is because the Gnutella protocol 0.6 imposes a limit on the maximal number of connections of an ultrapeer. On the other hand, there are too many nodes with degree around 30, so the curve breaks away from the linear regression. This pattern suggests that ultrapeers in the Gnutella 0.6 network tend to have a connection limit of around 30.

Moreover, the 021106 graph is somewhat different from the 091505 graph. First, the nodes with high rank in the former graph are of smaller degree compared with their counterparts in the latter, implying that protocol 0.6 is effectively replacing protocol 0.4. Secondly, the curve after a degree of approximately 30 drops much more suddenly in the former graph than in the latter, which suggests that ultrapeers tend to employ as many connections as they can.

4.2. Degree Distribution

In this section, we study the distribution of the degrees of the nodes. Note that the degree power-law we present in the current work is different from the one in earlier work. However, they both refer to the same distribution. The difference is that the current work uses the cumulative probability distribution function, while the earlier work uses the probability distribution function. As a result, the exponents of the two power-laws differ approximately by one. The cumulative distribution is preferable because it can be estimated in a statistically robust way.

Power-law of degree exponent D: The complementary cumulative distribution function (CCDF) $D_d$ of a degree d is proportional to the degree to the power of a constant D:

$D_d \propto d^{D}$.

The CCDF of a degree d is the percentage of nodes that have degree greater than d.

Jovanovic showed a degree exponent of -1.4 and an ACC of 0.96 for the early Gnutella network using the probability distribution. For our two traces, the degree exponent is -2.25926 and -2.31074 and the ACC is 0.91744 and 0.87718 in chronological order, as we see in Fig. 4. Again, the low ACC values imply that this power-law is relatively weak in the 091505 graph and even invalid for the 021106 graph.

Compared with a pure power-law distribution, the graphs share some common patterns. There are too many nodes with degree around 30, and the resulting curves deviate from the linear regression. This coincides with what we found in the rank distribution.

Furthermore, in the 021106 graph, degrees in the interval 5–20 follow an almost constant distribution, which means there are too few ultrapeers with a degree in this interval. This confirms our previous conclusion that ultrapeers try to hold more connections up to the limit. The curve of higher degree in the 021106 graph drops much more sharply, which agrees with our previous comment that the Gnutella protocol 0.6 prevents ultrapeers from employing a large number of connections.

5. The two-layered approach

In this section, we first discuss the limitations of the power-laws and then present a new approach to study the topology of the Gnutella network.

5.1. Limitations of the power-laws

Previous research suggests two key causes for power-law distributions in network topologies: incremental growth and preferential connectivity. Incremental growth refers to open networks that form by the continual addition of new nodes, and thus the gradual increase in the size of the network. Preferential connectivity refers to the tendency of a new node to connect to existing nodes that are highly connected or popular.
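To make the two mechanisms concrete, the sketch below grows a graph one node at a time and attaches each new node to existing nodes with probability proportional to their current degree. It follows the general idea of incremental growth with preferential connectivity; the function name, the parameter `links_per_node`, and the seed handling are illustrative assumptions rather than anything specified in this paper.

```python
import random

def grow_preferential(num_nodes, links_per_node=2, seed=0):
    """Return an edge list built by incremental growth and preferential connectivity."""
    rng = random.Random(seed)
    # Start from a small clique so that every early node has a nonzero degree.
    initial = links_per_node + 1
    edges = [(i, j) for i in range(initial) for j in range(i + 1, initial)]
    # Each node appears in `endpoints` once per incident edge, so a uniform draw
    # from this list selects a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new_node in range(initial, num_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new_node, t))
            endpoints.extend((new_node, t))
    return edges

edges = grow_preferential(2000)

# Tally node degrees to see how skewed the resulting distribution is.
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

print("max degree:", max(degree.values()), "min degree:", min(degree.values()))
```

In such a model, nodes never leave and every node may accumulate links. These are exactly the two conditions that the Gnutella leaves violate, as discussed next.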
The topology of the Gnutella network is highly dynamic, since a node can join or leave the Gnutella network at any time. More specifically, most leaves tend to disconnect from the Gnutella network within several minutes after they connect to the network. The transient lifetime of the leaves works against incremental growth. Moreover, due to the hybrid architecture of Gnutella protocol 0.6, a leaf keeps only a small number of connections to ultrapeers and cannot connect to other leaves. This limitation on leaves also works against preferential connectivity, because leaves can never become highly connected. Combining the above factors, we can explain why the current Gnutella network does not follow the early power-law distributions. It is these limitations of the power-laws that make them inappropriate for modeling hybrid and highly dynamic architectures.

As we mentioned earlier, P2P studies usually use synthetically generated topologies characterized by the early power-laws. These topologies may not reflect properties of current P2P networks. A new approach is therefore needed to model current P2P networks.
5.2. Our approach
In our study, we propose a new two-layered approach to
model the topology of the current Gnutella network. We
split the Gnutella network into two layers, namely the
mesh and the forest.
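One way to carry out this split on a crawled graph is to strip degree-1 nodes repeatedly: whatever survives lies on a cycle or on a path between cycles and is taken as the mesh, and the removed nodes form the forest. This is a sketch under that assumption (it is the standard pruning that yields the 2-core of a graph, not necessarily the procedure the authors used), and the function names and the small example graph are hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency dictionary {node: set of neighbours}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def split_mesh_forest(adjacency):
    """Split a graph into (mesh, forest) by repeatedly removing nodes of degree <= 1."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    queue = deque(v for v, d in degree.items() if d <= 1)
    forest = set()
    while queue:
        v = queue.popleft()
        if v in forest:  # a node can be queued twice; process it only once
            continue
        forest.add(v)
        for w in adjacency[v]:
            if w not in forest:
                degree[w] -= 1
                if degree[w] <= 1:
                    queue.append(w)
    mesh = set(adjacency) - forest
    return mesh, forest

# Hypothetical example: a triangle a-b-c with a small tree attached to a
# and a single leaf attached to c.
example_edges = [("a", "b"), ("b", "c"), ("c", "a"),
                 ("a", "d"), ("d", "e"), ("d", "f"),
                 ("c", "g")]
mesh, forest = split_mesh_forest(build_adjacency(example_edges))
print("mesh:", sorted(mesh))
print("forest:", sorted(forest))
```

The pruning visits each node and edge a bounded number of times, so it stays practical for graphs with more than 100,000 nodes.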
Fig. 4. Log–log plot of $D_d$ versus the degree d.

Before we present the analysis of our approach, we provide below a few definitions. Note that Magoni et al. proposed some definitions to describe the AS network.
We keep these definitions and modify them into the following ones. Fig. 5 shows different kinds of nodes in a sample graph.

• Cycle node: a node that belongs to a cycle (i.e. it is on a closed path of disjoint nodes; in Fig. 5, there are eleven cycle nodes).
• Bridge node: a node which is not a cycle node and is on a path connecting 2 cycle nodes (in Fig. 5, there is one bridge node).
• In-mesh node: a node which is a cycle node or a bridge node (in Fig. 5, the mesh has twelve in-mesh nodes).
• In-tree node: a node which is not an in-mesh node (i.e. it belongs to a tree; in Fig. 5, each tree has four in-tree nodes).

Mesh is the set of in-mesh nodes and forest is the set of in-tree nodes.

• Branch node: an in-tree node of degree at least 2.
• Leaf node: an in-tree node of degree 1.
• Root node: an in-mesh node which is the root of a tree.
• Relay node: a node having exactly 2 connections.
• Border node: a node located on the diameter of the network.

Fig. 5. Different kinds of nodes.

If we split the Gnutella network into the mesh and the forest, we can analyze the topological properties of the mesh and the forest, respectively.

After careful comparison between Figs. 2 and 5, we can find that the mesh in Fig. 5 is composed merely of ultrapeers and acts as the backbone of the Gnutella network. Since ultrapeers are relatively stable and tend to stay in the Gnutella network for a longer time, the mesh can meet the requirement of incremental growth. Furthermore, since ultrapeers can connect to other ultrapeers, the mesh can meet the requirement of preferential connectivity. Hence, the topology of the mesh theoretically should comply with power-laws (see Section 6 for detailed validation). On the other hand, we can also obtain major topology properties and distributions of the forest (see Section 7). Note that it is not necessary to have all ultrapeers in the mesh.

With the knowledge of both the topology of the mesh and the topology of the forest, we can model the topology of the Gnutella network easily by merging these two layers.

6. Mesh topology analysis

In this section, we study the topology properties concerning the mesh in the Gnutella network. In Table 2, we present some basic statistics about the mesh in our traces. In Table 2, p(m) represents the percentage of nodes in the mesh, l represents the average shortest distance, and k represents the average degree.

Table 2
Basic Statistics of the Mesh
Stat.          091505    021106
Nb of Nodes    16,487    11,852
p(m)           15.4%     10.0%
Nb of Edges    27,467    23,539
l              5.2       6.5
Diameter       14        17
k              3.33      3.97

6.1. Mesh node rank exponent $R_m$

In this section, we study the degrees of the nodes in the mesh. We sort the nodes in the mesh in decreasing order of degree $d_{v_m}$ and define the mesh node rank $r_{v_m}$ as the index of the node in the sequence. We plot the ($d_{v_m}$, $r_{v_m}$) pairs in log–log scale. The plots are shown in Fig. 6. The data values are represented by points, while the solid lines represent the least-squares approximation.

Fig. 6. Log–log plot of the mesh node degree $d_{v_m}$ versus the rank $r_{v_m}$ in the sequence of decreasing degree.

The points of Fig. 6 are well approximated by the linear regression. The ACC is 0.96425 for the 091505 trace and 0.96580 for the 021106 trace. This leads us to the following power-law and definition.

Power-law 1 (Mesh node rank exponent): The degree $d_{v_m}$ of a mesh node $v_m$ is proportional to the rank of the mesh node $r_{v_m}$ to the power of a constant $R_m$:

$d_{v_m} \propto r_{v_m}^{R_m}$.

Definition 1. Let us sort the mesh nodes of a graph in decreasing order of degree. We define the mesh rank exponent $R_m$ to be the slope of the plot of the degrees of the mesh nodes versus the rank of the nodes in log–log scale.

6.2. Mesh node degree exponent $O_m$

In this section, we study the distribution of the degrees of the nodes in the mesh. We define the frequency $f_{d_m}$ of a mesh node degree $d_m$ as the number of nodes in the mesh with degree $d_m$. We plot the ($f_{d_m}$, $d_m$) pairs in log–log scale in Fig. 7. In these plots, we exclude a small percentage of nodes of higher degree that have a frequency of one, but still plot 99.9% of the total number of nodes. As we saw earlier, the higher degrees are described and captured by the mesh rank exponent.

Fig. 7. Log–log plot of frequency $f_{d_m}$ versus the mesh node degree $d_m$.

The major observation of Fig. 7 is that the plots are approximately linear with ACC of 0.97171 for the 091505 trace and 0.96016 for the 021106 trace. We infer the following power-law and definition.

Power-law 2 (Mesh node degree exponent): The frequency $f_{d_m}$ of a mesh node degree $d_m$ is proportional to the degree to the power of a constant $O_m$:

$f_{d_m} \propto d_m^{O_m}$.

Definition 2. We define the mesh node degree exponent $O_m$ to be the slope of the plot of the frequency of the mesh node degrees versus the degrees in log–log scale.

6.3. Mesh pair rank exponent $P_m$

In this section, we study the Number of distinct Shortest Paths (NSP) of each pair of vertices in the mesh. The number of distinct shortest paths between two vertices is the number of shortest paths such that any of these paths have at least one vertex not in common. The distribution of NSP is useful for evaluating the amount of redundant edges involved in shortest paths. Higher NSP values mean that if one edge of a shortest path between a pair of nodes is removed, there is still a probability for another shortest path of the same length to exist for this pair. We sort the pairs of in-mesh nodes in decreasing NSP $n_{p_m}$ and define the pair rank $r_{p_m}$ as the index of the pair in the sequence. We plot the ($n_{p_m}$, $r_{p_m}$) pairs in log–log scale. The plots are shown in Fig. 8. Due to the enormous number of node pairs, we plot the first $10^6$ pairs only.

The points of Fig. 8 are well approximated by the linear regression with ACC of 0.99157 for the 091505 trace and 0.99632 for the 021106 trace. Note that in Fig. 8(a) a significant portion of the upper left part of the curve seems to go off the straight line. However, this is a visual illusion. The dots in the lower right part of the curve are much denser than the dots in the upper left part, so the ACC value remains high. This leads us to the following power-law and definition.

Power-law 3 (Mesh pair rank exponent): The NSP $n_{p_m}$ between a pair of mesh nodes $p_m$ is proportional to the rank of the pair $r_{p_m}$ to the power of a constant $P_m$:

$n_{p_m} \propto r_{p_m}^{P_m}$.
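The NSP of a pair can be obtained with a breadth-first search that counts, for every node, how many shortest paths reach it from the source. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary; the function names and the example graph are illustrative and are not taken from the paper.

```python
from collections import deque

def shortest_path_counts(adjacency, source):
    """Return {node: (distance, number of distinct shortest paths from source)}."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                # First time w is reached: it inherits the path count of v.
                dist[w] = dist[v] + 1
                count[w] = count[v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:
                # Another shortest route into w, through v.
                count[w] += count[v]
    return {v: (dist[v], count[v]) for v in dist}

def nsp_ranking(adjacency):
    """NSP of every connected unordered pair, sorted in decreasing order."""
    nodes = sorted(adjacency)
    values = []
    for i, source in enumerate(nodes):
        counts = shortest_path_counts(adjacency, source)
        for target in nodes[i + 1:]:
            if target in counts:
                values.append(counts[target][1])
    return sorted(values, reverse=True)

# Hypothetical example: a 4-cycle a-b-c-d with one extra node e attached to c.
adjacency = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"a", "c"},
    "e": {"c"},
}
from_a = shortest_path_counts(adjacency, "a")
ranking = nsp_ranking(adjacency)
print(from_a["c"], ranking)
```

The position of each value in `ranking` (starting from 1) is the pair rank, so a log-log regression of the values against their positions gives an estimate of the pair rank exponent.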
Fig. 8. Log–log plot of the mesh NSP $n_{p_m}$ versus the rank $r_{p_m}$ in the sequence of decreasing NSP.

Definition 3. Let us sort the pairs of nodes in the mesh of a graph in decreasing order of NSP. We define the mesh pair rank exponent $P_m$ to be the slope of the plot of the NSP versus the rank of the mesh node pairs in log–log scale.

6.4. Mesh NSP exponent $N_m$

In this section, we study the distribution of NSP of in-mesh nodes. We define the frequency $f_{n_m}$ of an NSP $n_m$ as the number of pairs with NSP of $n_m$ in the mesh. We plot the ($f_{n_m}$, $n_m$) pairs in log–log scale in Fig. 9. In these plots, we exclude a small percentage of pairs of higher NSP that have the lowest frequency, but still plot more than 99.9% of the total number of pairs. The solid lines are the result of the linear regression.

Fig. 9. Log–log plot of frequency $f_{n_m}$ versus the mesh NSP $n_m$.

The major observation of Fig. 9 is that the plots are approximately linear with ACC of 0.94301 for the 091505 trace and 0.99840 for the 021106 trace. We infer the following power-law and definition.

Power-law 4 (Mesh NSP exponent): The frequency $f_{n_m}$ of an NSP $n_m$ between a pair of nodes in the mesh is proportional to the NSP to the power of a constant $N_m$:

$f_{n_m} \propto n_m^{N_m}$.

Definition 4. We define the mesh NSP exponent $N_m$ to be the slope of the plot of the frequency of the mesh NSP versus the mesh NSP in log–log scale.

7. Forest topology analysis

In this section, we study the topology properties concerning the forest in the Gnutella network. In Table 3, we present some basic statistics about the forest in our traces. In Table 3, p(t) represents the percentage of nodes in the forest.

7.1. Tree depth distribution

We define the probability $p(t_d)$ of a tree depth $t_d$ as the percentage of trees in the forest with depth $t_d$. Fig. 10 describes the tree depth distribution.

Table 3
Basic Statistics of the Forest
Stat.              091505    021106
Nb of Nodes        90,718    107,073
p(t)               84.6%     90.0%
Nb of trees        9,886     6,830
Mean tree size     10.18     16.68
Max tree size      4,824     231
Mean tree depth    1.52      1.30
Max tree depth     8         10
Fig. 10. Tree depth distribution.
In Fig. 10, we notice that more than 56% of trees are simply composed of leaves that are directly connected to their corresponding root. We can also observe that more than 27% of trees have depth 2 and less than 4% of trees have depth larger than 3.
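A depth distribution of this kind can be computed once the mesh is known, by searching outward from each root into the in-tree nodes. The sketch below is an illustration under two assumptions that the text does not spell out: all in-tree nodes reachable from a root without passing through the mesh are counted as one tree, and the depth of a tree is the largest distance from its root. The function names and the example graph are hypothetical.

```python
from collections import Counter, deque

def tree_depths(adjacency, mesh):
    """Return {root: (depth, size)} for every mesh node that has in-tree neighbours.

    Size counts the in-tree nodes of the tree plus the root itself.
    """
    result = {}
    for root in mesh:
        depth_of = {root: 0}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                # Only descend into in-tree nodes that were not visited yet.
                if w not in mesh and w not in depth_of:
                    depth_of[w] = depth_of[v] + 1
                    queue.append(w)
        if len(depth_of) > 1:
            result[root] = (max(depth_of.values()), len(depth_of))
    return result

def depth_distribution(trees):
    """Share of trees at each depth, as {depth: fraction of trees}."""
    counts = Counter(depth for depth, _ in trees.values())
    total = sum(counts.values())
    return {depth: counts[depth] / total for depth in sorted(counts)}

# Hypothetical example: mesh triangle a-b-c, a tree {d, e, f} rooted at a,
# and a single leaf g attached to c.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "g"},
    "d": {"a", "e", "f"},
    "e": {"d"},
    "f": {"d"},
    "g": {"c"},
}
mesh = {"a", "b", "c"}

trees = tree_depths(adjacency, mesh)
distribution = depth_distribution(trees)
print(trees)
print(distribution)
```

In this example the tree rooted at a has depth 2 and size 4, and the tree rooted at c has depth 1 and size 2, so each depth accounts for half of the trees.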
7.2. Tree rank distribution
In this section, we study the size of each tree, which is defined as the number of vertices composing the tree plus the root. We sort the trees in decreasing tree size $s_t$ and define the tree rank $r_t$ as the index of the tree in the sequence. We plot the ($s_t$, $r_t$) pairs in Fig. 11, applying log-scale only on the y-axis. The solid lines are given by linear regression.

Fig. 11. Plot of the tree size $s_t$ (log-scale) versus the rank $r_t$ in the sequence of decreasing size.

The plots of Fig. 11 match the linear regression line. The ACC is 0.95621 for the 091505 trace and 0.95465 for the 021106 trace. Consequently, we infer the following empirical law and definition.

Empirical law 1: The size $s_t$ of a tree t is proportional to an exponential function whose exponent is the product of the rank of the tree $r_t$ and a constant T:

$s_t \propto \exp(T r_t)$.

Definition 5. Let us sort the trees of a graph in decreasing order of size. We define T to be the slope of the plot of the sizes of trees versus the rank of the trees with log-scale applied on the sizes of trees.

This empirical law provides a formula for the sizes of trees in a sequence of trees.

8. Discussion

In this section, we first present two more power-laws concerning all the nodes (including both in-mesh nodes and in-tree nodes) in the Gnutella network. Then we focus on the generation of synthetic topologies of P2P networks.

8.1. Additional power-laws

In our study, we find that the NSP rank distribution and NSP distribution of all the nodes in the Gnutella network follow power-laws as well. This can be explained easily. Because the mesh is the core part of the network, shortest paths consist mainly of nodes in the mesh, while nodes in the forest barely contribute to shortest paths. However, the two power-laws presented below could be used as minor metrics to distinguish P2P topologies.

8.1.1. Pair rank exponent P

Here we study the NSP of all the nodes (including both in-mesh nodes and in-tree nodes). We sort the pairs of the nodes in decreasing NSP $n_p$ and plot the ($n_p$, $r_p$) pairs in log–log scale in Fig. 12. Due to the enormous number of node pairs, we plot the first $10^6$ pairs only. The data values are represented by points, while the solid lines represent the least-squares approximation.

Fig. 12. Log–log plot of the NSP $n_p$ versus the rank of the pairs $r_p$ in the sequence of decreasing NSP.

The points of Fig. 12 are well approximated by the linear regression with ACC of 0.98184 for the 091505 trace and 0.99259 for the 021106 trace. Note that in both Fig. 12(a) and (b), a significant portion of the upper left part of the curves seems to go off the straight line. However, this is also a visual illusion. The dots in the lower right part of the curve are much denser than the dots in the upper left part, so the ACC value remains high. This leads us to the following power-law and definition.

Power-law 5 (Pair rank exponent): The NSP $n_p$ between a pair of nodes p is proportional to the rank of the pair $r_p$ to the power of a constant P:

$n_p \propto r_p^{P}$.

Definition 6. Let us sort the pairs of nodes of a graph in decreasing order of NSP. We define the pair rank exponent P to be the slope of the plot of the NSP versus the rank of the pairs in log–log scale.

8.1.2. NSP Exponent N

Here we study the distribution of NSP of all the nodes (including both in-mesh nodes and in-tree nodes). We define the frequency $f_n$ of an NSP n as the number of pairs with NSP of n. We plot the ($f_n$, n) pairs in log–log scale in Fig. 13. In these plots, we exclude a small percentage of pairs of higher NSP that have the lowest frequency. In any case, we plot more than 99.9% of the total number of pairs. The solid lines are the result of the linear regression.

Fig. 13. Log–log plot of frequency $f_n$ versus the NSP n.

The major observation is that the plots are approximately linear with ACC of 0.93510 for the 091505 trace and 0.98810 for the 021106 trace. We infer the following power-law and definition.

Power-law 6 (NSP exponent): The frequency $f_n$ of an NSP n between a pair of nodes is proportional to the NSP to the power of a constant N:

$f_n \propto n^{N}$.

Definition 7. We define the NSP exponent N to be the slope of the plot of the frequency of the NSP versus the NSP in log–log scale.

8.2. Topology generation

The regularity observed in our traces of the Gnutella network between September 2005 and February 2006 (including but not restricted to the two traces specifically discussed in this paper) is unlikely to be a coincidence.
We could reasonably conjecture that our laws might con-  N.S. Ting, R. Deters, 3LS – A peer-to-peer network simulator, in:
tinue to hold, at least for the near future. Proc. IEEE P2P’03, Sweden, 2003.
 N. Kotilainen, M. Vapa, T. Keltanen, A. Auvinen, J. Vuori,
Our work can facilitate the generation of realistic topol- P2PRealm – Peer-to-Peer Network Simulator, in: Proc. 11th Inter-
ogies of P2P networks, specially those which employ a national Workshop on Computer-Aided Modeling, Analysis and
hybrid and highly dynamic architecture like the Gnutella Design of Communication Links and Networks, 2006, pp. 93–99.
network. As an overview, we list the following guidelines  M. Jelasity, A. Montresor, G.P. Jesi, Peersim peer-to- peer simulator,
for creating P2P network topologies. First, a small percent- 2004, Avaliable from: <http://peersim.sourceforge.net/>.
 W. Yang, N. Abu-Ghazaleh, GPS: a general peer-to-peer simulator
age of the nodes (15.4% or 10.0%) belong to the mesh and a and its use for modeling BitTorrent, in: Proc. IEEE/ACM MAS-
large percentage of the nodes (84.6% or 90.0%) belong to COTS’05, Atlanta, GA, 2005.
the forest. Second, the degree distribution of the mesh is  A.L. Barabasi, R. Albert, Emergence of scaling in random networks,
skewed following our power-law 1 and 2. Third, more than Science 286 (1999) 509.
56% of the trees have depth one, less than 4% of the trees  A. Medina, I. Matta, J. Byers, On the origin of power laws in internet
topologies, ACM SIGCOMM Computer Communication Review 30
have depth larger than 3, and the maximum depth is 7 or (2) (2000) 18–28.
10. Fourth, the size distribution of the trees is skewed fol-
lowing our empirical law 1. As a ﬁnal step, we merge the
generated mesh and the generated forest together to get the P2P network topology. We can further
use our laws 3, 4, 5, and 6 to examine the quality of the generated topologies. If we fine-tune
the parameters, we can obtain specific topologies that meet our needs.
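
The guidelines above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the generator used in this paper: the function name `generate_topology`, the preferential-attachment mesh (a stand-in for power-laws 1 and 2), the depth weights, the Pareto tree-size weights (a stand-in for empirical law 1), and the choice of one tree per mesh node are all hypothetical. Only the mesh fraction (10.0%), the maximum depth (7), and the depth constraints (more than 56% of trees with depth one, fewer than 4% deeper than 3) come from the text.

```python
import random

MAX_DEPTH = 7  # maximum tree depth reported in the guidelines (7 or 10)
# Hypothetical depth weights chosen to respect the guidelines:
# 60% of trees have depth one, 3% have depth larger than 3.
DEPTH_WEIGHTS = [0.60, 0.25, 0.12, 0.0075, 0.0075, 0.0075, 0.0075]


def generate_topology(n_nodes=1000, mesh_fraction=0.10, seed=1):
    """Build a synthetic two-layered overlay: a mesh plus a forest."""
    rng = random.Random(seed)
    n_mesh = max(2, round(n_nodes * mesh_fraction))

    # Mesh layer: preferential attachment yields a skewed degree
    # distribution (an assumed stand-in for power-laws 1 and 2).
    mesh_edges = [(0, 1)]
    endpoints = [0, 1]  # each node appears once per incident edge
    for v in range(2, n_mesh):
        # Up to two distinct neighbours, picked proportionally to degree.
        chosen = {rng.choice(endpoints) for _ in range(2)}
        for u in chosen:
            mesh_edges.append((u, v))
            endpoints += [u, v]

    # Forest layer: assume every mesh node is the root of one tree.
    depths = list(range(1, MAX_DEPTH + 1))
    tree_depth = {r: rng.choices(depths, DEPTH_WEIGHTS)[0] for r in range(n_mesh)}
    parent = {}
    level = {r: 0 for r in range(n_mesh)}
    members = {r: [r] for r in range(n_mesh)}
    next_id = n_mesh

    # Give each tree a chain that realises its target depth.
    for r in range(n_mesh):
        p = r
        for _ in range(tree_depth[r]):
            if next_id == n_nodes:
                break
            parent[next_id] = p
            level[next_id] = level[p] + 1
            members[r].append(next_id)
            p = next_id
            next_id += 1

    # Spread the remaining nodes over the trees with skewed (Pareto)
    # weights, an assumed stand-in for empirical law 1.
    roots = list(range(n_mesh))
    size_weight = [rng.paretovariate(1.5) for _ in roots]
    while next_id < n_nodes:
        r = rng.choices(roots, size_weight)[0]
        # Attach below the target depth so the tree never gets deeper.
        p = rng.choice([u for u in members[r] if level[u] < tree_depth[r]])
        parent[next_id] = p
        level[next_id] = level[p] + 1
        members[r].append(next_id)
        next_id += 1

    return {"mesh_edges": mesh_edges, "parent": parent, "tree_depth": tree_depth}
```

Merging the two layers is implicit here: mesh nodes keep the identifiers 0 to `n_mesh - 1` and act as tree roots, so the union of `mesh_edges` and the `parent` links forms the complete overlay. A generator intended for real use would replace the assumed distributions with the fitted laws and then validate the output against laws 3 to 6.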

9. Conclusion and future work

In this paper, we study the hybrid P2P network topology from the mesh perspective and the forest
perspective, respectively. Using the proposed two-layered approach and laws, realistic topologies
can be generated.

Chao Xie is currently a Ph.D. student in the Department of Computer Science at the University of
Wisconsin-Madison. He obtained his M.S. degree in Computer Science from Georgia State University,
USA, in 2007, his M.Eng. degree in Computer Science from Huazhong University of Science and
Technology, China, in 2005, and his B.S. degree in Mechanical Engineering from Huazhong University
of Science and Technology, China, in 2001. His main research interests include computer networks,
distributed systems, parallel computing and data mining. Chao Xie is a member of the Association
for Computing Machinery and the IEEE Computer Society.

Guihai Chen obtained his B.S. degree from Nanjing University, his M.Eng. from Southeast
University, and his Ph.D. from the University of Hong Kong. He visited Kyushu Institute of
Technology, Japan, in 1998 as a research fellow, and the University of Queensland, Australia, in
2000 as a visiting professor. From September 2001 to August 2003, he was a visiting professor at
Wayne State University. He is now a full professor and deputy chair of the Department of Computer
Science, Nanjing University. Prof. Chen has published more than 100 papers in peer-reviewed
journals and refereed conference proceedings in the areas of wireless sensor networks,
high-performance computer architecture, peer-to-peer computing and performance evaluation. He has
also served on technical program committees of numerous international conferences. He is a member
of the IEEE Computer Society.

Art Vandenberg was born in Grasonville, Maryland, in 1950. His education includes a B.A. in English
Literature, Swarthmore College, Swarthmore, PA, 1972; an M.V.A. in Painting and Drawing, Georgia
State University, Atlanta, GA, 1979; and an M.S. in Information and Computer Systems, Georgia
Institute of Technology, Atlanta, GA, 1985. He has worked in library systems, research and
administrative computing since 1976, including 15 years in information technology positions at
Georgia Institute of Technology. Since 1997 he has been with Information Systems & Technology at
Georgia State University, as Director of Advanced Campus Services charged with deploying
middleware and research computing infrastructure. His current activities include deploying grid
computing solutions and establishing high-performance computing cyberinfrastructure. Recent
research grants include an NSF ITR Award 0312636 as Co-PI investigating a unique approach to
resolving metadata heterogeneity for information integration by combining monitoring, clustering
and visualization to discover patterns or trends. He is a member of Georgia State's IT Risk
Management Research Group and the Georgia State Information Integration Lab, and serves as Chair
of SURAgrid, a regional grid initiative of the Southeastern Universities Research Association.
Mr. Vandenberg is a member of the Association for Computing Machinery and the IEEE Computer
Society.

Yi Pan is the chair and a professor in the Department of Computer Science and a professor in the
Department of Computer Information Systems at Georgia State University. Dr. Pan received his
B.Eng. and M.Eng. degrees in computer engineering from Tsinghua University, China, in 1982 and
1984, respectively, and his Ph.D. degree in computer science from the University of Pittsburgh,
USA, in 1991. Dr. Pan's research interests include parallel and distributed computing, optical
networks, wireless networks, and bioinformatics. Dr. Pan has published more than 100 journal
papers with 30 papers published in various IEEE journals. In addition, he has published over 100
papers in refereed conferences (including IPDPS, ICPP, ICDCS, INFOCOM, and GLOBECOM). He has also
co-authored/co-edited 30 books (including proceedings) and contributed several book chapters. His
pioneering work on computing using reconfigurable optical buses has inspired extensive subsequent
work by many researchers, and his research results have been cited by more than 100 researchers
worldwide in books, theses, journal and conference papers. He is a co-inventor of three U.S.
patents (pending) and 5 provisional patents, and has received many awards from agencies such as
NSF, AFOSR, JSPS, IISF and Mellon Foundation. His recent research has been supported by NSF, NIH,
NSFC, AFOSR, AFRL, JSPS, IISF and the states of Georgia and Ohio. He has served as a
reviewer/panelist for many research foundations/agencies such as the U.S. National Science
Foundation, the Natural Sciences and Engineering Research Council of Canada, the Australian
Research Council, and the Hong Kong Research Grants Council. Dr. Pan has served as an
editor-in-chief or editorial board member for 15 journals including 5 IEEE Transactions and a
guest editor for 10 special issues for 9 journals including 2 IEEE Transactions. He has organized
several international conferences and workshops and has also served as a program committee member
for several major international conferences such as INFOCOM, GLOBECOM, ICC, IPDPS, and ICPP.
Dr. Pan has delivered over 10 keynote speeches at many international conferences. Dr. Pan is an
IEEE Distinguished Speaker (2000-2002), a Yamacraw Distinguished Speaker (2002), a Shell Oil
Colloquium Speaker (2002), and a senior member of IEEE. He is listed in Men of Achievement, Who's
Who in Midwest, Who's Who in America, Who's Who in American Education, Who's Who in Computational
Science and Engineering, and Who's Who of Asian Americans.