The document discusses collusion in PageRank rankings. It aims to study how much a group of colluding pages can increase their PageRank through nepotistic linking. The authors develop a framework to model different types of collusion topologies and calculate PageRank scores for colluding groups and other pages. They explore bounding the maximum gain in ranking a group can achieve through collusion and investigate the impact of collusion through experiments on synthetic and real web graphs.
PageRank Increase under Different Collusion Topologies (AIRWEB 2005)
1. Pagerank Increase under Different Collusion Topologies
Ricardo Baeza-Yates, Carlos Castillo and Vicente López
ICREA Professor / Dept. of Technology / Cátedra Telefónica
Universitat Pompeu Fabra – Barcelona, Spain
May 10th, 2005
2. Outline
1 Introduction
2 Collusion and Pagerank
3 Experiments in a synthetic graph
4 Experiments in a real Web graph
5 Conclusions
3. Goal
Study collusion
Nepotistic linking in a Web graph
This can be done by bad sites (spam) but also good sites
Colluding groups could use different topologies
Colluding groups could have different original rankings
How much would their ranking increase if ... ?
8. Framework
We use Pagerank as the ranking function [Page et al., 1998]
Pagerank
Let $L_{N \times N}$ be the row-wise normalized link matrix
Let $U$ be a matrix such that $U_{i,j} = 1/N$
Let $P = (1 - \epsilon) L + \epsilon U$
Pagerank scores are given by $v$ such that $P^T v = v$
Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a “reset” probability of $\epsilon \approx 0.15$.
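The definition above can be turned into a small power iteration. The following is a minimal sketch in plain Python, not the implementation used for the talk: the function name `pagerank`, the adjacency-list input, the stopping tolerance, and the handling of pages without out-links (treated as linking to every page) are all assumptions made for illustration.

```python
def pagerank(out_links, eps=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for v = P^T v with P = (1 - eps) L + eps U.

    out_links[i] is the list of pages that page i links to.
    Assumption (not from the slides): a page with no out-links is treated
    as linking to every page uniformly, so every row of L sums to 1.
    """
    n = len(out_links)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        new_v = [eps / n] * n  # contribution of the uniform matrix U
        for i, targets in enumerate(out_links):
            if targets:
                share = (1.0 - eps) * v[i] / len(targets)
                for j in targets:
                    new_v[j] += share
            else:
                share = (1.0 - eps) * v[i] / n
                for j in range(n):
                    new_v[j] += share
        delta = sum(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


# Tiny 4-page graph, made up for illustration.
example_graph = [[1, 2], [2], [0], [0, 2]]
scores = pagerank(example_graph)
```

In this toy graph no page links to page 3, so its score comes from random jumps only and equals `eps / n`.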
11. Gain from collusion
Maximum gain [Zhang et al., 2004]:
$$\frac{\text{New Pagerank}}{\text{Old Pagerank}} \le \frac{1}{\epsilon}$$
As $\epsilon \approx 0.15$, maximum gain $\approx 7$.
First task: improve this bound.
13. Impact of collusion in Pagerank
[Figure: the Web graph G with N pages; the colluding subgraph G' contains M pages and the remaining N-M pages lie outside it.]
14. Grouping nodes for Pagerank calculation
Links for Pagerank can be “lumped” together [Clausen, 2004]:
[Figure: the N-M outside pages and the M colluding pages drawn as two lumped nodes, connected by links and by random jumps.]
15. Links for Pagerank calculation
$$\text{Pagerank}_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$$
[Figure: the M colluding nodes, with total Pagerank $x$, receive $P_{jump}$ from random jumps, $P_{in}$ from the N-M outside nodes, with total Pagerank $1-x$, and $P_{self}$ from links among themselves.]
16. Pagerank calculation: random jumps
There are N nodes in total, M in the colluding set:
$$P_{jump} = \epsilon \, (M/N)$$
17. Pagerank calculation: incoming links
$$P_{in} = (1-\epsilon) \sum_{(a,b) \in E_{in}} \frac{\text{Pagerank}(a)}{\deg(a)} = (1-\epsilon) \sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)$$
Where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes.
$$p = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{\sum_{a \in G - G'} \text{Pagerank}(a)} = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{1 - x}$$
⇒ $p$ is a weighted average of $p(a)$; it reflects how “important” the pages in the colluding set are
22. Pagerank calculation: incoming and self links
$P_{in}$ can be rewritten as:
$$P_{in} = (1-\epsilon)(1-x)\,p$$
Using the same trick for $P_{self}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write:
$$P_{self} = (1-\epsilon)\,x\,s$$
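To make the weighted averages concrete, the sketch below computes $x$, $p$ and $s$ for a given group on a toy graph. It is an illustration under assumptions, not the authors' code: the function name `lumped_parameters`, the adjacency-list input, the made-up scores, and the choice to count a page without out-links as contributing a fraction of 0 are all hypothetical.

```python
def lumped_parameters(out_links, scores, group):
    """Return (x, p, s) for a colluding `group` of node ids.

    x: total Pagerank of the group.
    p: Pagerank-weighted average, over outside pages, of the fraction of
       their links that point into the group.
    s: Pagerank-weighted average, over group pages, of the fraction of
       their links that stay inside the group.
    """
    members = set(group)
    x = sum(scores[i] for i in members)
    inside_mass = 0.0   # sum of Pagerank(a) * s(a) over the group
    outside_mass = 0.0  # sum of Pagerank(a) * p(a) over the other pages
    for a, targets in enumerate(out_links):
        if not targets:
            continue  # assumption: a page without links contributes fraction 0
        frac = sum(1 for b in targets if b in members) / len(targets)
        if a in members:
            inside_mass += scores[a] * frac
        else:
            outside_mass += scores[a] * frac
    return x, outside_mass / (1.0 - x), inside_mass / x


# Made-up graph and scores, for illustration only.
toy_links = [[1, 2], [2], [0], [0, 2]]
toy_scores = [0.4, 0.2, 0.3, 0.1]
x, p, s = lumped_parameters(toy_links, toy_scores, group=[0, 1])
```

Here page 2 sends all of its links into the group and page 3 sends half of them, so $p$ is dominated by page 2, which has the higher score.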
24. Pagerank calculation summary
$$P_{jump} = \epsilon \, (M/N)$$
$$P_{in} = (1-\epsilon)(1-x)\,p$$
$$P_{self} = (1-\epsilon)\,x\,s$$
[Figure: the M colluding nodes (Pagerank $x$) and the N-M outside nodes (Pagerank $1-x$), with arrows labeled $P_{jump}$, $P_{in}$ and $P_{self}$.]
25. Solving
Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields:
$$x_{normal} = \frac{\epsilon \frac{M}{N} + (1-\epsilon)\,p}{(p - s)(1-\epsilon) + 1}$$
What happens when colluding?
Colluding means pointing more links to the inside
This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$
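The closed form can be checked numerically against the balance equation it was derived from. The sketch below does this in plain Python; the function names and the parameter values are illustrative assumptions, not figures from the talk.

```python
def x_stationary(M, N, p, s, eps=0.15):
    """Closed form for the total Pagerank x of the colluding set."""
    return (eps * M / N + (1.0 - eps) * p) / ((p - s) * (1.0 - eps) + 1.0)


def balance(x, M, N, p, s, eps=0.15):
    """Right-hand side P_jump + P_in + P_self of the stationary equation."""
    p_jump = eps * M / N
    p_in = (1.0 - eps) * (1.0 - x) * p
    p_self = (1.0 - eps) * x * s
    return p_jump + p_in + p_self


# Illustrative values only (not taken from the slides).
M, N, p, s = 100, 100_000, 0.001, 0.2
x_normal = x_stationary(M, N, p, s)
x_colluding = x_stationary(M, N, p, 1.0)  # s' = 1: every link stays inside
```

Plugging `x_normal` back into `balance` returns `x_normal` up to rounding, and raising `s` to 1 while keeping `p` fixed gives a larger `x_colluding`.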
27. Pagerank increase due to collusion
Making $s' = 1$, all links from the colluding set go inside now:
$$\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1 - s}{p + \frac{\epsilon}{1-\epsilon}}$$
Making $s = p$, originally the set was not colluding:
$$\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1-\epsilon) + \epsilon}$$
⇒ This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set
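The two ratios follow from the stationary solution in a few steps. The derivation below is a worked sketch that assumes, as the slides appear to, that $p$ and $M/N$ stay fixed while only $s$ changes to $s'$, so the numerators of $x_{colluding}$ and $x_{normal}$ cancel.

```latex
\begin{align*}
\frac{x_{colluding}}{x_{normal}}
  &= \frac{(p - s)(1-\epsilon) + 1}{(p - s')(1-\epsilon) + 1} \\
\text{with } s' = 1: \quad
  &= \frac{p(1-\epsilon) + \epsilon + (1-s)(1-\epsilon)}{p(1-\epsilon) + \epsilon}
   = 1 + \frac{1-s}{p + \frac{\epsilon}{1-\epsilon}} \\
\text{with also } s = p: \quad
  &= \frac{1}{p(1-\epsilon) + \epsilon} \;\le\; \frac{1}{\epsilon}
\end{align*}
```

The last inequality holds because $p \ge 0$, so the $1/\epsilon$ bound is approached only as $p \to 0$.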
30. Expected Pagerank change
$x_{colluding}/x_{normal}$ as a function of $p$
[Figure: maximum Pagerank change (from 1 up to $1/\epsilon \approx 7$) versus the weighted average of the fraction of links to colluding nodes (log scale, $10^{-3}$ to $10^{0}$).]
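The curve can be tabulated directly from the ratio $1 + (1-s)/(p + \epsilon/(1-\epsilon))$. This is a minimal sketch; the function name `gain` and the sampled values of $p$ are assumptions chosen to match the axis range of the plot.

```python
def gain(p, s=None, eps=0.15):
    """Ratio x_colluding / x_normal when all links are moved inside (s' = 1).

    If s is None, the set is assumed not to collude originally (s = p).
    """
    if s is None:
        s = p
    return 1.0 + (1.0 - s) / (p + eps / (1.0 - eps))


for p in (1e-3, 1e-2, 1e-1, 1.0):
    print(f"p = {p:g}: gain = {gain(p):.3f}")
```

At $p = 0$ the function returns exactly $1/\epsilon$, and at $p = 1$ it returns 1, which brackets the curve.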
31. Experiments in a synthetic graph
Created using the generative model [Kumar et al., 2000]
Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]
100,000-node scale-free graph
Sampling by Pagerank
Divided in deciles, each decile has 1/10th of the Pagerank
Picked a group of 100 nodes at random from each decile
Group 1 are low-ranked nodes, group 10 are high-ranked nodes
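The sampling step can be sketched as follows. This is one plausible reading of "each decile has 1/10th of the Pagerank", not the authors' procedure: nodes are sorted by score and cut wherever the cumulative score crosses a multiple of one tenth. The function names, the seed, and the made-up score distribution are assumptions.

```python
import random


def pagerank_deciles(scores, n_groups=10):
    """Split node ids into groups that each hold about 1/n_groups of the Pagerank.

    Nodes are sorted by increasing score, so the first group holds the
    low-ranked nodes and the last group holds the high-ranked ones.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    total = sum(scores)
    groups = [[] for _ in range(n_groups)]
    cumulative = 0.0
    for node in order:
        # Group index given by where this node's cumulative mass starts.
        g = min(int(n_groups * cumulative / total), n_groups - 1)
        groups[g].append(node)
        cumulative += scores[node]
    return groups


def sample_groups(groups, size, seed=0):
    """Pick up to `size` nodes at random from each group."""
    rng = random.Random(seed)
    return [rng.sample(g, min(size, len(g))) for g in groups]


# Made-up heavy-tailed scores, not the graph from the talk.
raw = [(k + 1) ** -0.5 for k in range(10_000)]
total_raw = sum(raw)
scores = [r / total_raw for r in raw]
groups = pagerank_deciles(scores)
samples = sample_groups(groups, size=100)
```

Because the scores are skewed, the low-ranked groups contain many more nodes than the high-ranked ones even though each group holds a similar share of the total score.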
36. Original Pagerank of the nodes
These are the original Pagerank values for each group
[Figure: average Pagerank value (log scale, $10^{-6}$ to $10^{-2}$) for groups 1 to 10; annotations mark the low end as "Originally very bad" and the high end as "Originally very good".]
37. Modified Pagerank of the nodes
These are the modified Pagerank values when colluding.
[Figure: Pagerank values (log scale, $10^{-6}$ to $10^{-2}$) for groups 1 to 10; series: Original, Clique.]
38. Distribution of Pagerank
But Pagerank values follow a power law distribution ...
[Figure: frequency versus Pagerank value on log-log axes, with a reference line $x^{-2.1}$.]
39. Modified Pagerank position of the nodes
These are the modified Pagerank positions (rankings) when colluding.
[Figure: Pagerank ranking (0.0 to 1.0) for groups 1 to 10; series: Original, Clique.]
40. Variation of Pagerank when colluding
This is the ratio $x_{colluding}/x_{original}$
[Figure: new value / original value (0 to $1/\epsilon \approx 7$) for groups 1 to 10; series: Change in Pagerank value, Change in ranking.]
41. It is not necessary to create a clique
Spammers can use a fraction of the links to try to avoid detection
[Figure: new Pagerank / original Pagerank (1 to $1/\epsilon$) for groups 1 to 10, with one curve per fraction of the clique links, from 5% to 95% in steps of 5%, plus the full clique.]
In the paper, other topologies: star and ring
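A partial clique can be sketched by keeping only a fraction of the links of the complete subgraph. How the links were chosen in the experiments is not stated on the slide; picking them uniformly at random is an assumption here, as are the function name `partial_clique_links` and the seed.

```python
import random


def partial_clique_links(group, fraction, seed=0):
    """Return a random subset holding `fraction` of the links of a full clique."""
    all_links = [(a, b) for a in group for b in group if a != b]
    rng = random.Random(seed)
    k = round(fraction * len(all_links))
    return rng.sample(all_links, k)


group = list(range(100))                   # 100 hypothetical node ids
half = partial_clique_links(group, 0.50)   # 50% of the clique links
full = partial_clique_links(group, 1.00)   # the full clique
```

A group of 100 nodes has 9,900 directed links in its full clique, so the 50% variant adds 4,950 of them.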
43. Experiments in a real Web graph
Hostgraph of 310,486 Websites from Spain
[Figure: frequency versus Pagerank value on log-log axes, with a reference line $x^{-2.1}$.]
44. Experiments in a real Web graph
Some of the nodes are already colluding [Fetterly et al., 2004]
45. We study a set of Web sites
We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links.
Disconnect the group
Create a ring
Add a central page linking to all of them
Add a central page linking to and from all of them (star)
Fully connect the group (clique)
Now we measure the new ranking
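The link modifications listed above can be written as small edge-list builders. This is a sketch under assumptions: the function names and the `"hub"` central page are hypothetical, and "disconnect the group" is read here as removing every existing link between two members of the group.

```python
def ring_links(group):
    """Each node links to the next one, and the last links back to the first."""
    return [(group[i], group[(i + 1) % len(group)]) for i in range(len(group))]


def central_links(group, center):
    """A central page links to every node of the group."""
    return [(center, node) for node in group]


def star_links(group, center):
    """A central page links to and from every node of the group."""
    return central_links(group, center) + [(node, center) for node in group]


def clique_links(group):
    """Every node links to every other node of the group."""
    return [(a, b) for a in group for b in group if a != b]


def disconnect(edges, group):
    """Drop every existing link between two members of the group."""
    members = set(group)
    return [(a, b) for a, b in edges if not (a in members and b in members)]


sites = ["a", "b", "c", "d"]  # hypothetical site names
ring = ring_links(sites)
star = star_links(sites, "hub")
clique = clique_links(sites)
pruned = disconnect([("a", "b"), ("a", "x"), ("x", "b")], sites)
```

Each builder returns only the new links, so they can be appended to an existing edge list before recomputing the ranking.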
53. New rankings under graph modifications
[Figure: rankings (0 to 1) for each strategy: Normal, Disconnected, Ring, Central, Star, Inv. Ring, Clique.]
54. Adding 5%-50% of complete subgraph
[Figure: average ranking (0.980 to 1.000) versus percent of links of a complete subgraph (0% to 55%).]
The best sites also increase their ranking
56. Conclusions
✓ Any group of nodes can increase its Pagerank
✓ Nodes with high Pagerank gain less by colluding
Ideas for link spam detection
✗ Only detecting regularities can fail to detect randomized structures
✗ Only detecting nepotistic links can give false positives
✓ Use evidence from multiple sources
62. Thank you
63. References
Clausen, A. (2004). The cost of attack of PageRank. In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia.
Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France.
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation algorithm: bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
64. References (continued)
Pandurangan, G., Raghavan, P., and Upfal, E. (2002). Using Pagerank to characterize Web structure. In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–390, Singapore. Springer.
Zhang, H., Goel, A., Govindan, R., Mason, K., and Roy, B. V. (2004). Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.