Pagerank Increase under Different Collusion Topologies

Ricardo Baeza-Yates, Carlos Castillo and Vicente López

ICREA Professor / Dept. of Technology / Cátedra Telefónica
Universitat Pompeu Fabra – Barcelona, Spain

May 10th, 2005

Outline

1 Introduction
2 Collusion and Pagerank
3 Experiments in a synthetic graph
4 Experiments in a real Web graph
5 Conclusions

Goal

Study collusion
Nepotistic linking in a Web graph
    This can be done by bad sites (spam) but also by good sites
    Colluding groups could use different topologies
    Colluding groups could have different original rankings
    How much would their ranking increase if ... ?

Framework

We use Pagerank as the ranking function [Page et al., 1998]

Pagerank
    Let $L_{N \times N}$ be the row-wise normalized link matrix
    Let $U$ be a matrix such that $U_{i,j} = 1/N$
    Let $P = (1 - \epsilon)L + \epsilon U$
    Pagerank scores are given by $v$ such that $P^T v = v$

Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a "reset" probability of $\epsilon \approx 0.15$.

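The fixed point $P^T v = v$ can be approximated by power iteration. The following is a minimal sketch, not the authors' implementation: the function name `pagerank`, the toy graph, the iteration count and the treatment of pages without out-links (their score is spread uniformly over all pages) are assumptions made for illustration.

```python
def pagerank(out_links, epsilon=0.15, iterations=100):
    """Power iteration for v = P^T v with P = (1 - epsilon) L + epsilon U.

    out_links maps every node to the list of nodes it links to.
    Assumption (not from the slides): a node with no out-links
    spreads its score uniformly over all nodes.
    """
    nodes = list(out_links)
    n = len(nodes)
    v = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Random-jump term: every node receives epsilon / N.
        nxt = {node: epsilon / n for node in nodes}
        for a in nodes:
            targets = out_links[a] or nodes  # dangling node: link to all
            share = (1.0 - epsilon) * v[a] / len(targets)
            for b in targets:
                nxt[b] += share
        v = nxt
    return v


# Hypothetical 4-page graph used only to exercise the function.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

In this toy graph page "d" has no in-links, so its score is only the random-jump share $\epsilon / N$.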
Gain from collusion

Maximum gain [Zhang et al., 2004]:

$$\frac{\text{New Pagerank}}{\text{Old Pagerank}} \leq \frac{1}{\epsilon}$$

As $\epsilon \approx 0.15$, maximum gain $\approx 7$.

First task: improve this bound.

Impact of collusion in Pagerank

[Figure: the Web (N pages) split into a colluding set of M pages and the remaining N−M pages; the diagram labels the graphs G and G'.]

Grouping nodes for Pagerank calculation

Links for Pagerank can be "lumped" together [Clausen, 2004]:

[Figure: two lumped groups, M pages and N−M pages, plus random jumps.]

Links for Pagerank calculation

$$Pagerank_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$$

[Figure: two lumped groups, M nodes with Pagerank $x$ and N−M nodes with Pagerank $1-x$, with flows labeled $P_{in}$, $P_{jump}$ and $P_{self}$.]

Pagerank calculation: random jumps

There are N nodes in total, M in the colluding set:

$$P_{jump} = \epsilon \, (M/N)$$

Pagerank calculation: incoming links

$$P_{in} = (1-\epsilon) \sum_{(a,b):(a,b) \in E_{in}} \frac{Pagerank(a)}{deg(a)} = (1-\epsilon) \sum_{a: a \in G'-G} Pagerank(a) \, p(a)$$

Where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes.

$$p = \frac{\sum_{a: a \in G'-G} Pagerank(a) \, p(a)}{\sum_{a: a \in G'-G} Pagerank(a)} = \frac{\sum_{a: a \in G'-G} Pagerank(a) \, p(a)}{1-x}$$

$p$ is a weighted average of $p(a)$; it reflects how "important" pages in the colluding set are.

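The weighted average $p$ can be computed directly from a score vector and a link structure. This is a small sketch under stated assumptions: the helper name `weighted_fraction`, the example scores (made-up numbers that sum to 1, not a true stationary distribution) and the choice to skip pages without out-links are all illustrative.

```python
def weighted_fraction(pagerank, out_links, colluding):
    """Return (x, p): the Pagerank mass x of the colluding set and the
    Pagerank-weighted average p of p(a) over nodes outside that set."""
    x = sum(pagerank[a] for a in colluding)
    weighted = 0.0
    for a, targets in out_links.items():
        # Assumption: pages inside the set, or without out-links, add nothing.
        if a in colluding or not targets:
            continue
        p_a = sum(1 for b in targets if b in colluding) / len(targets)
        weighted += pagerank[a] * p_a
    return x, weighted / (1.0 - x)


# Hypothetical scores and links; pages "c" and "d" form the colluding set.
example_scores = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
example_links = {"a": ["c", "b"], "b": ["a"], "c": ["d"], "d": ["c"]}
x, p = weighted_fraction(example_scores, example_links, {"c", "d"})
```

Here only page "a" links into the set, with $p(a) = 1/2$, so $p$ is its score times $1/2$ divided by the mass outside the set.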
Pagerank calculation: incoming and self links

$P_{in}$ can be rewritten as:

$$P_{in} = (1-\epsilon)(1-x)\,p$$

Using the same trick for $P_{self}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write:

$$P_{self} = (1-\epsilon)\,x\,s$$

Pagerank calculation summary

[Figure: the two lumped groups annotated with the three contributions.]

    M nodes, Pagerank = $x$; N−M nodes, Pagerank = $1-x$
    $P_{jump} = \epsilon \, (M/N)$
    $P_{in} = (1-\epsilon)(1-x)\,p$
    $P_{self} = (1-\epsilon)\,x\,s$

Solving

Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields:

$$x_{normal} = \frac{\epsilon \frac{M}{N} + (1-\epsilon)\,p}{(p-s)(1-\epsilon) + 1}$$

What happens when colluding?

Colluding means pointing more links to the inside
This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$

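For reference, the intermediate algebra behind the stationary solution can be written out as follows. It is a worked derivation using only the three contributions defined on the previous slides.

```latex
\begin{align*}
x &= P_{in} + P_{jump} + P_{self} \\
  &= (1-\epsilon)(1-x)\,p + \epsilon\frac{M}{N} + (1-\epsilon)\,x\,s \\
x\,\bigl[\,1 + (1-\epsilon)\,p - (1-\epsilon)\,s\,\bigr]
  &= \epsilon\frac{M}{N} + (1-\epsilon)\,p \\
x_{normal} &= \frac{\epsilon\frac{M}{N} + (1-\epsilon)\,p}{(p-s)(1-\epsilon) + 1}
\end{align*}
```

Replacing $s$ by $s'$ in the last line gives $x_{colluding}$, assuming $p$ is unchanged by the collusion.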
Pagerank increase due to collusion

Making $s' = 1$, all links from the colluding set go inside now:

$$\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1-s}{p + \frac{\epsilon}{1-\epsilon}}$$

Making $s = p$, originally the set was not colluding:

$$\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1-\epsilon) + \epsilon}$$

This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set

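The two closed forms can be checked numerically against the stationary solution. The sketch below is illustrative only: the function names, the default `m_over_n` value and the sample values of `p` are assumptions, and `p` is held fixed when `s` changes, as in the lumped model.

```python
def colluding_mass(p, s, m_over_n, epsilon=0.15):
    """Stationary Pagerank mass x of the colluding set in the lumped model."""
    numerator = epsilon * m_over_n + (1 - epsilon) * p
    return numerator / ((p - s) * (1 - epsilon) + 1)


def gain(p, s, s_new=1.0, m_over_n=0.001, epsilon=0.15):
    """Ratio x_colluding / x_normal when the self-link fraction goes from s to s_new."""
    return (colluding_mass(p, s_new, m_over_n, epsilon)
            / colluding_mass(p, s, m_over_n, epsilon))


def closed_form_gain(p, s, epsilon=0.15):
    """Closed form for s_new = 1: 1 + (1 - s) / (p + epsilon / (1 - epsilon))."""
    return 1 + (1 - s) / (p + epsilon / (1 - epsilon))


for p in (0.001, 0.01, 0.1, 1.0):
    # With s = p the set was not colluding originally.
    print(p, round(gain(p, s=p), 3), round(1 / (p * 0.85 + 0.15), 3))
```

The ratio does not depend on M/N, because the numerator of the stationary solution cancels; as $p$ shrinks the gain approaches the $1/\epsilon$ bound without reaching it.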
Expected Pagerank change

$x_{colluding} / x_{normal}$ as a function of $p$

[Figure: maximum Pagerank change (y axis, 1 to 7, with $1/\epsilon$ marked) versus the weighted average of the fraction of links to colluding nodes (x axis, log scale, $10^{-3}$ to $10^{0}$).]

Experiments in a synthetic graph

    Created using the generative model [Kumar et al., 2000]
    Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]
    100,000-node scale-free graph

Sampling by Pagerank
    Divided in deciles, each decile has 1/10th of the Pagerank
    Picked a group of 100 nodes at random from each decile
    Group 1 are low-ranked nodes, group 10 are high-ranked nodes

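The sampling step above can be sketched as follows. This is one possible reading of "each decile has 1/10th of the Pagerank", namely deciles of cumulative Pagerank mass over nodes sorted by increasing score. The function names, the made-up score vector and the small group size are assumptions for illustration; the slides use 100 nodes per group on a 100,000-node graph.

```python
import random


def pagerank_deciles(scores, groups=10):
    """Split nodes, sorted by increasing score, into `groups` buckets that
    each hold roughly 1/groups of the total score."""
    ordered = sorted(scores, key=scores.get)
    total = sum(scores.values())
    buckets = [[] for _ in range(groups)]
    running = 0.0
    for node in ordered:
        # Bucket index from the cumulative score before adding this node.
        index = min(int(groups * running / total), groups - 1)
        buckets[index].append(node)
        running += scores[node]
    return buckets


def sample_groups(buckets, size, seed=0):
    """Pick up to `size` nodes at random from each bucket."""
    rng = random.Random(seed)
    return [rng.sample(bucket, min(size, len(bucket))) for bucket in buckets]


# Hypothetical heavy-tailed scores, not output of a real Pagerank run.
fake_scores = {i: 1.0 / (i + 1) ** 0.5 for i in range(1000)}
buckets = pagerank_deciles(fake_scores)
groups = sample_groups(buckets, size=5)
```

With a heavy-tailed score distribution the first bucket holds many low-scored nodes and the last bucket only a few high-scored ones, which is why groups are drawn per decile rather than uniformly.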
Original Pagerank of the nodes

These are the original Pagerank values for each group

[Figure: average Pagerank value (log scale, $10^{-6}$ to $10^{-2}$) by group (1 to 10); the low groups are annotated "Originally very bad" and the high groups "Originally very good".]

Modified Pagerank of the nodes

These are the modified Pagerank values when colluding.

[Figure: Pagerank values (log scale, $10^{-6}$ to $10^{-2}$) by group (1 to 10); series: Original, Clique.]

Distribution of Pagerank

But Pagerank values follow a power law distribution ...

[Figure: frequency (log scale, $10^{-5}$ to $10^{0}$) versus Pagerank value (log scale, $10^{-6}$ to $10^{-2}$), with reference line $x^{-2.1}$.]

Modified Pagerank position of the nodes
    These are the modified Pagerank positions (rankings) when colluding.
    [Figure: Pagerank ranking (0.0 to 1.0) vs. Group (1 to 10); series: Original, Clique]
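The plot reports positions on a scale from 0 to 1. One plausible way to obtain such a position from raw Pagerank values is sketched below: each node gets the fraction of other nodes with a strictly lower score, so the top node maps to 1.0. The slides do not give the exact definition, so this normalization and the name `normalized_ranking` are assumptions.

```python
from bisect import bisect_left

def normalized_ranking(scores):
    """Map each node to the fraction of other nodes with a strictly lower score."""
    n = len(scores)
    if n < 2:
        return {node: 1.0 for node in scores}
    ordered = sorted(scores.values())
    # bisect_left counts how many scores are strictly smaller than s.
    return {node: bisect_left(ordered, s) / (n - 1) for node, s in scores.items()}

# Hypothetical scores, for illustration only.
example = normalized_ranking({"a": 0.5, "b": 0.2, "c": 0.2, "d": 0.1})
```

Under this definition tied nodes share the same position, which keeps the ranking independent of the order in which nodes are listed.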
Variation of Pagerank when colluding
    This is the ratio x_colluding / x_original
    [Figure: New value / original value vs. Group (1 to 10); series: Change in Pagerank value, Change in ranking; the vertical axis is marked at 1/ε, between 6 and 7]
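The ratio x_colluding / x_original can be reproduced on a toy graph. The sketch below is not the authors' experimental setup: it uses a directed ring of 10 nodes, a plain power-iteration Pagerank, and an assumed random-jump probability `epsilon = 0.15` (consistent with the 1/ε mark lying between 6 and 7, but not stated on this slide). Three nodes then add clique links among themselves and the ratio is computed for each of them.

```python
def pagerank(out_links, epsilon=0.15, iterations=100):
    """Power iteration; out_links maps every node to a set of successors."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (epsilon + (1.0 - epsilon) * dangling) / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += (1.0 - epsilon) * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

def add_clique(out_links, group):
    """Return a copy of the graph with every pair in group linked both ways."""
    modified = {v: set(ws) for v, ws in out_links.items()}
    for u in group:
        modified[u] |= {w for w in group if w != u}
    return modified

# Toy graph: a directed ring of 10 nodes, so every node starts with rank 0.1.
ring = {i: {(i + 1) % 10} for i in range(10)}
group = [0, 1, 2]
original = pagerank(ring)
colluding = pagerank(add_clique(ring, group))
ratios = {v: colluding[v] / original[v] for v in group}
```

In this toy case every colluding node ends up with a ratio above 1 and well below 1/ε, because the group now keeps most of its rank circulating internally and leaks only through node 2's single link to the rest of the ring.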
It is not necessary to create a clique
    Spammers can use a fraction of the links to try to avoid detection
    [Figure: New Pagerank / original Pagerank vs. Group (1 to 10); one curve per fraction of the clique links, from 5% to 95% in steps of 5%, plus the full clique; the vertical axis is marked at 1/ε]
    In the paper, other topologies: star and ring
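A minimal sketch of what "a fraction of the links" could mean in practice: enumerate the directed links of the complete subgraph on the group and keep a uniformly random subset. The slides do not describe how the subset was chosen, so uniform sampling, the rounding rule, and the name `partial_clique_edges` are assumptions.

```python
import itertools
import random

def partial_clique_edges(group, fraction, rng):
    """Sample a fraction of the directed links of a complete subgraph."""
    all_edges = list(itertools.permutations(group, 2))
    count = round(fraction * len(all_edges))
    return set(rng.sample(all_edges, count))

# Hypothetical group of five sites.
rng = random.Random(42)
group = ["a", "b", "c", "d", "e"]
half = partial_clique_edges(group, 0.5, rng)
full = partial_clique_edges(group, 1.0, rng)
```

Because the kept links are chosen at random, the resulting subgraph has no regular pattern, which is the property that makes it harder to spot than a full clique.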
Experiments in a real Web graph
    Hostgraph of 310,486 Websites from Spain
    [Figure: log-log plot of Frequency vs. Pagerank value, with the reference line x^-2.1]
Experiments in a real Web graph
    Some of the nodes are already colluding [Fetterly et al., 2004]
We study a set of Web sites
    We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links.
        Disconnect the group
        Create a ring
        Add a central page linking to all of them
        Add a central page linking to and from all of them (star)
        Fully connect the group (clique)
    Now we measure the new ranking
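The link modifications listed above can be written down as sets of directed links to add to the graph. This is only a sketch: the function names and the hub identifier are hypothetical, the ring direction is an arbitrary choice, and "Disconnect the group" is left out because the slide does not say which links are removed.

```python
def ring_links(group):
    """Each site links to the next one, and the last links back to the first."""
    k = len(group)
    return {(group[i], group[(i + 1) % k]) for i in range(k)}

def central_links(group, hub):
    """A new central page links to every site in the group."""
    return {(hub, v) for v in group}

def star_links(group, hub):
    """A new central page links to and from every site in the group."""
    return central_links(group, hub) | {(v, hub) for v in group}

def clique_links(group):
    """Every site links to every other site in the group."""
    return {(u, v) for u in group for v in group if u != v}

# Hypothetical three-site group and hub page.
sites = ["s1", "s2", "s3"]
ring = ring_links(sites)
central = central_links(sites, "hub")
star = star_links(sites, "hub")
clique = clique_links(sites)
```

Each set can be merged into the original link structure before recomputing Pagerank and comparing the group's new positions.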
New rankings under graph modifications
    [Figure: Rankings (0 to 1) by Strategy; tick labels as extracted, in two rows: "Normal, Ring, Star" and "Disconnected, Central, Inv. Ring, Clique"]
Adding 5%-50% of complete subgraph
    [Figure: Rankings (0.980 to 1.000) vs. Percent of links of a complete subgraph (0% to 55%); series: Average ranking]
    The best sites also increase their ranking
Conclusions
    ✓ Any group of nodes can increase its Pagerank
    ✓ Nodes with high Pagerank gain less by colluding
    Ideas for link spam detection
        ✗ Only detecting regularities can fail to detect randomized structures
        ✗ Only detecting nepotistic links can give false positives
        ✓ Use evidence from multiple sources
Thank you
    Clausen, A. (2004).
    The cost of attack of PageRank.
    In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia.

    Fetterly, D., Manasse, M., and Najork, M. (2004).
    Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages.
    In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France.

    Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000).
    Stochastic models for the web graph.
    In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press.

    Page, L., Brin, S., Motwani, R., and Winograd, T. (1998).
    The Pagerank citation algorithm: bringing order to the web.
    Technical report, Stanford Digital Library Technologies Project.

    Pandurangan, G., Raghavan, P., and Upfal, E. (2002).
    Using Pagerank to characterize Web structure.
    In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–390, Singapore. Springer.

    Zhang, H., Goel, A., Govindan, R., Mason, K., and Roy, B. V. (2004).
    Making eigenvector-based reputation systems robust to collusion.
    In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.

 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 

PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

  • 1. Pagerank Increase under Different Collusion Topologies. Ricardo Baeza-Yates, Carlos Castillo and Vicente López. ICREA Professor / Dept. of Technology / Cátedra Telefónica, Universitat Pompeu Fabra – Barcelona, Spain. May 10th, 2005
  • 2. Outline: (1) Introduction; (2) Collusion and Pagerank; (3) Experiments in a synthetic graph; (4) Experiments in a real Web graph; (5) Conclusions
  • 3. Goal: Study collusion, that is, nepotistic linking in a Web graph. This can be done by bad sites (spam) but also by good sites. Colluding groups could use different topologies. Colluding groups could have different original rankings. How much would their ranking increase if ... ?
  • 8. Framework: We use Pagerank as the ranking function [Page et al., 1998]. Pagerank: Let $L_{N \times N}$ be the row-wise normalized link matrix. Let $U$ be a matrix such that $U_{i,j} = 1/N$. Let $P = (1 - \epsilon)L + \epsilon U$. Pagerank scores are given by $v$ such that $P^T v = v$. Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a "reset" probability of $\epsilon \approx 0.15$.
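The framework above can be illustrated with a minimal power-iteration sketch. It is not the authors' code: the function name `pagerank`, the tolerance, and the treatment of pages without out-links (they jump uniformly) are assumptions made for this example.

```python
import numpy as np

def pagerank(adj, eps=0.15, tol=1e-12, max_iter=1000):
    """Solve v = P^T v by power iteration, with P = (1 - eps) L + eps U."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # L: row-wise normalized link matrix.
    # Assumption: a page with no out-links is treated as linking to every page.
    safe_deg = np.where(out_deg > 0, out_deg, 1.0)
    L = np.where(out_deg > 0, adj / safe_deg, 1.0 / n)
    # U has every entry equal to 1/N, so eps * U is the constant eps / n.
    P = (1.0 - eps) * L + eps / n
    v = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_v = P.T @ v
        if np.abs(new_v - v).sum() < tol:
            return new_v
        v = new_v
    return v
```

Calling `pagerank(adj)` on a 0/1 adjacency matrix (rows are source pages) returns a vector of scores that sums to 1. A dense matrix is used only to keep the sketch short; a graph with hundreds of thousands of nodes would need a sparse representation.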
  • 11. Gain from collusion: Maximum gain [Zhang et al., 2004]: $\frac{\text{New Pagerank}}{\text{Old Pagerank}} \le \frac{1}{\epsilon}$. As $\epsilon \approx 0.15$, maximum gain $\approx 7$. First task: improve this bound.
  • 13. Impact of collusion in Pagerank. [Figure: the Web graph G with N pages, containing a colluding set G' of M pages; the remaining N−M pages are outside it]
  • 14. Grouping nodes for Pagerank calculation: Links for Pagerank can be "lumped" together [Clausen, 2004]. [Figure: two lumped nodes, one for the N−M pages and one for the M pages, plus random jumps]
  • 15. Links for Pagerank calculation: $Pagerank_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$. [Figure: the M colluding nodes have total Pagerank $x$, the N−M other nodes have total Pagerank $1 - x$; arrows show $P_{jump}$, $P_{in}$ and $P_{self}$]
  • 16. Pagerank calculation: random jumps. There are N nodes in total, M in the colluding set: $P_{jump} = \epsilon (M/N)$
  • 17. Pagerank calculation: incoming links. $P_{in} = \sum_{(a,b) \in E_{in}} \frac{Pagerank(a)}{deg(a)} = \sum_{a \in G - G'} Pagerank(a)\, p(a)$, where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes (the damping factor $(1 - \epsilon)$ is left implicit in these sums). Define $p = \frac{\sum_{a \in G - G'} Pagerank(a)\, p(a)}{\sum_{a \in G - G'} Pagerank(a)} = \frac{\sum_{a \in G - G'} Pagerank(a)\, p(a)}{1 - x}$. $p$ is a weighted average of $p(a)$; it reflects how "important" pages in the colluding set are.
  • 22. Pagerank calculation: incoming and self links. $P_{in}$ can be rewritten as: $P_{in} = (1 - \epsilon)(1 - x)p$. Using the same trick for $P_{self}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write: $P_{self} = (1 - \epsilon) x s$
  • 24. Pagerank calculation summary: $P_{in} = (1 - \epsilon)(1 - x)p$; $P_{jump} = \epsilon (M/N)$; $P_{self} = (1 - \epsilon) x s$. [Figure: the M colluding nodes have total Pagerank $x$, the N−M other nodes have total Pagerank $1 - x$]
  • 25. Solving: Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields: $x_{normal} = \frac{\epsilon \frac{M}{N} + (1 - \epsilon) p}{(p - s)(1 - \epsilon) + 1}$. What happens when colluding? Colluding means pointing more links to the inside. This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$.
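The algebra behind the stationary-state solution can be written out as a short worked derivation. The last line is a sketch under the assumption that $p$ and $M/N$ stay the same when the group changes its self-link fraction from $s$ to $s'$.

```latex
\begin{align*}
x &= \underbrace{\epsilon \tfrac{M}{N}}_{P_{jump}}
   + \underbrace{(1-\epsilon)(1-x)\,p}_{P_{in}}
   + \underbrace{(1-\epsilon)\,x\,s}_{P_{self}} \\
x\,\bigl[\,1 + (1-\epsilon)(p - s)\,\bigr] &= \epsilon \tfrac{M}{N} + (1-\epsilon)\,p \\
x_{normal} &= \frac{\epsilon \frac{M}{N} + (1-\epsilon)\,p}{(p - s)(1-\epsilon) + 1} \\
\frac{x_{colluding}}{x_{normal}} &= \frac{(p - s)(1-\epsilon) + 1}{(p - s')(1-\epsilon) + 1}
\end{align*}
```

The numerator of $x_{normal}$ does not depend on $s$, so it cancels in the ratio and only the two denominators remain.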
  • 27. Pagerank increase due to collusion: Making $s' = 1$ (all links from the colluding set now go inside): $\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1 - s}{p + \frac{\epsilon}{1 - \epsilon}}$. Making $s = p$ (originally the set was not colluding): $\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1 - \epsilon) + \epsilon}$. This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set.
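A small numeric sketch can be used to check these expressions. The function names `lumped_pagerank` and `collusion_gain` are hypothetical, and the code assumes that $p$ and $M/N$ do not change when the group starts colluding.

```python
def lumped_pagerank(p, s, m_over_n, eps=0.15):
    """Total Pagerank x of the colluding set in the lumped two-node model."""
    return (eps * m_over_n + (1 - eps) * p) / ((p - s) * (1 - eps) + 1)

def collusion_gain(p, s, s_new=1.0, eps=0.15):
    """Ratio x_colluding / x_normal when the self-link fraction goes from s to s_new."""
    return ((p - s) * (1 - eps) + 1) / ((p - s_new) * (1 - eps) + 1)

if __name__ == "__main__":
    # Gain for a group that originally was not colluding (s = p), for several p.
    for p in (0.001, 0.01, 0.1, 1.0):
        print(p, collusion_gain(p, s=p))
```

With $s = p$ the gain approaches $1/\epsilon$ as $p$ goes to 0 and drops to 1 when $p = 1$, which matches the observation that groups already receiving a large weighted share of links gain less.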
  • 30. Expected Pagerank change: $x_{colluding}/x_{normal}$ as a function of $p$. [Figure: maximum Pagerank change (y-axis, 1 to 7, with the $1/\epsilon$ bound marked) versus the weighted average of the fraction of links to colluding nodes (x-axis, log scale, $10^{-3}$ to $10^{0}$)]
  • 31. Experiments in a synthetic graph: Created using the generative model of [Kumar et al., 2000]. Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]. Scale-free graph with 100,000 nodes. Sampling by Pagerank: divided in deciles, each decile has 1/10th of the Pagerank. Picked a group of 100 nodes at random from each decile. Group 1 are low-ranked nodes, group 10 are high-ranked nodes.
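The sampling step can be sketched as follows. This is one possible reading of "each decile has 1/10th of the Pagerank" (nodes are sorted by score and cut where the cumulative Pagerank crosses each tenth of the total); the function name `sample_groups_by_pagerank` and the seed handling are assumptions, not taken from the slides.

```python
import numpy as np

def sample_groups_by_pagerank(scores, n_groups=10, group_size=100, seed=0):
    """Split nodes into groups holding equal Pagerank mass, then sample from each."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)  # ascending, so group 1 holds the low-ranked nodes
    cumulative = np.cumsum(scores[order]) / scores.sum()
    # Index of the tenth of the cumulative Pagerank mass each node falls into.
    bucket = np.minimum((cumulative * n_groups).astype(int), n_groups - 1)
    rng = np.random.default_rng(seed)
    groups = []
    for g in range(n_groups):
        members = order[bucket == g]
        # Shuffle and keep at most group_size nodes from this bucket.
        groups.append(rng.permutation(members)[:group_size])
    return groups
```

Because Pagerank is heavily skewed, the low groups contain many nodes and the top group contains few, so a top group may return fewer than `group_size` nodes on a small graph.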
  • 36. Original Pagerank of the nodes: These are the original Pagerank values for each group. [Figure: average Pagerank value (log scale, $10^{-6}$ to $10^{-2}$) for groups 1 to 10; group 1 is originally very bad, group 10 is originally very good]
  • 37. Modified Pagerank of the nodes: These are the modified Pagerank values when colluding. [Figure: Pagerank values (log scale, $10^{-6}$ to $10^{-2}$) for groups 1 to 10; series: Original, Clique]
  • 38. Distribution of Pagerank: But Pagerank values follow a power law distribution ... [Figure: frequency versus Pagerank value, log-log scale, with reference line $x^{-2.1}$]
  • 39. Modified Pagerank position of the nodes: These are the modified Pagerank positions (rankings) when colluding. [Figure: Pagerank ranking (0.0 to 1.0) for groups 1 to 10; series: Original, Clique]
  • 40. Variation of Pagerank when colluding: This is the ratio $x_{colluding}/x_{original}$. [Figure: new value / original value (0 to 7, with $1/\epsilon$ marked) for groups 1 to 10; series: change in Pagerank value, change in ranking]
  • 41. It is not necessary to create a clique: Spammers can use a fraction of the links to try to avoid detection. [Figure: new Pagerank / original Pagerank (1 to 7, with $1/\epsilon$ marked) for groups 1 to 10; one series per fraction of clique links, from 5% to 95% in steps of 5%, plus the full clique] In the paper, other topologies: star and ring.
  • 43. Experiments in a real Web graph: Hostgraph of 310,486 Websites from Spain. [Figure: frequency versus Pagerank value, log-log scale, with reference line $x^{-2.1}$]
  • 44. Experiments in a real Web graph: Some of the nodes are already colluding [Fetterly et al., 2004].
  • 45. We study a set of Web sites: We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links. Disconnect the group. Create a ring. Add a central page linking to all of them. Add a central page linking to and from all of them (star). Fully connect the group (clique). Now we measure the new ranking.
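The link modifications listed above can be sketched as functions that return the directed links to add for a given group. All function names are hypothetical, and `partial_clique_links` assumes that "a fraction of the links" means a uniform random sample of the clique's links.

```python
import random

def ring_links(group):
    """Each site links to the next one, and the last links back to the first."""
    n = len(group)
    return {(group[i], group[(i + 1) % n]) for i in range(n)}

def central_links(group, center):
    """A central page links to every site in the group."""
    return {(center, site) for site in group}

def star_links(group, center):
    """A central page links to and from every site in the group."""
    return central_links(group, center) | {(site, center) for site in group}

def clique_links(group):
    """Every site links to every other site in the group."""
    return {(a, b) for a in group for b in group if a != b}

def partial_clique_links(group, fraction, seed=0):
    """A random subset holding the given fraction of the clique's links."""
    links = sorted(clique_links(group))
    rng = random.Random(seed)
    return set(rng.sample(links, round(fraction * len(links))))
```

Each returned set would be merged into the graph's link set before recomputing Pagerank, so the new ranking of the group can be compared with the original one.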
  • 53. New rankings under graph modifications. [Figure: rankings (0 to 1) by strategy; strategies shown: Normal, Ring, Star, Disconnected, Central, Inv. Ring, Clique]
  • 54. Adding 5%-50% of complete subgraph. [Figure: average ranking (0.980 to 1.000) versus percent of links of a complete subgraph (0% to 55%)] The best sites also increase their ranking.
  • 56. Conclusions: ✓ Any group of nodes can increase its Pagerank. ✓ Nodes with high Pagerank gain less by colluding. Ideas for link spam detection: ✗ Only detecting regularities can fail to detect randomized structures. ✗ Only detecting nepotistic links can give false positives. ✓ Use evidence from multiple sources.
  • 62. Thank you
  • 63. References: Clausen, A. (2004). The cost of attack of PageRank. In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia. Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France. Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press. Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation algorithm: bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
  • 64. References (continued): Pandurangan, G., Raghavan, P., and Upfal, E. (2002). Using Pagerank to characterize Web structure. In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–390, Singapore. Springer. Zhang, H., Goel, A., Govindan, R., Mason, K., and Roy, B. V. (2004). Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.