Clausen, A. (2004). The cost ...
Pandurang...
PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

  1. Pagerank Increase under Different Collusion Topologies
     Ricardo Baeza-Yates, Carlos Castillo and Vicente López
     ICREA Professor / Dept. of Technology / Cátedra Telefónica
     Universitat Pompeu Fabra – Barcelona, Spain
     May 10th, 2005
  2. Outline
     1 Introduction
     2 Collusion and Pagerank
     3 Experiments in a synthetic graph
     4 Experiments in a real Web graph
     5 Conclusions
  3. Goal
     Study collusion
       Nepotistic linking in a Web graph
       This can be done by bad sites (spam) but also by good sites
       Colluding groups could use different topologies
       Colluding groups could have different original rankings
     How much would their ranking increase if ... ?
  8. Framework
     We use Pagerank as the ranking function [Page et al., 1998].
     Pagerank
       Let $L_{N \times N}$ be the row-wise normalized link matrix.
       Let $U$ be a matrix such that $U_{i,j} = 1/N$.
       Let $P = (1 - \epsilon)L + \epsilon U$.
       Pagerank scores are given by $v$ such that $P^T v = v$.
     Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a "reset" probability of $\epsilon \approx 0.15$.
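The fixed point $P^T v = v$ from the Framework slide can be approximated by power iteration. The following is a minimal sketch, not the authors' implementation: the function name `pagerank`, the adjacency-dictionary input, the fixed iteration count, and the handling of nodes without out-links are all assumptions made for illustration.

```python
def pagerank(out_links, epsilon=0.15, iterations=100):
    """Power iteration for v = P^T v with P = (1 - epsilon) L + epsilon U.

    out_links maps every node to the list of nodes it links to. Nodes without
    out-links are treated as linking to every node; this is an assumption
    made here so that each row of L sums to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    v = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Random jump ("reset"): every node receives epsilon / N.
        nxt = {node: epsilon / n for node in nodes}
        for a in nodes:
            targets = out_links[a] or nodes
            # Node a spreads (1 - epsilon) of its score evenly over its links.
            share = (1.0 - epsilon) * v[a] / len(targets)
            for b in targets:
                nxt[b] += share
        v = nxt
    return v


# Hypothetical four-node graph, used only to exercise the function.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
```

In this toy graph node `d` has no incoming links, so its score comes only from random jumps and equals $\epsilon / N$.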
  11. Gain from collusion
      Maximum gain [Zhang et al., 2004]:
        $\frac{\text{New Pagerank}}{\text{Old Pagerank}} \le \frac{1}{\epsilon}$
      As $\epsilon \approx 0.15$, maximum gain $\approx 7$.
      First task: improve this bound.
  13. Impact of collusion in Pagerank
      [Figure: the Web with N pages, with graph labels G' and G, split into N-M pages and M pages.]
  14. Grouping nodes for Pagerank calculation
      Links for Pagerank can be "lumped" together [Clausen, 2004].
      [Figure: two lumped groups, N-M pages and M pages, with random jumps.]
  15. Links for Pagerank calculation
        $\text{Pagerank}_{\text{colluding nodes}} = P_{\text{jump}} + P_{\text{in}} + P_{\text{self}}$
      [Figure: N-M nodes with Pagerank = 1-x and M nodes with Pagerank = x; flows labeled P_in, P_jump and P_self.]
  16. Pagerank calculation: random jumps
      There are N nodes in total, M in the colluding set:
        $P_{\text{jump}} = \epsilon \, (M/N)$
  17. Pagerank calculation: incoming links
        $P_{\text{in}} = (1-\epsilon) \sum_{(a,b) \in E_{\text{in}}} \frac{\text{Pagerank}(a)}{\deg(a)} = (1-\epsilon) \sum_{a \in G' - G} \text{Pagerank}(a)\, p(a)$
      where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes.
        $p = \frac{\sum_{a \in G' - G} \text{Pagerank}(a)\, p(a)}{\sum_{a \in G' - G} \text{Pagerank}(a)} = \frac{\sum_{a \in G' - G} \text{Pagerank}(a)\, p(a)}{1-x}$
      $p$ is a weighted average of $p(a)$; it reflects how "important" pages in the colluding set are.
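The weighted average p can be computed directly from a Pagerank vector and the link structure. This is a sketch under stated assumptions: the function name `weighted_link_fraction`, the dictionary-based inputs, and the toy scores (which are made up and not the output of a real Pagerank run) are illustrative only.

```python
def weighted_link_fraction(pagerank, out_links, colluding):
    """Return p, the Pagerank-weighted average of p(a) over nodes outside the set."""
    colluding = set(colluding)
    weighted_sum = 0.0   # sum of Pagerank(a) * p(a) over outside nodes
    outside_mass = 0.0   # sum of Pagerank(a) over outside nodes, i.e. 1 - x
    for a, score in pagerank.items():
        if a in colluding:
            continue
        targets = out_links.get(a, [])
        # p(a): fraction of a's links that point into the colluding set.
        p_a = sum(1 for b in targets if b in colluding) / len(targets) if targets else 0.0
        weighted_sum += score * p_a
        outside_mass += score
    return weighted_sum / outside_mass


scores = {"a": 0.5, "b": 0.3, "c": 0.2}   # made-up values for illustration
links = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
p = weighted_link_fraction(scores, links, colluding={"c"})
```

Here only node `a` links into the colluding set, with half of its links, so p is 0.25 / 0.8.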
  22. Pagerank calculation: incoming and self links
      $P_{\text{in}}$ can be rewritten as:
        $P_{\text{in}} = (1-\epsilon)(1-x)\,p$
      Using the same trick for $P_{\text{self}}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write:
        $P_{\text{self}} = (1-\epsilon)\,x\,s$
  24. Pagerank calculation summary
        $P_{\text{in}} = (1-\epsilon)(1-x)\,p$
        $P_{\text{jump}} = \epsilon \, (M/N)$
        $P_{\text{self}} = (1-\epsilon)\,x\,s$
      [Figure: N-M nodes with Pagerank = 1-x and M nodes with Pagerank = x, with the flows P_in, P_jump and P_self.]
  25. Solving
      Solving the stationary state $P_{\text{in}} + P_{\text{jump}} + P_{\text{self}} = x$ yields:
        $x_{\text{normal}} = \frac{\epsilon \frac{M}{N} + (1-\epsilon)\,p}{(p-s)(1-\epsilon) + 1}$
      What happens when colluding?
      Colluding means pointing more links to the inside.
      This means $s \to s'$, with $s' > s$, yielding $x_{\text{colluding}}$.
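The intermediate algebra is not shown on the slide. Assuming the three flows $P_{\text{jump}} = \epsilon M/N$, $P_{\text{in}} = (1-\epsilon)(1-x)p$ and $P_{\text{self}} = (1-\epsilon)xs$, with $\epsilon$ the reset probability, one way to reach the closed form is to collect the terms in $x$:

```latex
\begin{aligned}
x &= \epsilon\frac{M}{N} + (1-\epsilon)(1-x)\,p + (1-\epsilon)\,x\,s \\
x\left[1 + (1-\epsilon)\,p - (1-\epsilon)\,s\right] &= \epsilon\frac{M}{N} + (1-\epsilon)\,p \\
x_{\text{normal}} &= \frac{\epsilon\frac{M}{N} + (1-\epsilon)\,p}{(p-s)(1-\epsilon) + 1}
\end{aligned}
```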
  27. Pagerank increase due to collusion
      Making $s' = 1$, all links from the colluding set go inside now:
        $\frac{x_{\text{colluding}}}{x_{\text{normal}}} = 1 + \frac{1-s}{p + \frac{\epsilon}{1-\epsilon}}$
      Making $s = p$, originally the set was not colluding:
        $\frac{x_{\text{colluding}}}{x_{\text{normal}}} = \frac{1}{p(1-\epsilon) + \epsilon}$
      This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set.
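The two ratios can be checked numerically. The sketch below assumes the stationary solution $x = (\epsilon M/N + (1-\epsilon)p) / ((p-s)(1-\epsilon) + 1)$ with reset probability $\epsilon$; the function names `x_stationary` and `collusion_gain` and the sample values of p are hypothetical.

```python
def x_stationary(p, s, m_over_n, epsilon=0.15):
    """Pagerank mass of the set for link fractions p (incoming) and s (self)."""
    return (epsilon * m_over_n + (1 - epsilon) * p) / ((p - s) * (1 - epsilon) + 1)


def collusion_gain(p, s, epsilon=0.15):
    """x_colluding / x_normal when the self-link fraction goes from s to 1."""
    return 1 + (1 - s) / (p + epsilon / (1 - epsilon))


# With s = p (the set was not colluding originally) the gain is
# 1 / (p (1 - epsilon) + epsilon), which approaches 1 / epsilon as p shrinks.
for p in (0.001, 0.01, 0.1, 1.0):
    print(p, collusion_gain(p, s=p))
```

The term M/N cancels in the ratio, which is why `collusion_gain` does not take it as a parameter.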
  30. Expected Pagerank change
      $x_{\text{colluding}} / x_{\text{normal}}$ as a function of $p$
      [Figure: maximum Pagerank change (1 to 7, with a reference line at $1/\epsilon$) against the weighted average of the fraction of links to colluding nodes ($10^{-3}$ to $10^{0}$).]
  31. Experiments in a synthetic graph
      Created using the generative model [Kumar et al., 2000]
        Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]
        100,000-node scale-free graph
      Sampling by Pagerank
        Divided in deciles, each decile has 1/10th of the Pagerank
        Picked a group of 100 nodes at random from each decile
        Group 1 are low-ranked nodes, group 10 are high-ranked nodes
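The sampling step can be sketched as follows. This reads "each decile has 1/10th of the Pagerank" as buckets of equal total Pagerank mass, which is an interpretation; the function name `sample_groups_by_pagerank`, the midpoint rule for assigning a node to a bucket, and the toy scores are assumptions, not details from the slides.

```python
import random


def sample_groups_by_pagerank(pagerank, groups=10, group_size=100, seed=0):
    """Split nodes into buckets of roughly equal total Pagerank, then sample.

    Bucket 1 holds the lowest-ranked nodes, the last bucket the highest-ranked.
    """
    rng = random.Random(seed)
    total = sum(pagerank.values())
    ordered = sorted(pagerank, key=pagerank.get)  # ascending Pagerank
    buckets = [[] for _ in range(groups)]
    cumulative = 0.0
    for node in ordered:
        # Place the node by the midpoint of the mass interval it occupies.
        midpoint = cumulative + pagerank[node] / 2
        index = min(int(groups * midpoint / total), groups - 1)
        buckets[index].append(node)
        cumulative += pagerank[node]
    return [rng.sample(bucket, min(group_size, len(bucket))) for bucket in buckets]


toy_scores = {node: node + 1.0 for node in range(1000)}  # made-up scores
sampled = sample_groups_by_pagerank(toy_scores, group_size=5)
```

Because Pagerank is heavily skewed, the low-ranked buckets contain many more nodes than the high-ranked ones, even though every bucket carries about the same total score.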
  36. Original Pagerank of the nodes
      These are the original Pagerank values for each group.
      [Figure: average Pagerank value ($10^{-6}$ to $10^{-2}$) per group (1 to 10), annotated from "originally very bad" to "originally very good".]
  37. Modified Pagerank of the nodes
      These are the modified Pagerank values when colluding.
      [Figure: Pagerank values ($10^{-6}$ to $10^{-2}$) per group (1 to 10); series: Original, Clique.]
  38. Distribution of Pagerank
      But Pagerank values follow a power law distribution ...
      [Figure: frequency ($10^{-5}$ to $10^{0}$) against Pagerank value ($10^{-6}$ to $10^{-2}$), with a reference line $x^{-2.1}$.]
  39. Modified Pagerank position of the nodes
      These are the modified Pagerank positions (rankings) when colluding.
      [Figure: Pagerank ranking (0.0 to 1.0) per group (1 to 10); series: Original, Clique.]
  40. Variation of Pagerank when colluding
      These are the ratios $x_{\text{colluding}} / x_{\text{original}}$.
      [Figure: new value / original value (0 to 7) per group (1 to 10); series: change in Pagerank value, change in ranking; reference line at $1/\epsilon$.]
  41. It is not necessary to create a clique
      Spammers can use a fraction of the links to try to avoid detection.
      [Figure: new Pagerank / original Pagerank (1 to 7) per group (1 to 10), one series per fraction of clique links from 5% to 95% in steps of 5%, plus the full clique; reference line at $1/\epsilon$.]
      In the paper, other topologies: star and ring.
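Using only a fraction of the clique links can be sketched as below. The slides do not say how the subset of links was chosen; drawing it uniformly at random is an assumption, as are the function name `partial_clique_links` and the ten-node example group.

```python
import itertools
import random


def partial_clique_links(group, fraction, seed=0):
    """Return a random `fraction` of the directed links of a complete subgraph."""
    rng = random.Random(seed)
    # Every ordered pair (a, b) with a != b is one link of the full clique.
    all_links = list(itertools.permutations(group, 2))
    count = round(fraction * len(all_links))
    return rng.sample(all_links, count)


members = list(range(10))                    # hypothetical colluding group
half = partial_clique_links(members, 0.5)    # 45 of the 90 possible links
```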
  43. Experiments in a real Web graph
      Hostgraph of 310,486 Websites from Spain
      [Figure: frequency ($10^{-6}$ to $10^{0}$) against Pagerank value ($10^{-6}$ to $10^{-2}$), with a reference line $x^{-2.1}$.]
  44. Experiments in a real Web graph
      Some of the nodes are already colluding [Fetterly et al., 2004].
  45. We study a set of Web sites
      We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links:
        Disconnect the group
        Create a ring
        Add a central page linking to all of them
        Add a central page linking to and from all of them (star)
        Fully connect the group (clique)
      Now we measure the new ranking.
  46. 46. We study a set of Web sites Pagerank Increase under Collusion R. Baeza-Yates, C. Castillo and V. L´pez o We picked a set of 242 reputable sites with rank ≈ 0.75, and Outline we modify their links. Introduction Disconnect the group Collusion and Pagerank Create a ring Experiments in a Add a central page linking to all of them synthetic graph Experiments in a Add a central page linking to and from all of them (star) real Web graph Fully connect the group (clique) Conclusions Now we measure the new ranking
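The link modifications listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' experimental code: the function names (`pagerank`, `apply_strategy`, `normalized_rank`), the damping factor of 0.85, the random toy graph, and the choice that "disconnect" removes links inside the group while the other strategies add links on top of the existing ones are all assumptions. The normalized rank used here (fraction of nodes with strictly lower Pagerank) is likewise only one plausible reading of "rank ≈ 0.75".

```python
import random


def pagerank(n, edges, damping=0.85, iters=100):
    """Power-iteration Pagerank on nodes 0..n-1.

    Dangling nodes (no out-links) spread their score uniformly.
    """
    out = [[] for _ in range(n)]
    for u, v in set(edges):
        out[u].append(v)
    pr = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(pr[u] for u in range(n) if not out[u])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = [base] * n
        for u in range(n):
            if out[u]:
                share = damping * pr[u] / len(out[u])
                for v in out[u]:
                    nxt[v] += share
        pr = nxt
    return pr


def apply_strategy(edges, group, strategy, center=None):
    """Return a new edge set after applying one collusion strategy.

    Assumption: 'disconnect' drops links inside the group; the other
    strategies add links to the existing graph. 'center' is the id of
    the extra central page (hypothetical, used by 'central' and 'star').
    """
    g = list(group)
    members = set(g)
    result = set(edges)
    if strategy == "normal":
        return result
    if strategy == "disconnect":
        return {(u, v) for u, v in result
                if not (u in members and v in members)}
    if strategy == "ring":
        return result | {(g[i], g[(i + 1) % len(g)]) for i in range(len(g))}
    if strategy == "central":
        return result | {(center, v) for v in g}
    if strategy == "star":
        return result | {(center, v) for v in g} | {(v, center) for v in g}
    if strategy == "clique":
        return result | {(u, v) for u in g for v in g if u != v}
    raise ValueError(f"unknown strategy: {strategy}")


def normalized_rank(pr, node):
    """Fraction of nodes whose Pagerank is strictly lower than node's."""
    return sum(1 for x in pr if x < pr[node]) / len(pr)


# Toy data: 200 sites with random links; node 200 is the extra central page.
rng = random.Random(0)
n = 200
edges = {(rng.randrange(n), rng.randrange(n)) for _ in range(1000)}
edges = {(u, v) for u, v in edges if u != v}
group = range(10)
for strategy in ["normal", "disconnect", "ring", "central", "star", "clique"]:
    pr = pagerank(n + 1, apply_strategy(edges, group, strategy, center=n))
    avg = sum(normalized_rank(pr, v) for v in group) / len(group)
    print(f"{strategy:10s} average normalized rank = {avg:.3f}")
```

The loop at the end prints one average ranking per strategy for the toy group, which mirrors the kind of per-strategy comparison the slides report for the real Web graph.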
  53. New rankings under graph modifications. [Figure: rankings (y-axis, 0 to 1) by strategy (x-axis): Normal, Ring, Star, Disconnected, Central, Inv. Ring, Clique.]
  54. Adding 5%-50% of complete subgraph. [Figure: average ranking (y-axis, 0.980 to 1.000) versus percent of links of a complete subgraph (x-axis, 0% to 55%).] The best sites also increase their ranking.
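Adding only a percentage of the links of a complete subgraph can be illustrated by sampling that share of the possible directed links among group members. The sketch below is an assumption about how such a partial clique might be generated, not the procedure used in the experiment; `partial_clique_links`, the uniform sampling, and the fixed seed are hypothetical choices.

```python
import random


def partial_clique_links(group, fraction, seed=0):
    """Sample a fraction of the directed links of a complete subgraph.

    A complete subgraph on k nodes has k * (k - 1) directed links
    (no self-links). The sample is uniform and without replacement.
    """
    g = list(group)
    all_links = [(u, v) for u in g for v in g if u != v]
    count = int(fraction * len(all_links))
    return set(random.Random(seed).sample(all_links, count))


# Example: link sets for a group of 242 sites at 5%, 10%, ..., 50%.
group = range(242)
for percent in range(5, 55, 5):
    links = partial_clique_links(group, percent / 100)
    print(f"{percent:2d}% of the complete subgraph: {len(links)} links")
```

Because the links are chosen at random, the resulting structure has no regular pattern such as a ring or a star.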
  56. Conclusions. Any group of nodes can increase its Pagerank. Nodes with high Pagerank gain less by colluding. Ideas for link spam detection: detecting only regularities can fail to detect randomized structures; detecting only nepotistic links can give false positives; use evidence from multiple sources.
  62. Thank you
  63. Clausen, A. (2004). The cost of attack of PageRank. In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia.
      Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France.
      Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press.
      Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation algorithm: bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
  64. Pandurangan, G., Raghavan, P., and Upfal, E. (2002). Using Pagerank to characterize Web structure. In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–390, Singapore. Springer.
      Zhang, H., Goel, A., Govindan, R., Mason, K., and Roy, B. V. (2004). Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.