PageRank Increase under Different Collusion Topologies (AIRWEB 2005)
Transcript

  • 1. Pagerank Increase under Collusion (R. Baeza-Yates, C. Castillo and V. López). Pagerank Increase under Different Collusion Topologies. Ricardo Baeza-Yates, Carlos Castillo and Vicente López. ICREA Professor / Dept. of Technology / Cátedra Telefónica, Universitat Pompeu Fabra – Barcelona, Spain. May 10th, 2005.
  • 2. Outline: 1. Introduction; 2. Collusion and Pagerank; 3. Experiments in a synthetic graph; 4. Experiments in a real Web graph; 5. Conclusions.
  • 3. Goal. Study collusion: nepotistic linking in a Web graph. This can be done by bad sites (spam) but also by good sites. Colluding groups could use different topologies. Colluding groups could have different original rankings. How much would their ranking increase if ... ?
  • 8. Framework. We use Pagerank as the ranking function [Page et al., 1998]. Pagerank: let $L_{N \times N}$ be the row-wise normalized link matrix. Let $U$ be a matrix such that $U_{i,j} = 1/N$. Let $P = (1 - \epsilon)L + \epsilon U$. Pagerank scores are given by $v$ such that $P^T v = v$. Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a “reset” probability of $\epsilon \approx 0.15$.
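The framework can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `pagerank`, the dense list-of-lists adjacency format, and the treatment of pages without out-links (they are assumed to link to every page) are not taken from the slides.

```python
def pagerank(adj, eps=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for v with P^T v = v, where P = (1 - eps) * L + eps * U.

    adj[i][j] is 1 if page i links to page j, else 0.
    Assumption (not in the slides): a page without out-links is treated
    as linking to every page, so that L stays row-stochastic.
    """
    n = len(adj)
    # L: row-wise normalized link matrix.
    L = []
    for row in adj:
        deg = sum(row)
        L.append([a / deg for a in row] if deg else [1.0 / n] * n)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        # (P^T v)_j = (1 - eps) * sum_i L[i][j] * v[i] + eps / n, because sum(v) == 1.
        nxt = [(1 - eps) * sum(L[i][j] * v[i] for i in range(n)) + eps / n
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


# Hypothetical 3-page graph: pages 0 and 1 link to each other, page 2 links to page 0.
scores = pagerank([[0, 1, 0], [1, 0, 0], [1, 0, 0]])
```

Page 2 has no in-links, so its score reduces to the random-jump share eps / N.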
  • 11. Gain from collusion. Maximum gain [Zhang et al., 2004]: $\frac{\text{New Pagerank}}{\text{Old Pagerank}} \le \frac{1}{\epsilon}$. As $\epsilon \approx 0.15$, maximum gain $\approx 7$. First task: improve this bound.
  • 13. Impact of collusion in Pagerank. [Diagram: the Web, N pages, split into a colluding set of M pages and the remaining N−M pages; regions labeled G and G'.]
  • 14. Grouping nodes for Pagerank calculation. Links for Pagerank can be “lumped” together [Clausen, 2004]. [Diagram: the N−M pages and the M pages drawn as two lumped nodes, with random jumps.]
  • 15. Links for Pagerank calculation. $\text{Pagerank}_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$. [Diagram: the N−M nodes have Pagerank $1-x$ and the M nodes have Pagerank $x$; arrows labeled $P_{in}$, $P_{jump}$ and $P_{self}$.]
  • 16. Pagerank calculation: random jumps. There are N nodes in total, M in the colluding set: $P_{jump} = \epsilon (M/N)$.
  • 17. Pagerank calculation: incoming links. $P_{in} = (1-\epsilon) \sum_{(a,b) \in E_{in}} \frac{\text{Pagerank}(a)}{\deg(a)} = (1-\epsilon) \sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)$, where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes. Define $p = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{\sum_{a \in G - G'} \text{Pagerank}(a)} = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{1 - x}$. $p$ is a weighted average of $p(a)$; it reflects how “important” the pages in the colluding set are.
  • 22. Pagerank calculation: incoming and self links. $P_{in}$ can be rewritten as $P_{in} = (1 - \epsilon)(1 - x)p$. Using the same trick for $P_{self}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write $P_{self} = (1 - \epsilon) x s$.
  • 24. Pagerank calculation summary. $P_{jump} = \epsilon (M/N)$; $P_{in} = (1-\epsilon)(1-x)p$; $P_{self} = (1-\epsilon) x s$. [Diagram: the N−M nodes have Pagerank $1-x$ and the M nodes have Pagerank $x$.]
  • 25. Solving. Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields $x_{normal} = \frac{\epsilon \frac{M}{N} + (1 - \epsilon) p}{(p - s)(1 - \epsilon) + 1}$. What happens when colluding? Colluding means pointing more links to the inside. This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$.
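A small numeric check of the stationary-state solution, written as a sketch: the function names and the sample values for ε, M, N, p and s are hypothetical and chosen only for illustration.

```python
def colluding_share(eps, M, N, p, s):
    """Pagerank x of the colluding set, solved from P_jump + P_in + P_self = x."""
    return (eps * M / N + (1 - eps) * p) / ((p - s) * (1 - eps) + 1)


def residual(x, eps, M, N, p, s):
    """P_jump + P_in + P_self - x; zero when x is the stationary value."""
    p_jump = eps * M / N
    p_in = (1 - eps) * (1 - x) * p
    p_self = (1 - eps) * x * s
    return p_jump + p_in + p_self - x


# Hypothetical values: 100 colluding nodes in a 100,000-node graph.
eps, M, N, p, s = 0.15, 100, 100_000, 0.001, 0.2
x_normal = colluding_share(eps, M, N, p, s)
# Colluding raises the self-link fraction (here s' = 0.9) while p stays fixed.
x_colluding = colluding_share(eps, M, N, p, 0.9)
```

Raising s while keeping p fixed shrinks the denominator, so the colluding set's share of Pagerank grows.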
  • 27. Pagerank increase due to collusion. Making $s' = 1$, all links from the colluding set go inside now: $\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1 - s}{p + \frac{\epsilon}{1 - \epsilon}}$. Making $s = p$, originally the set was not colluding: $\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1 - \epsilon) + \epsilon}$. This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set.
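The two ratios follow from the stationary-state solution in a few steps. The derivation below is a sketch that assumes p, M and N are unchanged by collusion, so only the denominator of x changes when s becomes s'.

```latex
\begin{align*}
\frac{x_{colluding}}{x_{normal}}
  &= \frac{(p - s)(1-\epsilon) + 1}{(p - s')(1-\epsilon) + 1}
  && \text{(the numerator of $x$ does not depend on $s$)} \\
  &= \frac{(p - s)(1-\epsilon) + 1}{p(1-\epsilon) + \epsilon}
  && \text{(setting $s' = 1$)} \\
  &= 1 + \frac{(1 - s)(1-\epsilon)}{p(1-\epsilon) + \epsilon}
   = 1 + \frac{1 - s}{p + \frac{\epsilon}{1-\epsilon}} \\
\intertext{and, if the set was not colluding originally ($s = p$),}
\frac{x_{colluding}}{x_{normal}}
  &= \frac{1}{p(1-\epsilon) + \epsilon} \;\le\; \frac{1}{\epsilon}.
\end{align*}
```

The last inequality holds because p ≥ 0, which recovers the 1/ε bound as the limit p → 0.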
  • 30. Expected Pagerank change. [Figure: $x_{colluding}/x_{normal}$ as a function of $p$. x-axis: weighted average of fraction of links to colluding nodes (log scale, $10^{-3}$ to $10^{0}$); y-axis: maximum Pagerank change (1 to 7); reference line at $1/\epsilon$.]
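The curve can be tabulated directly from the s = p formula. This is a sketch; the function name `collusion_gain` and the sampled values of p are assumptions made for illustration.

```python
def collusion_gain(p, eps=0.15):
    """x_colluding / x_normal for a set that starts with s = p and moves to s' = 1."""
    return 1.0 / (p * (1 - eps) + eps)


# Sample the same range of p as the log-scale axis, 10^-3 to 10^0.
for p in (1e-3, 1e-2, 1e-1, 1.0):
    print(f"p = {p:g}: gain = {collusion_gain(p):.2f}")
```

The gain approaches 1/ε as p goes to 0 and falls to 1 when p = 1, that is, when all outside links already point to the set.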
  • 31. Experiments in a synthetic graph. Created using the generative model [Kumar et al., 2000]. Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]. 100,000-node scale-free graph. Sampling by Pagerank: divided in deciles, each decile has 1/10th of the Pagerank; picked a group of 100 nodes at random from each decile; group 1 are low-ranked nodes, group 10 are high-ranked nodes.
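The sampling step can be sketched as follows. The function name, the bucket-assignment rule and the fixed seed are assumptions; the slides only say that each decile holds 1/10th of the Pagerank and that 100 nodes are picked at random from each.

```python
import random


def sample_groups_by_pagerank(scores, n_groups=10, group_size=100, seed=0):
    """Split nodes into n_groups buckets of roughly equal total Pagerank,
    ordered from low-ranked to high-ranked, and sample nodes from each."""
    rng = random.Random(seed)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    total = sum(scores)
    buckets = [[] for _ in range(n_groups)]
    cumulative = 0.0
    for node in order:
        # Bucket index from the Pagerank mass accumulated before this node.
        k = min(int(n_groups * cumulative / total), n_groups - 1)
        buckets[k].append(node)
        cumulative += scores[node]
    # A bucket may hold fewer than group_size nodes (few nodes carry much Pagerank).
    return [rng.sample(b, min(group_size, len(b))) for b in buckets]


# Hypothetical input: 100 nodes with equal scores, 5 nodes sampled per group.
groups = sample_groups_by_pagerank([1.0] * 100, group_size=5)
```

With a power-law distribution of scores the high-ranked buckets contain far fewer nodes than the low-ranked ones, which is why the sketch caps the sample size at the bucket size.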
  • 36. Original Pagerank of the nodes. These are the original Pagerank values for each group. [Figure: average Pagerank value (log scale, $10^{-6}$ to $10^{-2}$) per group 1 to 10; annotations “Originally very bad” at the low end and “Originally very good” at the high end.]
  • 37. Modified Pagerank of the nodes. These are the modified Pagerank values when colluding. [Figure: Pagerank values (log scale, $10^{-6}$ to $10^{-2}$) per group 1 to 10; series: Original, Clique.]
  • 38. Distribution of Pagerank. But Pagerank values follow a power law distribution ... [Figure: frequency vs. Pagerank value, log-log axes, with reference line $x^{-2.1}$.]
  • 39. Modified Pagerank position of the nodes. These are the modified Pagerank positions (rankings) when colluding. [Figure: Pagerank ranking (0.0 to 1.0) per group 1 to 10; series: Original, Clique.]
  • 40. Variation of Pagerank when colluding. These are the ratios $x_{colluding}/x_{original}$. [Figure: new value / original value (0 to 7) per group 1 to 10; series: change in Pagerank value, change in ranking; reference line at $1/\epsilon$.]
  • 41. It is not necessary to create a clique. Spammers can use a fraction of the links to try to avoid detection. [Figure: new Pagerank / original Pagerank (1 to 7) per group 1 to 10; one curve per fraction of clique links, from 5% to 95% in steps of 5%, plus the full clique; reference line at $1/\epsilon$.] In the paper, other topologies: star and ring.
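One way to build such a partial clique is sketched below. Choosing the links uniformly at random is an assumption, since the slides do not say how the subset is selected, and the function name is hypothetical.

```python
import random


def partial_clique_links(group, fraction, seed=0):
    """Return a random `fraction` of the directed links of a full clique on `group`."""
    rng = random.Random(seed)
    # All ordered pairs of distinct group members: the links of a full clique.
    all_links = [(a, b) for a in group for b in group if a != b]
    k = round(fraction * len(all_links))
    return rng.sample(all_links, k)


# Hypothetical group of 10 nodes using half of the 90 possible clique links.
links = partial_clique_links(list(range(10)), 0.5)
```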
  • 43. Experiments in a real Web graph. Hostgraph of 310,486 Websites from Spain. [Figure: frequency vs. Pagerank value, log-log axes, with reference line $x^{-2.1}$.]
  • 44. Experiments in a real Web graph. Some of the nodes are already colluding [Fetterly et al., 2004].
  • 45. We study a set of Web sites. We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links: disconnect the group; create a ring; add a central page linking to all of them; add a central page linking to and from all of them (star); fully connect the group (clique). Now we measure the new ranking.
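The link modifications listed above can be sketched as functions that return directed links as (source, target) pairs. All names are hypothetical, and reading “disconnect the group” as removing only the links between group members is an assumption.

```python
def disconnect(links, group):
    """Drop every link whose source and target are both in the group (assumed meaning)."""
    members = set(group)
    return [(a, b) for a, b in links if not (a in members and b in members)]


def ring_links(group):
    """Each node links to the next one; the last links back to the first."""
    return [(group[i], group[(i + 1) % len(group)]) for i in range(len(group))]


def central_links(group, center):
    """A new central page links to every node of the group."""
    return [(center, node) for node in group]


def star_links(group, center):
    """A new central page links to and from every node of the group."""
    return central_links(group, center) + [(node, center) for node in group]


def clique_links(group):
    """Every node of the group links to every other node."""
    return [(a, b) for a in group for b in group if a != b]


# Hypothetical three-site group with an added central page "hub".
group = ["a", "b", "c"]
ring = ring_links(group)
star = star_links(group, "hub")
```

Each modified link set would then be merged into the host graph and Pagerank recomputed to obtain the new ranking.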
  • 53. New rankings under graph modifications. [Figure: rankings (0 to 1) per strategy; strategies shown: Normal, Ring, Star, Disconnected, Central, Inv. Ring, Clique.]
  • 54. Adding 5%-50% of complete subgraph. [Figure: average ranking (0.980 to 1.000) vs. percent of links of a complete subgraph (0% to 55%).] The best sites also increase their ranking.
  • 56. Conclusions. ✓ Any group of nodes can increase its Pagerank. ✓ Nodes with high Pagerank gain less by colluding. Ideas for link spam detection: ✗ Only detecting regularities can fail to detect randomized structures. ✗ Only detecting nepotistic links can give false positives. ✓ Use evidence from multiple sources.
  • 62. Thank you
  • 63. References
      • Clausen, A. (2004). The cost of attack of PageRank. In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia.
      • Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France.
      • Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press.
      • Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation algorithm: bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
  • 64. References (continued)
      • Pandurangan, G., Raghavan, P., and Upfal, E. (2002). Using Pagerank to characterize Web structure. In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–390, Singapore. Springer.
      • Zhang, H., Goel, A., Govindan, R., Mason, K., and Van Roy, B. (2004). Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.