Centralities_PaoloBoldi

Video of the talk given at the research seminar on September 23

Published in: Technology, Design

  1. A Modern View of Centrality Measures. Paolo Boldi, Univ. degli Studi di Milano. Work in progress with Sebastiano Vigna.
  6. What do these people have in common?
  11. PageRank (believe it or not): these are the top-8 actors of the Hollywood graph according to PageRank. Who is to say what is better?
  15. PageRank sucks! (or NOT?) The Hollywood graph we used contains 2,000,000 nodes, and most of them are completely unknown! PageRank is not singling out the best actors... ...but still it is not pointing to random individuals, is it?
  19. The glass is half empty: Link Analysis sucks. The graph itself does not contain useful information, in general. Of little (or no) help for IR.
  24. The glass is half full: Link Analysis is good, but you cannot expect any centrality index to work on any network. PageRank is probably of little use on Hollywood, but maybe other measures would work like a charm: Betweenness? Closeness? Katz? ... A problem here: some indices are computationally infeasible on large networks!
  28. The grand plan: Centrality indices / link analysis have been with us for 60+ years. IR, sociology, and psychology have all made their contributions... ...turning the field into a jungle.
  32. My point, today: By contract, I have to convince you that networks contain a great deal of information useful for IR. But you have to use them properly (i.e., compute suitable indices). And more often than not this calls for new algorithms, because of the networks' size (and sometimes density).
  36. Centrality in social sciences, a historical account: First works by Bavelas at MIT (1948). This sparked countless works (Bavelas 1951; Katz 1953; Shaw 1954; Beauchamp 1965; Mackenzie 1966; Burgess 1969; Anthonisse 1971; Czapiel 1974...) that Freeman (1979) tried to summarize, concluding that "several measures are often only vaguely related to the intuitive ideas they purport to index, and many are so complex that it is difficult or impossible to discover what, if anything, they are measuring".
  43. The 1990s revival, Link Analysis Ranking: With the advent of search engines, there was a strong revival of centrality (LAR in this context). New scenarios: • graphs are mainly directed • they are huge • new attention to efficiency. LAR is the ungrateful reincarnation of Centrality.
  47. What a mess! Only a few brave souls tried to bring some order to this mess. Noteworthy (in the IR context): Craswell, Upstill, Hawking (ADCS 2003); Najork, Zaragoza, Taylor (SIGIR 2007); Najork, Gollapudi, Panigrahy (WSDM 2009). My guess is that the others were scared off by the computational burden of classical measures.
  56. Begin the begin: The only point on which everybody seems to agree is that the center of a star is more important than the other nodes. But what makes it more important? "I have the largest degree"; "I am on most shortest paths"; "I am maximally close to everybody".
  62. A tale of three tribes: Indices based on degree; indices based on the number of paths or shortest paths (geodesics) passing through a vertex; indices based on distances from the vertex to the other vertices. Let me call these indices geometric.
  65. Three dogs strive for a bone, and the fourth runs away with it... The advent of Link Analysis pushed for a fourth (winning) tribe: spectral indices. Some of the geometric indices also have an (equivalent) spectral definition.
  68. 1) The degree tribe. (In-)Degree centrality: the number of incoming links, $c_{\mathrm{deg}}(x) = d^-(x)$. Careful: when dealing with directed networks, some indices come in two variants (e.g., in-degree vs. out-degree); the ones based on incoming paths are more interesting.
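As a minimal sketch of the in-degree definition above, the following counts incoming arcs per node. The arc-list representation and the toy star graph are assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter


def in_degree_centrality(nodes, arcs):
    """Return {x: d^-(x)}, the number of arcs (y, x) pointing into x."""
    incoming = Counter(target for _, target in arcs)
    return {x: incoming.get(x, 0) for x in nodes}


# Hypothetical toy graph: a star whose leaves all point to the centre "c".
nodes = ["c", "a", "b", "d"]
arcs = [("a", "c"), ("b", "c"), ("d", "c")]
scores = in_degree_centrality(nodes, arcs)
```

On this star the centre collects all three arcs, which matches the one point every centrality notion is expected to agree on.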
  75. 2) The path tribe. Betweenness centrality (Anthonisse 1971): $c_{\mathrm{betw}}(x) = \sum_{y,z \neq x} \frac{\sigma_{yz}(x)}{\sigma_{yz}}$, where each term is the fraction of shortest paths from y to z passing through x. Katz centrality (Katz 1953): $c_{\mathrm{Katz}}(x) = \sum_{t=0}^{\infty} \alpha^t \Pi_x(t)$, where $\Pi_x(t)$ is the number of paths of length t ending in x.
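The two path-based definitions can be sketched directly from their formulas. This is a brute-force illustration under stated assumptions, not an efficient algorithm and not the authors' implementation: the graph is a hypothetical dict mapping each node to its successors, betweenness is computed pair by pair from BFS path counts, and the Katz series is simply truncated after `max_t` terms (it only converges when `alpha` is small enough for the graph at hand).

```python
from collections import deque


def bfs_counts(succ, source):
    """Distances from source and number of shortest paths to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(succ):
    """Sum over pairs (y, z) of the fraction of shortest y-z paths through x."""
    info = {s: bfs_counts(succ, s) for s in succ}
    score = {x: 0.0 for x in succ}
    for y in succ:
        dist_y, sigma_y = info[y]
        for z in dist_y:
            if z == y:
                continue
            for x in succ:
                if x == y or x == z or x not in dist_y:
                    continue
                dist_x, sigma_x = info[x]
                # x lies on a shortest y-z path iff the distances add up.
                if z in dist_x and dist_y[x] + dist_x[z] == dist_y[z]:
                    score[x] += sigma_y[x] * sigma_x[z] / sigma_y[z]
    return score


def katz(succ, alpha, max_t=100):
    """Truncated series sum_t alpha^t * (number of paths of length t ending in x)."""
    paths = {x: 1.0 for x in succ}  # t = 0: the empty path
    score = dict(paths)
    weight = 1.0
    for _ in range(max_t):
        nxt = {x: 0.0 for x in succ}
        for y in succ:
            for x in succ[y]:
                nxt[x] += paths[y]
        paths = nxt
        weight *= alpha
        for x in succ:
            score[x] += weight * paths[x]
    return score


# Hypothetical toy graph: the directed path a -> b -> c.
path = {"a": ["b"], "b": ["c"], "c": []}
bet = betweenness(path)
kz = katz(path, alpha=0.5)
```

The pairwise betweenness loop is cubic in the number of nodes, which illustrates why such indices become a problem on large networks.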
  83. 3) The distance tribe. Closeness centrality (Bavelas 1950): $c_{\mathrm{clos}}(x) = \frac{1}{\sum_y d(y,x)}$, where $d(y,x)$ is the distance from y to x. Lin centrality (Lin 1976): $c_{\mathrm{Lin}}(x) = \frac{\mathrm{canReach}(x)^2}{\sum_y d(y,x)}$. The summation is over all y such that $d(y,x) < \infty$. Lin centrality is completely neglected by the literature.
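A small sketch of closeness and Lin centrality, computed with one reverse BFS per node. Several choices here are assumptions rather than statements from the slides: the graph is a hypothetical dict of successor lists, canReach(x) is taken to count x itself, and a node that nobody can reach gets closeness 0 and Lin score 1 by convention.

```python
from collections import deque


def distances_to(pred, target):
    """Reverse BFS: d(y, target) for every y that can reach target."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in pred[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_and_lin(succ):
    pred = {x: [] for x in succ}
    for y, targets in succ.items():
        for x in targets:
            pred[x].append(y)
    closeness, lin = {}, {}
    for x in succ:
        dist = distances_to(pred, x)
        total = sum(dist.values())   # only finite distances are summed
        can_reach = len(dist)        # assumption: includes x itself
        closeness[x] = 1.0 / total if total else 0.0   # convention (assumption)
        lin[x] = can_reach ** 2 / total if total else 1.0  # convention (assumption)
    return closeness, lin


# Hypothetical toy graph: a star pointing into "a", plus a separate pair d -> e.
graph = {"a": [], "b": ["a"], "c": ["a"], "f": ["a"], "d": ["e"], "e": []}
closeness, lin = closeness_and_lin(graph)
```

On this toy graph closeness prefers "e", which sits in a two-node component, over the star centre "a", while Lin's squared numerator restores the expected order.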
  88. 3) The distance tribe, a new member. Give a warm welcome to Harmonic centrality: $c_{\mathrm{harm}}(x) = \sum_{y \neq x} \frac{1}{d(y,x)}$. The denormalized reciprocal of the harmonic mean of all distances (even ∞). Inspired by the use of the harmonic mean in (Marchiori, Latora 2000). It has probably already appeared somewhere (e.g., quoted for undirected graphs in Tore Opsahl's blog).
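Harmonic centrality can be sketched with the same reverse-BFS idea: unreachable nodes contribute 1/∞ = 0, so no special case is needed for disconnected graphs. The dict-of-successors representation and the toy graph are assumptions for illustration.

```python
from collections import deque


def harmonic_centrality(succ):
    """Return {x: sum over y != x of 1 / d(y, x)}, with 1/infinity taken as 0."""
    pred = {x: [] for x in succ}
    for y, targets in succ.items():
        for x in targets:
            pred[x].append(y)
    scores = {}
    for x in succ:
        dist = {x: 0}
        queue = deque([x])
        while queue:  # reverse BFS finds every y that can reach x
            u = queue.popleft()
            for v in pred[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[x] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores


# Hypothetical toy graph: a star pointing into "a", plus a separate pair d -> e.
graph = {"a": [], "b": ["a"], "c": ["a"], "f": ["a"], "d": ["e"], "e": []}
harm = harmonic_centrality(graph)
```

Here the star centre "a" scores 3, the node "e" in the small component scores 1, and nodes with no incoming paths score 0.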
  90. 4) The spectral tribe: All based on the eigenstructure of some graph-related matrix.
  96. PageRank (Brin, Page, Motwani, Winograd 1999). The idea is to start from the basic equation $c_{\mathrm{pr}}(x) = \sum_{y \to x} \frac{c_{\mathrm{pr}}(y)}{d^+(y)}$... ...with an adjustment to make it have a unique solution (and more): $c_{\mathrm{pr}}(x) = \alpha \sum_{y \to x} \frac{c_{\mathrm{pr}}(y)}{d^+(y)} + \frac{1-\alpha}{n}$. It is the dominant eigenvector of $\alpha G_r + (1-\alpha)\mathbf{1}^T\mathbf{1}/n$. Katz is the dominant eigenvector of $\alpha G + (1-\alpha)\mathbf{e}^T\mathbf{1}/n$.
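The damped equation above can be iterated as written until it stabilises. This is a minimal sketch under assumptions: the graph is a hypothetical dict of successor lists, the damping value 0.85 and the iteration count are arbitrary choices, and nodes without outgoing arcs get no special treatment (on such graphs the scores would no longer sum to 1).

```python
def pagerank(succ, alpha=0.85, iterations=200):
    """Iterate c(x) = alpha * sum_{y -> x} c(y) / d+(y) + (1 - alpha) / n."""
    nodes = list(succ)
    n = len(nodes)
    score = {x: 1.0 / n for x in nodes}
    for _ in range(iterations):
        nxt = {x: (1.0 - alpha) / n for x in nodes}
        for y in nodes:
            if succ[y]:
                share = alpha * score[y] / len(succ[y])  # y splits its score evenly
                for x in succ[y]:
                    nxt[x] += share
        score = nxt
    return score


# Hypothetical toy graph with no dangling nodes: a <-> b, and c -> a.
graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
scores = pagerank(graph)
```

Node "c" has no incoming arcs, so it keeps only the (1 - alpha)/n term, while "a" and "b" share the rest.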
  101. Seeley index (Seeley 1949). It is essentially PageRank with no damping factor; equivalently, the stable state of the natural random walk on G: $c_{\mathrm{Seeley}}(x) = \left(\lim_{t\to\infty} \frac{\mathbf{1}}{n}\, G_r^{\,t}\right)_x$. It is obtained as the limit of PageRank when the damping goes to 1. In general it is a dominant eigenvector of $G_r$.
  105. HITS (Kleinberg 1997). The idea is to start from the system: $c_{\mathrm{Hauth}}(x) = \sum_{y \to x} c_{\mathrm{Hhub}}(y)$, $c_{\mathrm{Hhub}}(x) = \sum_{x \to y} c_{\mathrm{Hauth}}(y)$. HITS centrality is defined to be the "authoritativeness" score. It is a dominant eigenvector of $G^T G$.
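The hub/authority system can be sketched as a power iteration that alternates the two equations and renormalises. The representation (dict of successor lists), the all-ones starting vector, the sum normalisation and the iteration count are all assumptions of this sketch.

```python
def hits_authority(succ, iterations=50):
    """Alternate hub(x) = sum_{x -> y} auth(y) and auth(x) = sum_{y -> x} hub(y)."""
    nodes = list(succ)
    auth = {x: 1.0 for x in nodes}
    for _ in range(iterations):
        hub = {x: sum(auth[y] for y in succ[x]) for x in nodes}
        new_auth = {x: 0.0 for x in nodes}
        for y in nodes:
            for x in succ[y]:
                new_auth[x] += hub[y]
        total = sum(new_auth.values())
        if total == 0:  # graph without arcs: nothing to rank
            return new_auth
        auth = {x: value / total for x, value in new_auth.items()}
    return auth


# Hypothetical toy graph: h1 points to a and b, h2 points to a only.
graph = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
auth = hits_authority(graph)
```

On this graph the matrix G^T G restricted to a and b is [[2, 1], [1, 1]], whose dominant eigenvector has components in the golden ratio, so the ratio auth["a"] / auth["b"] should approach about 1.618.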
  110. HITS (SALSA, etc.). WARNING: These measures were proposed specifically for ranking results in hyperlinked collections. They should be applied not to the whole graph, but to a suitable subgraph derived from the query. How the subgraph is derived is very relevant for effectiveness (Najork, Gollapudi, Panigrahy 2009). Not really the central point here, though...
  114. SALSA (Lempel, Moran 2001). The idea is to start from the system: $c_{\mathrm{Sauth}}(x) = \sum_{y \to x} \frac{c_{\mathrm{Shub}}(y)}{d^+(y)}$, $c_{\mathrm{Shub}}(x) = \sum_{x \to y} \frac{c_{\mathrm{Sauth}}(y)}{d^-(y)}$. SALSA centrality is defined to be the "authoritativeness" score. It is a dominant eigenvector of $G_c^T G_r$.
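The SALSA system differs from HITS only in the degree normalisation, and can be sketched the same way. Assumptions of this sketch: a hypothetical dict of successor lists with no repeated arcs, an all-ones starting vector, a fixed iteration count, and a final normalisation so that the scores sum to 1.

```python
def salsa_authority(succ, iterations=100):
    """Alternate hub(x) = sum_{x -> y} auth(y) / d-(y), auth(x) = sum_{y -> x} hub(y) / d+(y)."""
    nodes = list(succ)
    outdeg = {x: len(succ[x]) for x in nodes}
    indeg = {x: 0 for x in nodes}
    for y in nodes:
        for x in succ[y]:
            indeg[x] += 1
    auth = {x: 1.0 for x in nodes}
    for _ in range(iterations):
        # Every y in succ[x] has at least one incoming arc, so indeg[y] > 0.
        hub = {x: sum(auth[y] / indeg[y] for y in succ[x]) for x in nodes}
        new_auth = {x: 0.0 for x in nodes}
        for y in nodes:
            for x in succ[y]:
                new_auth[x] += hub[y] / outdeg[y]
        auth = new_auth
    total = sum(auth.values())
    return {x: v / total for x, v in auth.items()} if total else auth


# Hypothetical toy graph: h1 points to a and b, h2 points to a only.
graph = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
auth = salsa_authority(graph)
```

On this connected toy example the authority scores end up proportional to in-degree (2 for "a", 1 for "b").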
  119. How? How to assess centrality: Axiomatic approach; Ground-truth approach; IR approach; Computational feasibility approach.
  121. Axiomatic lens: start from some minimal mathematical requirements.
  124. Axiomatic: Sensitivity to size. Two disjoint (or very far) components of a single network, whose sizes are governed by k and p. When k or p goes to ∞, the nodes of the corresponding subnetwork must become more important.
  128. Axiomatic: Sensitivity to density. The blue and the red node have the same importance (the two rings have the same size!). Densifying the left-hand side, we expect the red node to become more important than the blue node.
  130. Axiomatic: Monotonicity. Adding an arc increases the importance of the target node (figure: graphs G and G', each with nodes x and y).
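The monotonicity requirement can be checked empirically on a toy graph: compute an index, add one arc, and compare the score of the arc's target. The sketch below does this for harmonic centrality; the three-node graph, the helper `with_arc`, and the choice of index are assumptions made for illustration, and a single example is of course not a proof.

```python
from collections import deque


def harmonic(succ):
    """Harmonic centrality via one reverse BFS per node (1/infinity counts as 0)."""
    pred = {v: [] for v in succ}
    for u, targets in succ.items():
        for v in targets:
            pred[v].append(u)
    scores = {}
    for x in succ:
        dist = {x: 0}
        queue = deque([x])
        while queue:
            u = queue.popleft()
            for v in pred[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[x] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores


def with_arc(succ, x, y):
    """Return a copy of the graph with the arc x -> y added (hypothetical helper)."""
    new = {u: list(vs) for u, vs in succ.items()}
    if y not in new[x]:
        new[x].append(y)
    return new


# G has the single arc z -> x; G' additionally has x -> y.
g = {"x": [], "y": [], "z": ["x"]}
g_prime = with_arc(g, "x", "y")
before = harmonic(g)["y"]
after = harmonic(g_prime)["y"]
```

In G the node y is unreachable and scores 0; in G' it is reached by x at distance 1 and by z at distance 2, so its score rises to 1.5.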
  136. An axiomatic slaughter:
      Index       | Size   | Density | Monot.
      Degree      | only k | yes     | yes
      Betweenness | only p | no      | no
      Katz        | only k | yes     | yes
      Closeness   | no     | no      | no
      Lin         | only k | no      | no
      Harmonic    | yes    | yes     | yes
      PageRank    | no     | yes     | yes
      Seeley      | no     | yes     | no
      HITS        | only k | yes     | no
      SALSA       | no     | yes     | no
  138. Ground-truth lens (mostly anecdotal / comparative): check the indices against real (social?) networks on which you have some ground truth about importance/centrality/...
  140. Hollywood: PageRank. Ron Jeremy, Adolf Hitler, Lloyd Kaufman, George W. Bush, Ronald Reagan, Bill Clinton, Martin Sheen, Debbie Rochon.
  142. Hollywood: Degree. William Shatner, Bess Flowers, Martin Sheen, Ronald Reagan, George Clooney, Samuel Jackson, Robin Williams, Tom Hanks.
  144. Hollywood: Betweenness. Adolf Hitler, Lloyd Kaufman, Ron Jeremy, Tony Robinson, Olu Jacobs, Max von Sydow, Udo Kier, George W. Bush.
  146. Hollywood: Katz. William Shatner, Martin Sheen, George Clooney, Robin Williams, Tom Hanks, Ronald Reagan, Bruce Willis, Samuel Jackson.
  149. Hollywood: Closeness. Lina Tjeng, Ryan Villapoto, Anh Loan Nguyen Thi, Chad Reed, Bjorn van Wenum, J.P. Ramackers, Herbert Sydney, R.D. Nicholson. Isolated nodes have the largest centrality.
  150. Hollywood: Lin. George Clooney, Samuel Jackson, Martin Sheen, Dennis Hopper, Antonio Banderas, Madonna, Michael Douglas, Tom Cruise.
  151. Hollywood: HITS. William Shatner, Robin Williams, Bruce Willis, Tom Hanks, Michael Douglas, Cameron Diaz, Harrison Ford, Pierce Brosnan.
  152. Hollywood: Harmonic. George Clooney, Samuel Jackson, Sharon Stone, Tom Hanks, Martin Sheen, Dennis Hopper, Antonio Banderas, Madonna.
  158. What about the web? .uk Top Ten:
      PageRank | Katz | Lin | Harmonic
      http://www.direct.gov.uk/ | http://www.direct.gov.uk/ | http://www.direct.gov.uk/ | http://www.direct.gov.uk/
      http://www.direct.gov.uk/en/index.htm | http://www.kelkoo.co.uk/ | http://www.bbc.co.uk/accessibility/ | http://news.bbc.co.uk/
      http://www.names.co.uk/ | http://www.kelkoo.co.uk/b/a/kc_top_searches_charts.html | http://news.bbc.co.uk/ | http://www.dti.gov.uk/
      http://www.names.co.uk/hosting.html | http://www.kelkoo.co.uk/b/a/co_2765_128501-company-information-pages.html | http://www.dti.gov.uk/ | http://www.direct.gov.uk/en/index.htm
      http://www.names.co.uk/email.html | http://www.kelkoo.co.uk/b/a/sm_site-map.html?displayType=alpha | http://www.google.co.uk/ | http://www.google.co.uk/
      http://www.names.co.uk/controlpanel.html | http://www.kelkoo.co.uk/b/a/co_5199_128501-how-to-use-kelkoo.html | http://www.guardian.co.uk/ | http://www.bbc.co.uk/accessibility/
      http://www.scdc.org.uk/ | http://www.kelkoo.co.uk/b/a/co_2120_128501-shopping-guides-price-comparison-on-kelkoo-uk.html | http://www.homeoffice.gov.uk/ | http://www.homeoffice.gov.uk/
      http://www.freelyricsearch.co.uk/index.html | http://www.ebay.co.uk/ | http://www.statistics.gov.uk/ | http://www.statistics.gov.uk/
      http://www.247partypeople.co.uk/login.asp | http://www.top50scrappers.co.uk/ | http://www.bbc.co.uk/privacy/ | http://www.bbc.co.uk/privacy/
      http://www.becs.co.uk/catalog/cookie_usage.php | http://cgi1.ebay.co.uk/aw-cgi/eBayISAPI.dll?TimeShow | http://www.bbc.co.uk/info/ | http://www.bbc.co.uk/info/
  159. 159. How do they compare?
  160. 160. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ
  161. 161. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ most geometric indices and HITS are rather correlated to one another
  162. 162. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ
  163. 163. Hollywood: Kendall’s τ. Katz, degree and SALSA are also highly correlated
  165. 165. Hollywood: Kendall’s τ. Betweenness does not correlate with anything
  167. 167. Hollywood: Kendall’s τ. PageRank stands alone
  168. 168. .uk (May 2007 snapshot): Kendall’s τ
                 degree    Katz 1/4  Katz 1/2  Katz 3/4  SALSA     closeness harmonic  Lin       HITS      PR 1/4    PR 1/2    PR 3/4
      degree     1.0000    0.9053    0.9024    0.9000    0.9114    0.1950    0.2060    0.2060    0.2853    0.6449    0.6161    0.5784
      Katz 1/4   0.9053    1.0000    0.9957    0.9922    0.8141    0.2059    0.2268    0.2265    0.2773    0.5917    0.5820    0.5595
      Katz 1/2   0.9024    0.9957    1.0000    0.9966    0.8112    0.2078    0.2289    0.2286    0.2776    0.5914    0.5827    0.5611
      Katz 3/4   0.9000    0.9922    0.9966    1.0000    0.8089    0.2094    0.2307    0.2303    0.2778    0.5911    0.5832    0.5622
      SALSA      0.9114    0.8141    0.8112    0.8089    1.0000    0.1782    0.1617    0.1619    0.1917    0.6445    0.6146    0.5747
      closeness  0.1950    0.2059    0.2078    0.2094    0.1782    1.0000    0.8592    0.8566    0.3817    0.1518    0.1746    0.2004
      harmonic   0.2060    0.2268    0.2289    0.2307    0.1617    0.8592    1.0000    0.9694    0.4253    0.1503    0.1770    0.2072
      Lin        0.2060    0.2265    0.2286    0.2303    0.1619    0.8566    0.9694    1.0000    0.4272    0.1503    0.1768    0.2069
      HITS       0.2853    0.2773    0.2776    0.2778    0.1917    0.3817    0.4253    0.4272    1.0000    0.1529    0.1484    0.1415
      PR 1/4     0.6449    0.5917    0.5914    0.5911    0.6445    0.1518    0.1503    0.1503    0.1529    1.0000    0.9182    0.8289
      PR 1/2     0.6161    0.5820    0.5827    0.5832    0.6146    0.1746    0.1770    0.1768    0.1484    0.9182    1.0000    0.9088
      PR 3/4     0.5784    0.5595    0.5611    0.5622    0.5747    0.2004    0.2072    0.2069    0.1415    0.8289    0.9088    1.0000
  170. 170. .uk (May 2007 snapshot): Kendall’s τ. Betweenness could not be computed because of graph size (106M nodes)
  172. 172. .uk (May 2007 snapshot): Kendall’s τ. The same correlations as with Hollywood, but even more pronounced
  174. 174. .uk (May 2007 snapshot): Kendall’s τ. Exception: in Hollywood, HITS was correlated with the geometric indices, while here it stands alone
  176. 176. .uk (May 2007 snapshot): Kendall’s τ. PageRank is more strongly correlated with Katz and degree than in Hollywood
  178. 178. Orkut (2007 snapshot): Kendall’s τ
                 degree    Katz 1/4  Katz 1/2  Katz 3/4  SALSA     closeness harmonic  Lin       HITS      PR 1/4    PR 1/2    PR 3/4
      degree     1.0000    0.9522    0.8982    0.8242    1.0000    0.5521    0.5391    0.5513    0.3508    0.7265    0.7596    0.8016
      Katz 1/4   0.9522    1.0000    0.9489    0.8750    0.9522    0.5972    0.5875    0.5967    0.3984    0.6868    0.7179    0.7577
      Katz 1/2   0.8982    0.9489    1.0000    0.9275    0.8982    0.6382    0.6338    0.6380    0.4491    0.6400    0.6690    0.7067
      Katz 3/4   0.8242    0.8750    0.9275    1.0000    0.8242    0.6839    0.6910    0.6842    0.5213    0.5742    0.6005    0.6355
      SALSA      1.0000    0.9522    0.8982    0.8242    1.0000    0.5521    0.5391    0.5513    0.3505    0.7265    0.7596    0.8016
      closeness  0.5521    0.5972    0.6382    0.6839    0.5521    1.0000    0.9458    0.9830    0.6539    0.3862    0.4040    0.4268
      harmonic   0.5391    0.5875    0.6338    0.6910    0.5391    0.9458    1.0000    0.9471    0.7090    0.3612    0.3777    0.3992
      Lin        0.5513    0.5967    0.6380    0.6842    0.5513    0.9830    0.9471    1.0000    0.6546    0.3852    0.4030    0.4257
      HITS       0.3508    0.3984    0.4491    0.5213    0.3505    0.6539    0.7090    0.6546    1.0000    0.1689    0.1778    0.1917
      PR 1/4     0.7265    0.6868    0.6400    0.5742    0.7265    0.3862    0.3612    0.3852    0.1689    1.0000    0.9520    0.8889
      PR 1/2     0.7596    0.7179    0.6690    0.6005    0.7596    0.4040    0.3777    0.4030    0.1778    0.9520    1.0000    0.9363
      PR 3/4     0.8016    0.7577    0.7067    0.6355    0.8016    0.4268    0.3992    0.4257    0.1917    0.8889    0.9363    1.0000
  179. 179. Orkut (2007 snapshot): Kendall’s τ. The same correlations as in Hollywood, although this time SALSA is also fairly well correlated with PageRank
  182. 182. IR lens (using TREC .gov2): use centrality (in isolation or combined with textual features) to rerank query results and see how well (or badly) each index does
  187. 187. TREC .gov2: 150 queries (query title words, in AND; with stemming, no stopword elimination). The result graph was generated using the method described by Najork et al. 2009 (a variant of Kleinberg’s HITS graph, taking a in-links and b out-links). Many combinations were considered; here I present only the case a=b=0 (i.e., the subgraph induced by the result set), with or without intra-host links.
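      For the a=b=0 case, the result graph is just the subgraph induced by the result set, optionally with intra-host links dropped. A minimal sketch of that construction, assuming pages are identified by URL and that "host" means the hostname component of the URL; the function name `induced_result_graph` and the example URLs are hypothetical, and this is not the pipeline used in the talk.

```python
from urllib.parse import urlsplit


def induced_result_graph(links, results, keep_intra_host=True):
    """Return the set of links whose endpoints both lie in the result set.

    links: iterable of (source_url, target_url) pairs from the full graph.
    results: iterable of URLs returned for one query.
    keep_intra_host: if False, drop links between pages on the same host
    (assumption: same host means identical URL hostname).
    """
    result_set = set(results)
    edges = set()
    for src, dst in links:
        if src not in result_set or dst not in result_set:
            continue  # a=b=0: no neighbours outside the result set
        same_host = urlsplit(src).hostname == urlsplit(dst).hostname
        if same_host and not keep_intra_host:
            continue
        edges.add((src, dst))
    return edges


# Hypothetical toy data.
links = [
    ("http://a.gov/1", "http://a.gov/2"),  # intra-host
    ("http://a.gov/1", "http://b.gov/1"),  # inter-host
    ("http://b.gov/1", "http://c.gov/1"),  # target outside the result set
]
results = ["http://a.gov/1", "http://a.gov/2", "http://b.gov/1"]
all_links = induced_result_graph(links, results)
inter_host_only = induced_result_graph(links, results, keep_intra_host=False)
print(len(all_links), len(inter_host_only))
```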
  190. 190. P@10 and NDCG@10
                       All links           Inter-host links only
                       P@10      NDCG@10   P@10      NDCG@10
      Betweenness      0.0584    0.0595    0.0577    0.0588
      Closeness        0.1101    0.1061    0.1121    0.1168
      PageRank (best)  0.1107    0.1078    0.1295    0.1347
      Degree           0.1208    0.1091    0.1248    0.1283
      SALSA            0.1221    0.1194    0.1282    0.1384
      Katz (best)      0.1242    0.1228    0.1262    0.1297
      Lin              0.1295    0.1308    0.1248    0.1286
      HITS             0.1349    0.1364    0.1107    0.1179
      Harmonic         0.1430    0.1449    0.1262    0.1293
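      Both measures score only the top 10 results of each reranked list. The exact gain and discount functions used for NDCG are not given in the slides, so the sketch below assumes the common exponential gain 2^rel - 1 with a log2 rank discount, and it builds the ideal ordering from the judged list that is passed in. The function names are hypothetical.

```python
from math import log2


def precision_at_k(relevances, k=10):
    """Fraction of the top-k positions holding a relevant result (rel > 0)."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain; assumed gain 2^rel - 1, discount log2(rank + 1)."""
    return sum((2 ** rel - 1) / log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the same judgments sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of one reranked result list, in rank order (toy values).
ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(precision_at_k(ranked), ndcg_at_k(ranked))
```

      The per-query values would then be averaged over the query set to obtain figures comparable to those in the table.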
  199. 199. Intra-host links? Keep them or throw them away? Most indices get better if you throw them away, but harmonic is better (and best of all) with all links kept. Throwing such links away injects a lot of information, but apparently harmonic doesn’t need it!
  200. 200. .uk (May 2007 snapshot): Kendall’s τ with no intra-host links
                 degree    Katz 1/4  Katz 1/2  Katz 3/4  SALSA     closeness harmonic  Lin       HITS      PR 1/4    PR 1/2    PR 3/4
      degree     1.0000    0.9995    0.9995    0.9995    0.9965    0.9883    0.9984    0.9984    0.8767    0.9956    0.9956    0.9955
      Katz 1/4   0.9995    1.0000    1.0000    1.0000    0.9960    0.9870    0.9986    0.9978    0.8763    0.9952    0.9953    0.9953
      Katz 1/2   0.9995    1.0000    1.0000    1.0000    0.9960    0.9870    0.9986    0.9978    0.8763    0.9952    0.9953    0.9953
      Katz 3/4   0.9995    1.0000    1.0000    1.0000    0.9960    0.9870    0.9986    0.9978    0.8763    0.9952    0.9953    0.9953
      SALSA      0.9965    0.9960    0.9960    0.9960    1.0000    0.9904    0.9950    0.9950    0.8718    0.9959    0.9959    0.9958
      closeness  0.9883    0.9870    0.9870    0.9870    0.9904    1.0000    0.9859    0.9876    0.8714    0.9894    0.9893    0.9892
      harmonic   0.9984    0.9986    0.9986    0.9986    0.9950    0.9859    1.0000    0.9984    0.8759    0.9944    0.9945    0.9946
      Lin        0.9984    0.9978    0.9978    0.9978    0.9950    0.9876    0.9984    1.0000    0.8759    0.9945    0.9946    0.9946
      HITS       0.8767    0.8763    0.8763    0.8763    0.8718    0.8714    0.8759    0.8759    1.0000    0.8727    0.8727    0.8727
      PR 1/4     0.9956    0.9952    0.9952    0.9952    0.9959    0.9894    0.9944    0.9945    0.8727    1.0000    0.9998    0.9997
      PR 1/2     0.9956    0.9953    0.9953    0.9953    0.9959    0.9893    0.9945    0.9946    0.8727    0.9998    1.0000    0.9999
      PR 3/4     0.9955    0.9953    0.9953    0.9953    0.9958    0.9892    0.9946    0.9946    0.8727    0.9997    0.9999    1.0000
  201. 201. .uk (May 2007 snapshot): Kendall’s τ with no intra-host links. Everything becomes correlated. Why?
  203. 203. .uk (May 2007 snapshot): Kendall’s τ with no intra-host links. Because most nodes have degree 0 (and hence all their other scores are tied as well)
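      To get a feeling for the size of this effect, one can count how many node pairs are tied under both indices at once. How such pairs enter Kendall’s τ depends on the variant, which the slides do not specify, so the sketch below only measures the fraction of jointly tied pairs and makes no claim about the resulting τ. The function name `jointly_tied_pair_fraction` and the toy scores are hypothetical.

```python
from collections import Counter
from math import comb


def jointly_tied_pair_fraction(a, b):
    """Fraction of node pairs with equal scores under both indices a and b."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    total_pairs = comb(len(a), 2)
    # Two nodes are tied in both lists exactly when their (a, b) score
    # pairs coincide, so it is enough to count equal (a, b) combinations.
    tied_pairs = sum(comb(count, 2) for count in Counter(zip(a, b)).values())
    return tied_pairs / total_pairs if total_pairs else 0.0


# Toy example: 9 of 10 nodes have no links, hence score 0 under both indices.
index_a = [0] * 9 + [5]
index_b = [0.0] * 9 + [0.3]
print(jointly_tied_pair_fraction(index_a, index_b))
```

      Here 36 of the 45 pairs are tied in both rankings, so only a small minority of pairs can distinguish the two indices at all.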
  204. 204. P@10 and NDCG@10
  205. 205. P@10 and NDCG@10

                            All links             Inter-host links only
                            P@10      NDCG@10     P@10      NDCG@10
       Weighted degree      0.1356    0.1373      0.1262    0.1295
       SALSinA              0.1349    0.1357      0.1255    0.1318
       Degree               0.1208    0.1091      0.1248    0.1283
       SALSA                0.1221    0.1194      0.1282    0.1384
       HITS                 0.1349    0.1364      0.1107    0.1179
       Harmonic             0.1430    0.1449      0.1262    0.1293
       BM25                 0.5644    0.5842      0.5644    0.5842
  207. 207. P@10 and NDCG@10 d (x) · canReach(x)
  210. 210. P@10 and NDCG@10 d (x) · connected(x)
  215. 215. P@10 and NDCG@10 TODO: study features in combination
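For readers unfamiliar with the two metrics, here is a minimal sketch of how P@10 and NDCG@10 can be computed for one query. It assumes graded relevance used directly as gain and a log2 rank discount; the slides do not state which NDCG variant was used, and the function names and toy judgments are hypothetical.

```python
import math

def precision_at_k(ranked_relevance, k=10):
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for r in ranked_relevance[:k] if r > 0) / k

def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain: gain r at rank i (0-based) is r / log2(i + 2)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))

def ndcg_at_k(ranked_relevance, all_judgments, k=10):
    """DCG of the ranking divided by the DCG of an ideal ordering of all judgments."""
    ideal = dcg_at_k(sorted(all_judgments, reverse=True), k)
    return dcg_at_k(ranked_relevance, k) / ideal if ideal > 0 else 0.0

# Toy query: two relevant documents exist; the ranking puts them at ranks 1 and 3.
ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
judged = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p10 = precision_at_k(ranked, 10)
ndcg10 = ndcg_at_k(ranked, judged, 10)
```

Both values are then averaged over all queries to obtain figures like those reported on the slide.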
  216. 216. Computational feasibility lens
  217. 217. Computational feasibility lens Which indices can be computed efficiently on large networks? Consider also parallelizability / distributability...
  218. 218. Best algorithms so far
  220. 220. Best algorithms so far

                      How?                               .                   Progr.   .
       Degree         Trivial O(1)                       -                   -        +++
       Betweenness    O(nm) [Brandes 2001]               -                   no       -
       Katz           Iterative (Gauss-Seidel)           Fast convergence    yes      +
       PageRank       Iterative (PM, GS, Jacobi...)      Fast convergence    yes      +++
       Seeley         Iterative (PM)                     Slow convergence    no       -
       HITS           Iterative (PM)                     Slow convergence    no       -
       SALSA          Direct O(n+)                       -                   no       +++
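As an illustration of the iterative entries in the table, here is a minimal sketch of PageRank computed by power iteration, assuming that "PM" stands for the power method. The damping factor 0.85, the uniform preference vector, and the uniform redistribution of the rank of dangling nodes are assumptions made for this example, not choices stated in the slides.

```python
def pagerank(succ, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration on a graph given as successor lists (node ids 0..n-1)."""
    n = len(succ)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank sitting on nodes without successors is spread uniformly.
        dangling = sum(rank[x] for x in range(n) if not succ[x])
        new = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for x in range(n):
            if succ[x]:
                share = alpha * rank[x] / len(succ[x])
                for y in succ[x]:
                    new[y] += share
        change = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if change < tol:   # stop when the l1 change is negligible
            break
    return rank

cycle_ranks = pagerank([[1], [2], [0]])   # directed 3-cycle
path_ranks = pagerank([[1], [2], []])     # path 0 -> 1 -> 2, node 2 is dangling
```

Each iteration is a single pass over the arcs, and the error shrinks by roughly a factor alpha per pass, which is one way to read the "fast convergence" entry.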
  228. 228. Geometric
  229. 229. Geometric What about geometric measures, like closeness, Lin and harmonic?
  230. 230. Computing harmonic
  231. 231. Computing harmonic Let us take harmonic as an example
  232. 232. Computing harmonic Let us take harmonic as an example. But how easy is it to compute? $c_{\mathrm{harm}}(x) = \sum_{y \neq x} \frac{1}{d(y, x)}$
  233. 233. Computing harmonic Let us take harmonic as an example. But how easy is it to compute? $c_{\mathrm{harm}}(x) = \sum_{y \neq x} \frac{1}{d(y, x)}$ ...or... $c_{\mathrm{harm}}(x) = \sum_{t=1}^{\infty} \frac{|B_t(x)| - |B_{t-1}(x)|}{t}$
  234. 234. Computing harmonic Let us take harmonic as an example. But how easy is it to compute? $c_{\mathrm{harm}}(x) = \sum_{y \neq x} \frac{1}{d(y, x)}$ ...or... $c_{\mathrm{harm}}(x) = \sum_{t=1}^{\infty} \frac{|B_t(x)| - |B_{t-1}(x)|}{t}$, where $B_t(x)$ is the ball of radius $t$ about $x$
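The step from the first expression to the second is not spelled out on the slide. A short derivation, under the assumption that the ball is read as $B_t(x) = \{\, y : d(y, x) \le t \,\}$ and that unreachable nodes (infinite distance) contribute $0$:

```latex
\begin{align*}
c_{\mathrm{harm}}(x)
  &= \sum_{y \neq x} \frac{1}{d(y, x)}
   = \sum_{t=1}^{\infty} \frac{1}{t}\,\bigl|\{\, y : d(y, x) = t \,\}\bigr| \\
  &= \sum_{t=1}^{\infty} \frac{|B_t(x) \setminus B_{t-1}(x)|}{t}
   = \sum_{t=1}^{\infty} \frac{|B_t(x)| - |B_{t-1}(x)|}{t}
\end{align*}
```

The first step groups the nodes by their distance from $x$; the last one uses $B_{t-1}(x) \subseteq B_t(x)$. So the sizes of the balls are all that is needed, not the balls themselves.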
  236. 236. Computing by diffusion
  237. 237. Computing by diffusion Clearly $B_0(x) = \{x\}$
  238. 238. Computing by diffusion Clearly $B_0(x) = \{x\}$. Moreover $B_{t+1}(x) = \{x\} \cup \bigcup_{x \to y} B_t(y)$
  239. 239. Computing by diffusion Clearly $B_0(x) = \{x\}$. Moreover $B_{t+1}(x) = \{x\} \cup \bigcup_{x \to y} B_t(y)$. So a single sequential scan of the graph suffices to compute the balls at the next iteration
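The diffusion can be sketched with exact Python sets as follows. This is an illustration of the recurrence only, not the authors' implementation; the function names are hypothetical. Note that the recurrence over arcs x → y grows balls of nodes reachable from x, so to obtain the distances d(y, x) used by harmonic centrality the sketch runs it on the transposed graph.

```python
def transpose(succ):
    """Reverse every arc of a graph given as successor lists."""
    pred = [[] for _ in succ]
    for x, targets in enumerate(succ):
        for y in targets:
            pred[y].append(x)
    return pred

def harmonic_by_diffusion(succ):
    """Harmonic centrality from ball sizes, one scan of the graph per round."""
    graph = transpose(succ)               # balls must collect nodes that reach x
    n = len(graph)
    balls = [{x} for x in range(n)]       # B_0(x) = {x}
    harmonic = [0.0] * n
    t = 0
    changed = True
    while changed:
        t += 1
        changed = False
        new_balls = []
        for x in range(n):
            ball = set(balls[x])          # already contains x
            for y in graph[x]:
                ball |= balls[y]          # union with B_{t-1}(y)
            grown = len(ball) - len(balls[x])
            if grown:
                harmonic[x] += grown / t  # (|B_t| - |B_{t-1}|) / t
                changed = True
            new_balls.append(ball)
        balls = new_balls
    return harmonic

# Path 0 -> 1 -> 2: node 2 is at distance 1 from node 1 and 2 from node 0.
path_harmonic = harmonic_by_diffusion([[1], [2], []])
```

The loop stops at the first round in which no ball grows, since from then on no ball can change.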
  240. 240. A round of updates [animation]
  257. 257. Another round... [animation]
  265. 265. Easy but expensive
  266. 266. Easy but expensive Each set uses linear space; overall quadratic
  267. 267. Easy but expensive Each set uses linear space; overall quadratic Impossible!
  268. 268. Easy but expensive Each set uses linear space; overall quadratic Impossible! But what if we use approximate sets?
  269. 269. Easy but expensive Each set uses linear space; overall quadratic Impossible! But what if we use approximate sets? Idea: use probabilistic counters, which represent sets but can only answer “size?” queries
  270. 270. Easy but expensive Each set uses linear space; overall quadratic Impossible! But what if we use approximate sets? Idea: use probabilistic counters, which represent sets but can only answer “size?” queries Very small! With 40 bits you can count up to 4 billion with a standard deviation of 6%
  271. 271. Use the Force
  272. 272. Use the Force HyperBall (http://webgraph.dsi.unimi.it/) does the job
  273. 273. Use the Force HyperBall (http://webgraph.dsi.unimi.it/) does the job Fully exploits multicore architectures; uses broadword microparallelization
  274. 274. Use the Force HyperBall (http://webgraph.dsi.unimi.it/) does the job Fully exploits multicore architectures; uses broadword microparallelization Works like a charm on networks with hundreds of millions of nodes
  275. 275. What is HyperBall
  276. 276. What is HyperBall It uses the diffusion idea to compute (at the same time):
  277. 277. What is HyperBall It uses the diffusion idea to compute (at the same time): Lin + Closeness + Harmonic + Number of reachable nodes...
  278. 278. What is HyperBall It uses the diffusion idea to compute (at the same time): Lin + Closeness + Harmonic + Number of reachable nodes... It employs Flajolet’s HyperLogLog counters to store sets
  279. 279. What is HyperBall It uses the diffusion idea to compute (at the same time): Lin + Closeness + Harmonic + Number of reachable nodes... It employs Flajolet’s HyperLogLog counters to store sets More on HyperBall in the next few slides...
  280. 280. Main trick
  281. 281. Main trick Choose an approximate set such that unions can be computed quickly
  282. 282. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n + c space)
  283. 283. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n + c space) We use HyperLogLog counters [Flajolet et al., 2007] (log log n space)
  284. 284. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n + c space) We use HyperLogLog counters [Flajolet et al., 2007] (log log n space) MF counters can be combined with an OR
  285. 285. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n + c space) We use HyperLogLog counters [Flajolet et al., 2007] (log log n space) MF counters can be combined with an OR We use broadword programming to combine HyperLogLog counters quickly!
  286. 286. HyperLogLog counters
  287. 287. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements
  288. 288. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function
  289. 289. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function We keep track of the maximum m (log log n bits!)
  290. 290. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function We keep track of the maximum m (log log n bits!) The number of distinct elements ∝ 2^m
  291. 291. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function We keep track of the maximum m (log log n bits!) The number of distinct elements ∝ 2^m Important: the counter of stream AB is simply the maximum of the counters of A and B!
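A minimal sketch of such a counter follows. The slide describes a single maximum; the sketch uses the usual HyperLogLog refinement of splitting the hash into many registers and combining them with a harmonic mean. The class name, the choice of BLAKE2b as the hash, and the default of 1024 registers are assumptions made for this example and are not taken from HyperBall.

```python
import hashlib
import math

class HyperLogLogSketch:
    """Illustrative HyperLogLog counter with 2**b registers (b >= 7)."""

    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        digest = hashlib.blake2b(repr(item).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item):
        h = self._hash64(item)
        j = h & (self.m - 1)          # low b bits select a register
        rest = h >> self.b            # remaining 64 - b bits
        if rest == 0:
            rho = 64 - self.b + 1
        else:
            rho = (rest & -rest).bit_length()   # trailing zeroes + 1
        if rho > self.registers[j]:
            self.registers[j] = rho   # keep only the maximum seen

    def union(self, other):
        """Counter of the union: register-by-register maximum."""
        out = HyperLogLogSketch(self.b)
        out.registers = [max(p, q) for p, q in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)        # constant valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:            # small-range correction
            return m * math.log(m / zeros)
        return raw

a, b, both = HyperLogLogSketch(), HyperLogLogSketch(), HyperLogLogSketch()
for i in range(6000):
    a.add(i)
    both.add(i)
for i in range(4000, 10000):
    b.add(i)
    both.add(i)
merged = a.union(b)   # same registers as a counter fed with both streams
```

The union never looks at the elements again: it only takes maxima, which is why a ball can be updated from the counters of its successors alone.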
  292. 292. Other ideas
  293. 293. Other ideas We keep track of modifications: we do not maximize with unmodified counters
  294. 294. Other ideas We keep track of modifications: we do not maximize with unmodified counters Systolic computation: each modified set signals back to its predecessors that something is going to happen (far fewer updates!)
  295. 295. Other ideas We keep track of modifications: we do not maximize with unmodified counters Systolic computation: each modified set signals back to its predecessors that something is going to happen (far fewer updates!) Multicore exploitation by decomposition: a task is updating just 1000 counters (almost linear scaling)
  296. 296. Footprint
  297. 297. Footprint Scalability: a minimum of 20 bytes per node
  298. 298. Footprint Scalability: a minimum of 20 bytes per node On a 2TiB machine, 100 billion nodes
  299. 299. Footprint Scalability: a minimum of 20 bytes per node On a 2TiB machine, 100 billion nodes Graph structure is accessed by memory-mapping in a compressed form (WebGraph)
  300. 300. Footprint Scalability: a minimum of 20 bytes per node On a 2TiB machine, 100 billion nodes Graph structure is accessed by memory-mapping in a compressed form (WebGraph) Pointers to the graph are stored using succinct lists (Elias-Fano representation)
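As a quick check of the figures above, assuming the 20 bytes per node are the only significant memory cost (the graph itself being memory-mapped):

```latex
\[
\frac{2\ \mathrm{TiB}}{20\ \mathrm{B/node}}
  = \frac{2 \cdot 2^{40}}{20}\ \text{nodes}
  = \frac{2^{41}}{20}\ \text{nodes}
  \approx 1.1 \times 10^{11}\ \text{nodes},
\]
```

which is consistent with the "100 billion nodes" quoted on the slide.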
  301. 301. Performance
  302. 302. Performance On a graph with 177K nodes
  303. 303. Performance On a graph with 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011]
  304. 304. Performance On a graph with 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011] HyperBall on this laptop: 70s per iteration
  305. 305. Performance On a graph with 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011] HyperBall on this laptop: 70s per iteration On a 32-core workstation: 23s per iteration
  306. 306. Performance On a graph with 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011] HyperBall on this laptop: 70s per iteration On a 32-core workstation: 23s per iteration On ClueWeb09 (4.8G nodes, 8G arcs) on a 40-core workstation: 141m (avg. 40s per iteration)
  307. 307. Convergence [Figure: relative error of harmonic centrality vs. number of runs]
  308. 308. And when the Force is not enough?
  309. 309. And when the Force is not enough? Diffusion processes are easily distributable
  310. 310. And when the Force is not enough? Diffusion processes are easily distributable A Pregel-like implementation of HyperANF is underway (by Sebastian Schelter, TU Berlin)
  311. 311. Did you see the light?
  312. 312. Did you see the light? No? Me neither... But we have a path
  313. 313. Did you see the light? No? Me neither... But we have a path Some things have been ruled out, at least
  314. 314. Did you see the light? No? Me neither... But we have a path Some things have been ruled out, at least Some others, that were neglected, have had their revenge
  315. 315. Did you see the light? No? Me neither... But we have a path Some things have been ruled out, at least Some others, that were neglected, have had their revenge Some new ones seem to be promising
  316. 316. Questions?
