A MODERN VIEW OF
CENTRALITY MEASURES
Paolo Boldi
Univ. degli Studi di Milano
Work in progress with Sebastiano Vigna
What do these people
have in common?
What do these people
have in common?
What do these people
have in common?
What do these people
have in common?
What do these people
have in common?
PageRank (believe it or not)
PageRank (believe it or not)
These are the top-8 actors of the Hollywood graph
according to PageRank
PageRank (believe it or not)
These are the top-8 actors of the Hollywood graph
according to PageRank
Who is going to tell
...
PageRank (believe it or not)
These are the top-8 actors of the Hollywood graph
according to PageRank
Who is going to tell
...
PageRank (believe it or not)
These are the top-8 actors of the Hollywood graph
according to PageRank
Who is going to tell
...
PageRank sucks!
(or NOT?)
PageRank sucks!
(or NOT?)
The Hollywood graph we used contains 2,000,000
nodes. Most of them are completely unknown!
PageRank sucks!
(or NOT?)
The Hollywood graph we used contains 2,000,000
nodes. Most of them are completely unknown!
PageR...
PageRank sucks!
(or NOT?)
The Hollywood graph we used contains 2,000,000
nodes. Most of them are completely unknown!
PageR...
The glass is half
empty
The glass is half
empty
Link Analysis sucks
The glass is half
empty
Link Analysis sucks
The graph itself does not contain useful information, in
general
The glass is half
empty
Link Analysis sucks
The graph itself does not contain useful information, in
general
Of little (or...
The glass is half full
The glass is half full
Link Analysis is good, but you cannot expect
any centrality index to work on any network
The glass is half full
Link Analysis is good, but you cannot expect
any centrality index to work on any network
Probably P...
The glass is half full
Link Analysis is good, but you cannot expect
any centrality index to work on any network
Probably P...
The glass is half full
Link Analysis is good, but you cannot expect
any centrality index to work on any network
Probably P...
The grand plan
The grand plan
Centrality indices / link analysis has been with us for 60+
years
The grand plan
Centrality indices / link analysis has been with us for 60+
years
IR, sociology, psychology all have given ...
The grand plan
Centrality indices / link analysis has been with us for 60+
years
IR, sociology, psychology all have given ...
My point, today
My point, today
By contract, I have to convince you that networks
contain a big deal of information useful for IR
My point, today
By contract, I have to convince you that networks
contain a big deal of information useful for IR
But you ...
My point, today
By contract, I have to convince you that networks
contain a big deal of information useful for IR
But you ...
Centrality in social sciences
a historical account
Centrality in social sciences
a historical account
First works by Bavelas at MIT (1948)
Centrality in social sciences
a historical account
First works by Bavelas at MIT (1948)
This sparked countless works (Bave...
Centrality in social sciences
a historical account
First works by Bavelas at MIT (1948)
This sparked countless works (Bave...
The 1990s revival
Link Analysis Ranking
The 1990s revival
Link Analysis Ranking
With the advent of search engines, there was a strong
revamp of centrality (LAR in...
The 1990s revival
Link Analysis Ranking
With the advent of search engines, there was a strong
revamp of centrality (LAR in...
The 1990s revival
Link Analysis Ranking
With the advent of search engines, there was a strong
revamp of centrality (LAR in...
The 1990s revival
Link Analysis Ranking
With the advent of search engines, there was a strong
revamp of centrality (LAR in...
The 1990s revival
Link Analysis Ranking
With the advent of search engines, there was a strong
revamp of centrality (LAR in...
The 1990s revival
Link Analysis Ranking
With the advent of search engines, there was a strong
revamp of centrality (LAR in...
What a mess!
What a mess!
Only few, brave guys tried to make some order in this
mess!
What a mess!
Only few, brave guys tried to make some order in this
mess!
Noteworthy (in the IR context): Craswell, Upstill...
What a mess!
Only few, brave guys tried to make some order in this
mess!
Noteworthy (in the IR context): Craswell, Upstill...
Begin the begin
Begin the begin
The only point on
which everybody
seems to agree: the
center of a star is more
important than the
other no...
Begin the begin
The only point on
which everybody
seems to agree: the
center of a star is more
important than the
other no...
Begin the begin
Begin the begin
I have the largest
degree
Begin the begin
Begin the begin
I am on most
shortest paths
Begin the begin
Begin the begin
I am maximally
close to everybody
Begin the begin
A tale of three tribes
A tale of three tribes
Indices based on degree
A tale of three tribes
Indices based on degree
Indices based on the number of paths or shortest paths
(geodesics) passing ...
A tale of three tribes
Indices based on degree
Indices based on the number of paths or shortest paths
(geodesics) passing ...
A tale of three tribes
Indices based on degree
Indices based on the number of paths or shortest paths
(geodesics) passing ...
Three dogs strive for a bone, and
the fourth runs away with it...
Three dogs strive for a bone, and
the fourth runs away with it...
The advent of Link Analysis pushed for a third
(winning)...
Three dogs strive for a bone, and
the fourth runs away with it...
The advent of Link Analysis pushed for a third
(winning)...
1) The degree tribe
1) The degree tribe
(In-)Degree centrality: the number of incoming links
cdeg(x) = d (x)
1) The degree tribe
(In-)Degree centrality: the number of incoming links
Careful: when dealing with directed networks, som...
2) The path tribe
2) The path tribe
Betweenness centrality (Anthonisse 1971):
cbetw(x) =
X
y,z6=x
yz(x)
yz
2) The path tribe
Betweenness centrality (Anthonisse 1971):
Fraction of shortest
paths from y to z
passing through x
cbetw...
2) The path tribe
Betweenness centrality (Anthonisse 1971):
cbetw(x) =
X
y,z6=x
yz(x)
yz
2) The path tribe
Betweenness centrality (Anthonisse 1971):
Katz centrality (Katz 1953):
cKatz(x) =
1X
t=0
↵t
⇧x(t)
cbetw(...
2) The path tribe
Betweenness centrality (Anthonisse 1971):
Katz centrality (Katz 1953):
# of paths of length t
ending in ...
2) The path tribe
Betweenness centrality (Anthonisse 1971):
Katz centrality (Katz 1953):
cKatz(x) =
1X
t=0
↵t
⇧x(t)
cbetw(...
3) The distance tribe
3) The distance tribe
Closeness centrality (Bavelas 1950):
cclos(x) =
1
P
y d(y, x)
3) The distance tribe
Closeness centrality (Bavelas 1950):
Distance from y
to x
cclos(x) =
1
P
y d(y, x)
3) The distance tribe
Closeness centrality (Bavelas 1950):
cclos(x) =
1
P
y d(y, x)
3) The distance tribe
Closeness centrality (Bavelas 1950):
Lin centrality (Lin 1976):
cclos(x) =
1
P
y d(y, x)
cLin(x) =
c...
3) The distance tribe
Closeness centrality (Bavelas 1950):
Lin centrality (Lin 1976):
cclos(x) =
1
P
y d(y, x)
cLin(x) =
c...
3) The distance tribe
Closeness centrality (Bavelas 1950):
Lin centrality (Lin 1976):
The summation is over all y such tha...
3) The distance tribe
Closeness centrality (Bavelas 1950):
Lin centrality (Lin 1976):
The summation is over all y such tha...
3) The distance tribe
a new member
3) The distance tribe
a new member
Give a warm welcome to Harmonic centrality:
charm(x) =
X
y6=x
1
d(y, x)
3) The distance tribe
a new member
Give a warm welcome to Harmonic centrality:
The denormalized reciprocal of the harmonic...
3) The distance tribe
a new member
Give a warm welcome to Harmonic centrality:
The denormalized reciprocal of the harmonic...
3) The distance tribe
a new member
Give a warm welcome to Harmonic centrality:
The denormalized reciprocal of the harmonic...
4) The spectral tribe
4) The spectral tribe
All based on the eigenstructure of some graph-related
matrix
PageRank
(Brin, Page, Motwani, Winograd 1999)
PageRank
(Brin, Page, Motwani, Winograd 1999)
The idea is to start from the basic equation...
cpr(x) =
X
y!x
cpr(y)
d+(y)
PageRank
(Brin, Page, Motwani, Winograd 1999)
The idea is to start from the basic equation...
...with an adjustment to mak...
PageRank
(Brin, Page, Motwani, Winograd 1999)
The idea is to start from the basic equation...
...with an adjustment to mak...
PageRank
(Brin, Page, Motwani, Winograd 1999)
The idea is to start from the basic equation...
...with an adjustment to mak...
PageRank
(Brin, Page, Motwani, Winograd 1999)
The idea is to start from the basic equation...
...with an adjustment to mak...
Seeley index
(Seeley 1949)
Seeley index
(Seeley 1949)
It is essentially like PageRank with no damping factor;
equivalently, the stable state of the n...
Seeley index
(Seeley 1949)
It is essentially like PageRank with no damping factor;
equivalently, the stable state of the n...
Seeley index
(Seeley 1949)
It is essentially like PageRank with no damping factor;
equivalently, the stable state of the n...
Seeley index
(Seeley 1949)
It is essentially like PageRank with no damping factor;
equivalently, the stable state of the n...
HITS
(Kleinberg 1997)
HITS
(Kleinberg 1997)
The idea is to start from the system:
cHauth(x) =
X
y!x
cHhub(y)
cHhub(x) =
X
x!y
cHauth(y)
HITS
(Kleinberg 1997)
The idea is to start from the system:
HITS centrality is defined to be the “authoritativeness”
score
...
HITS
(Kleinberg 1997)
The idea is to start from the system:
HITS centrality is defined to be the “authoritativeness”
score
...
HITS (, SALSA etc.)
HITS (, SALSA etc.)
WARNING: These measures were proposed exactly
for ranking results in hyperlinked collections
HITS (, SALSA etc.)
WARNING: These measures were proposed exactly
for ranking results in hyperlinked collections
Should be...
HITS (, SALSA etc.)
WARNING: These measures were proposed exactly
for ranking results in hyperlinked collections
Should be...
HITS (, SALSA etc.)
WARNING: These measures were proposed exactly
for ranking results in hyperlinked collections
Should be...
SALSA
(Lempel, Moran 2001)
SALSA
(Lempel, Moran 2001)
The idea is to start from the system:
cSauth(x) =
X
y!x
cShub(y)
d+(y)
cShub(x) =
X
x!y
cSauth(...
SALSA
(Lempel, Moran 2001)
The idea is to start from the system:
SALSA centrality is defined to be the
“authoritativeness” ...
SALSA
(Lempel, Moran 2001)
The idea is to start from the system:
SALSA centrality is defined to be the
“authoritativeness” ...
How?
How to assess centrality
How?
How to assess centrality
Axiomatic approach
How?
How to assess centrality
Axiomatic approach
Ground-truth approach
How?
How to assess centrality
Axiomatic approach
Ground-truth approach
IR approach
How?
How to assess centrality
Axiomatic approach
Ground-truth approach
IR approach
Computational feasibility approach
Axiomatic lens
Axiomatic lens
start from some minimal mathematical requirements
Axiomatic
Sensitivity to size
Axiomatic
Sensitivity to size
k p
Two disjoint (or
very far)
components of a
single network
Axiomatic
Sensitivity to size
When k or p goes to ∞, the nodes of the
corresponding subnetwork must become
more important
...
Axiomatic
Sensitivity to density
Axiomatic
Sensitivity to density
The blue and the red node have the same
importance (the two rings have the same size!)
Axiomatic
Sensitivity to density
The blue and the red node have the same
importance (the two rings have the same size!)
Axiomatic
Sensitivity to density
Densifying the left-hand side, we expect the red node to
become more important than the b...
Axiomatic
Monotonicity
G
x
y
G’
x
y
Axiomatic
Monotonicity
Adding an arc increases the importance of the
target node
G
x
y
G’
x
y
An axiomatic slaughter
An axiomatic slaughter
An axiomatic slaughter
Size Density Monot.
Degree only k yes yes
Betweennes
s
only p no no
Katz only k yes yes
Closeness n...
An axiomatic slaughter
Size Density Monot.
Degree only k yes yes
Betweennes
s
only p no no
Katz only k yes yes
Closeness n...
An axiomatic slaughter
Size Density Monot.
Degree only k yes yes
Betweennes
s
only p no no
Katz only k yes yes
Closeness n...
An axiomatic slaughter
Size Density Monot.
Degree only k yes yes
Betweennes
s
only p no no
Katz only k yes yes
Closeness n...
Ground-truth lens
(mostly: anecdoctic / comparative)
Ground-truth lens
(mostly: anecdoctic / comparative)
check them against real (social?) networks on which you have
some gro...
Hollywood: PageRank
Ron Jeremy Adolf Hitler Lloyd Kaufman George W. Bush
Ronald Reagan Bill Clinton Martin Sheen Debbie Ro...
Hollywood: PageRank
Ron Jeremy Adolf Hitler Lloyd Kaufman George W. Bush
Ronald Reagan Bill Clinton Martin Sheen Debbie Ro...
Hollywood: Degree
William Shatner Bess Flowers Martin Sheen Ronald Reagan
George Clooney Samuel Jackson Robin Williams Tom...
Hollywood: Degree
William Shatner Bess Flowers Martin Sheen Ronald Reagan
George Clooney Samuel Jackson Robin Williams Tom...
Hollywood: Betweenness
Adolf Hitler Lloyd Kaufman Ron Jeremy Tony Robinson
Olu Jacobs Max von Sydow Udo Kier George W. Bush
Hollywood: Betweenness
Adolf Hitler Lloyd Kaufman Ron Jeremy Tony Robinson
Olu Jacobs Max von Sydow Udo Kier George W. Bush
Hollywood: Katz
William Shatner Martin Sheen
George Clooney
Robin WilliamsTom Hanks
Ronald Reagan Bruce Willis Samuel Jack...
Hollywood: Katz
William Shatner Martin Sheen
George Clooney
Robin WilliamsTom Hanks
Ronald Reagan Bruce Willis Samuel Jack...
Hollywood: Closeness
Lina Tjeng Ryan VillapotoAnh Loan Nguyen Thi Chad Reed
Bjorn van Wenum J.P. Ramackers Herbert Sydney ...
Hollywood: Closeness
Lina Tjeng Ryan VillapotoAnh Loan Nguyen Thi Chad Reed
Bjorn van Wenum J.P. Ramackers Herbert Sydney ...
Hollywood: Closeness
Lina Tjeng Ryan VillapotoAnh Loan Nguyen Thi Chad Reed
Bjorn van Wenum J.P. Ramackers Herbert Sydney ...
Hollywood: Lin
George Clooney Samuel Jackson Martin Sheen Dennis Hopper
Antonio Banderas Madonna Michael Douglas Tom Cruise
Hollywood: HITS
William Shatner Robin Williams Bruce WillisTom Hanks
Michael Douglas Cameron Diaz Harrison Ford Pierce Bro...
Hollywood: Harmonic
George Clooney Samuel Jackson Sharon Stone Tom Hanks
Martin Sheen Dennis Hopper Antonio Banderas Madon...
What about the web?
.uk Top Ten
What about the web?
.uk Top Ten
What about the web?
.uk Top Ten
PageRank Katz Lin Harmonic
http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www....
What about the web?
.uk Top Ten
PageRank Katz Lin Harmonic
http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www....
What about the web?
.uk Top Ten
PageRank Katz Lin Harmonic
http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www....
What about the web?
.uk Top Ten
PageRank Katz Lin Harmonic
http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www....
How do they compare?
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9709 0.92...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9053 0.9024 0.900...
Orkut (2007 snapshot)
Kendall’s τ
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9522 0.8982 0.824...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9522 0.8982 0.824...
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4
degree 1.0000 0.9522 0.8982 0.824...
IR lens
(using TREC .gov2)
IR lens
(using TREC .gov2)
use centrality (in isolation or combined with textual features)
to rerank query results and see...
TREC .gov2
TREC .gov2
150 queries (query title words, in AND; with stemming,
no stopword elimination)
TREC .gov2
150 queries (query title words, in AND; with stemming,
no stopword elimination)
Generated the result graph usin...
TREC .gov2
150 queries (query title words, in AND; with stemming,
no stopword elimination)
Generated the result graph usin...
TREC .gov2
150 queries (query title words, in AND; with stemming,
no stopword elimination)
Generated the result graph usin...
P@10 and NDCG@10
P@10 and NDCG@10
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Betweenness 0.058...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Betweenness 0.058...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Betweenness 0.058...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Betweenness 0.058...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Betweenness 0.058...
Intra-host links?
Intra-host links?
Keep them or throw them away?
Intra-host links?
Keep them or throw them away?
Most indices get better if you throw them away...
Intra-host links?
Keep them or throw them away?
Most indices get better if you throw them away...
Throwing such links away...
Intra-host links?
Keep them or throw them away?
Most indices get better if you throw them away...
Throwing such links away...
.uk (May 2007 snapshot)
Kendall’s τ with no intra-host links
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Li...
.uk (May 2007 snapshot)
Kendall’s τ with no intra-host links
Everything becomes correlated. Why???
degree Katz 1/4 Katz 1/...
.uk (May 2007 snapshot)
Kendall’s τ with no intra-host links
degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Li...
.uk (May 2007 snapshot)
Kendall’s τ with no intra-host links
Because most nodes have degree 0 (and hence
they also have al...
P@10 and NDCG@10
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
P@10 and NDCG@10
All linksAll links Inter-host links onlyInter-host links only
P@10 NDCG@10 P@10 NDCG@10
Weighted degree 0...
Computational feasibility lens
Computational feasibility lens
which indices are computable effectively on large
networks; consider also parallelizability ...
Best algorithms so far
Best algorithms so far
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Best algorithms so far
How? Progr.
Degree Trivial O(1) - - +++
Betweennes
s
O(nm) [Brandes 2001] - no -
Katz Iterative (Ga...
Geometric
Geometric
What about geometic measures, like closeness, Lin
and harmonic?
Computing harmonic
Computing harmonic
Let us take harmonic as an example
Computing harmonic
Let us take harmonic as an example
But how easy is it to compute?
charm(x) =
X
y6=x
1
d(y, x)
Computing harmonic
Let us take harmonic as an example
But how easy is it to compute?
...or...
charm(x) =
X
y6=x
1
d(y, x)
...
Computing harmonic
Let us take harmonic as an example
But how easy is it to compute?
...or...
charm(x) =
X
y6=x
1
d(y, x)
...
Computing harmonic
Let us take harmonic as an example
But how easy is it to compute?
...or...
charm(x) =
X
y6=x
1
d(y, x)
...
Computing by diffusion
Computing by diffusion
Clearly B0(x) = {x}
Computing by diffusion
Clearly
Moreover
B0(x) = {x}
Bt+1(x) = {x} [
[
x!y
Bt(y)
Computing by diffusion
Clearly
Moreover
So one needs just one single sequential scan of the
graph to compute the balls at ...
A round of updates
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺
☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺
☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺
☺☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺
☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺
☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺ ☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺ ☺
☺
☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺ ☺
☺☺
☺☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺ ☺
☺☺ ☺
☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺ ☺
☺☺ ☺
☺
☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺ ☺
☺☺ ☺
☺☺
☺
A round of updates
☺
☺
☺
☺
☺
☺
☺☺ ☺
☺ ☺
☺☺
☺☺ ☺
☺☺ ☺
☺☺ ☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺
☺ ☺
☺☺☺
☺ ☺
☺☺ ☺
☺☺
☺☺☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺
☺ ☺
☺☺☺
☺ ☺
☺☺ ☺
☺☺
☺☺☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺☺ ☺
☺☺☺
☺ ☺
☺☺ ☺
☺☺
☺☺☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺☺ ☺☺☺☺
☺ ☺
☺☺ ☺
☺☺
☺☺☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺☺ ☺☺☺☺
☺ ☺
☺☺ ☺
☺☺
☺☺☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺☺ ☺☺☺☺
☺ ☺☺☺ ☺
☺☺
☺☺☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺☺ ☺☺☺☺
☺ ☺☺☺ ☺
☺☺
☺☺☺
Another round...
☺☺
☺ ☺ ☺
☺☺ ☺
☺ ☺
☺☺ ☺
☺☺☺
☺ ☺ ☺☺ ☺☺☺☺
☺ ☺☺☺ ☺
☺☺☺☺☺
Easy but expensive
Easy but expensive
Each set uses linear space; overall quadratic
Easy but expensive
Each set uses linear space; overall quadratic
Impossible!
Easy but expensive
Each set uses linear space; overall quadratic
Impossible!
But what if we use approximate sets?
Easy but expensive
Each set uses linear space; overall quadratic
Impossible!
But what if we use approximate sets?
Idea: us...
Easy but expensive
Each set uses linear space; overall quadratic
Impossible!
But what if we use approximate sets?
Idea: us...
Use the Force
Use the Force
HyperBall (http://webgraph.dsi.unimi.it/)
does the job
Use the Force
HyperBall (http://webgraph.dsi.unimi.it/)
does the job
Fully exploits multicore architectures; uses broadwor...
Use the Force
HyperBall (http://webgraph.dsi.unimi.it/)
does the job
Fully exploits multicore architectures; uses broadwor...
What is HyperBall
What is HyperBall
It uses the diffusion idea to compute (at the same
time):
What is HyperBall
It uses the diffusion idea to compute (at the same
time):
Lin + Closeness + Harmonic + Number of reachabl...
What is HyperBall
It uses the diffusion idea to compute (at the same
time):
Lin + Closeness + Harmonic + Number of reachabl...
What is HyperBall
It uses the diffusion idea to compute (at the same
time):
Lin + Closeness + Harmonic + Number of reachabl...
Main trick
Main trick
Choose an approximate set such that unions can be
computed quickly
Main trick
Choose an approximate set such that unions can be
computed quickly
ANF [Palmer et al., KDD ’02] uses Martin–Fla...
Main trick
Choose an approximate set such that unions can be
computed quickly
ANF [Palmer et al., KDD ’02] uses Martin–Fla...
Main trick
Choose an approximate set such that unions can be
computed quickly
ANF [Palmer et al., KDD ’02] uses Martin–Fla...
Main trick
Choose an approximate set such that unions can be
computed quickly
ANF [Palmer et al., KDD ’02] uses Martin–Fla...
HyperLogLog counters
HyperLogLog counters
Instead of actually counting, we observe a statistical
feature of a set (think stream) of elements
HyperLogLog counters
Instead of actually counting, we observe a statistical
feature of a set (think stream) of elements
Th...
HyperLogLog counters
Instead of actually counting, we observe a statistical
feature of a set (think stream) of elements
Th...
HyperLogLog counters
Instead of actually counting, we observe a statistical
feature of a set (think stream) of elements
Th...
HyperLogLog counters
Instead of actually counting, we observe a statistical
feature of a set (think stream) of elements
Th...
Other ideas
Other ideas
We keep track of modifications: we do not maximize
with unmodified counters
Other ideas
We keep track of modifications: we do not maximize
with unmodified counters
Systolic computation: each modified s...
Other ideas
We keep track of modifications: we do not maximize
with unmodified counters
Systolic computation: each modified s...
Footprint
Footprint
Scalability: a minimum of 20 bytes per node
Footprint
Scalability: a minimum of 20 bytes per node
On a 2TiB machine, 100 billion nodes
Footprint
Scalability: a minimum of 20 bytes per node
On a 2TiB machine, 100 billion nodes
Graph structure is accessed by ...
Footprint
Scalability: a minimum of 20 bytes per node
On a 2TiB machine, 100 billion nodes
Graph structure is accessed by ...
Performance
Performance
On a 177K nodes
Performance
On a 177K nodes
Hadoop: 2875s per iteration [Kang, Papadimitriou,
Sun and H. Tong, 2011]
Performance
On a 177K nodes
Hadoop: 2875s per iteration [Kang, Papadimitriou,
Sun and H. Tong, 2011]
HyperBall on this lap...
Performance
On a 177K nodes
Hadoop: 2875s per iteration [Kang, Papadimitriou,
Sun and H. Tong, 2011]
HyperBall on this lap...
Performance
On a 177K nodes
Hadoop: 2875s per iteration [Kang, Papadimitriou,
Sun and H. Tong, 2011]
HyperBall on this lap...
Convergence
5 15 25 35 45 55 65 75 85 95
0.0000.0020.0040.006
Harmonic centrality
# runs
Relativeerror
And when the Force is not enough?
And when the Force is not enough?
Diffusion processes are easily distributable
And when the Force is not enough?
Diffusion processes are easily distributable
A Pregel-like implementation of HyperANF is
...
Did you see the
light?
Did you see the
light?
No? Me neither... But we have a path
Did you see the
light?
No? Me neither... But we have a path
Some things have been ruled out, at least
Did you see the
light?
No? Me neither... But we have a path
Some things have been ruled out, at least
Some others, that we...
Did you see the
light?
No? Me neither... But we have a path
Some things have been ruled out, at least
Some others, that we...
Questions?
Centralities_PaoloBoldi
Upcoming SlideShare
Loading in …5
×

Centralities_PaoloBoldi

1,490 views

Published on

Видео доклада на научном семинаре 23 сентября

Published in: Technology, Design
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,490
On SlideShare
0
From Embeds
0
Number of Embeds
419
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Centralities_PaoloBoldi

  1. 1. A MODERN VIEW OF CENTRALITY MEASURES Paolo Boldi Univ. degli Studi di Milano Work in progress with Sebastiano Vigna
  2. 2. What do these people have in common?
  3. 3. What do these people have in common?
  4. 4. What do these people have in common?
  5. 5. What do these people have in common?
  6. 6. What do these people have in common?
  7. 7. PageRank (believe it or not)
  8. 8. PageRank (believe it or not) These are the top-8 actors of the Hollywood graph according to PageRank
  9. 9. PageRank (believe it or not) These are the top-8 actors of the Hollywood graph according to PageRank Who is going to tell that is better?
  10. 10. PageRank (believe it or not) These are the top-8 actors of the Hollywood graph according to PageRank Who is going to tell that is better?
  11. 11. PageRank (believe it or not) These are the top-8 actors of the Hollywood graph according to PageRank Who is going to tell that is better?
  12. 12. PageRank sucks! (or NOT?)
  13. 13. PageRank sucks! (or NOT?) The Hollywood graph we used contains 2,000,000 nodes. Most of them are completely unknown!
  14. 14. PageRank sucks! (or NOT?) The Hollywood graph we used contains 2,000,000 nodes. Most of them are completely unknown! PageRank is not singling out the best actors...
  15. 15. PageRank sucks! (or NOT?) The Hollywood graph we used contains 2,000,000 nodes. Most of them are completely unknown! PageRank is not singling out the best actors... ...but still it is not pointing to random individuals, is it?
  16. 16. The glass is half empty
  17. 17. The glass is half empty Link Analysis sucks
  18. 18. The glass is half empty Link Analysis sucks The graph itself does not contain useful information, in general
  19. 19. The glass is half empty Link Analysis sucks The graph itself does not contain useful information, in general Of little (or no) help for IR
  20. 20. The glass is half full
  21. 21. The glass is half full Link Analysis is good, but you cannot expect any centrality index to work on any network
  22. 22. The glass is half full Link Analysis is good, but you cannot expect any centrality index to work on any network Probably PageRank is scarcely useful on Hollywood, but maybe other measures would work like a charm
  23. 23. The glass is half full Link Analysis is good, but you cannot expect any centrality index to work on any network Probably PageRank is scarcely useful on Hollywood, but maybe other measures would work like a charm Betweenness? Closeness? Katz? ...
  24. 24. The glass is half full Link Analysis is good, but you cannot expect any centrality index to work on any network Probably PageRank is scarcely useful on Hollywood, but maybe other measures would work like a charm Betweenness? Closeness? Katz? ... A problem here: some indices are computationa!y unfeasible on large networks!
  25. 25. The grand plan
  26. 26. The grand plan Centrality indices / link analysis has been with us for 60+ years
  27. 27. The grand plan Centrality indices / link analysis has been with us for 60+ years IR, sociology, psychology all have given their contributions...
  28. 28. The grand plan Centrality indices / link analysis has been with us for 60+ years IR, sociology, psychology all have given their contributions... ...turning it into a jungle
  29. 29. My point, today
  30. 30. My point, today By contract, I have to convince you that networks contain a big deal of information useful for IR
  31. 31. My point, today By contract, I have to convince you that networks contain a big deal of information useful for IR But you have to use them in a proper way (i.e., to compute suitable indices)
  32. 32. My point, today By contract, I have to convince you that networks contain a big deal of information useful for IR But you have to use them in a proper way (i.e., to compute suitable indices) And more often than not this calls for new algorithms because of their size (and sometimes density)
  33. 33. Centrality in social sciences a historical account
  34. 34. Centrality in social sciences a historical account First works by Bavelas at MIT (1948)
  35. 35. Centrality in social sciences a historical account First works by Bavelas at MIT (1948) This sparked countless works (Bavelas 1951; Katz 1953; Shaw 1954; Beauchamp 1965; Mackenzie 1966; Burgess 1969; Anthonisse 1971; Czapiel 1974...) that Freeman (1979) tried to summarize concluding that:
  36. 36. Centrality in social sciences a historical account First works by Bavelas at MIT (1948) This sparked countless works (Bavelas 1951; Katz 1953; Shaw 1954; Beauchamp 1965; Mackenzie 1966; Burgess 1969; Anthonisse 1971; Czapiel 1974...) that Freeman (1979) tried to summarize concluding that: several measures are o"en only vaguely related to the intuitive ideas they purport to index, and many are so complex that it is difficult or impossible to discover what, if anything, they are measuring
  37. 37. The 1990s revival Link Analysis Ranking
  38. 38. The 1990s revival Link Analysis Ranking With the advent of search engines, there was a strong revamp of centrality (LAR in this context)
  39. 39. The 1990s revival Link Analysis Ranking With the advent of search engines, there was a strong revamp of centrality (LAR in this context) New scenarios
  40. 40. The 1990s revival Link Analysis Ranking With the advent of search engines, there was a strong revamp of centrality (LAR in this context) New scenarios • graphs are directed mainly
  41. 41. The 1990s revival Link Analysis Ranking With the advent of search engines, there was a strong revamp of centrality (LAR in this context) New scenarios • graphs are directed mainly • they are huge
  42. 42. The 1990s revival Link Analysis Ranking With the advent of search engines, there was a strong revamp of centrality (LAR in this context) New scenarios • graphs are directed mainly • they are huge • new attention to efficiency
  43. 43. The 1990s revival Link Analysis Ranking With the advent of search engines, there was a strong revamp of centrality (LAR in this context) New scenarios • graphs are directed mainly • they are huge • new attention to efficiency LAR is the ungrateful reincarnation of Centrality
  44. 44. What a mess!
  45. 45. What a mess! Only few, brave guys tried to make some order in this mess!
  46. 46. What a mess! Only few, brave guys tried to make some order in this mess! Noteworthy (in the IR context): Craswell, Upstill, Hawking (ADCS 2003); Najork, Zaragoza, Taylor (SIGIR 2007); Najork, Gollapudi, Panigrahy (WSDM 2009)
  47. 47. What a mess! Only few, brave guys tried to make some order in this mess! Noteworthy (in the IR context): Craswell, Upstill, Hawking (ADCS 2003); Najork, Zaragoza, Taylor (SIGIR 2007); Najork, Gollapudi, Panigrahy (WSDM 2009) Guess that the others were scared by computational burden of classical measures
  48. 48. Begin the begin
  49. 49. Begin the begin The only point on which everybody seems to agree: the center of a star is more important than the other nodes
  50. 50. Begin the begin The only point on which everybody seems to agree: the center of a star is more important than the other nodes But what does it make more important?
  51. 51. Begin the begin
  52. 52. Begin the begin I have the largest degree
  53. 53. Begin the begin
  54. 54. Begin the begin I am on most shortest paths
  55. 55. Begin the begin
  56. 56. Begin the begin I am maximally close to everybody
  57. 57. Begin the begin
  58. 58. A tale of three tribes
  59. 59. A tale of three tribes Indices based on degree
  60. 60. A tale of three tribes Indices based on degree Indices based on the number of paths or shortest paths (geodesics) passing through a vertex;
  61. 61. A tale of three tribes Indices based on degree Indices based on the number of paths or shortest paths (geodesics) passing through a vertex; Indices based on distances from the vertex to the other vertices
  62. 62. A tale of three tribes Indices based on degree Indices based on the number of paths or shortest paths (geodesics) passing through a vertex; Indices based on distances from the vertex to the other vertices Let me call these indices geometric
  63. 63. Three dogs strive for a bone, and the fourth runs away with it...
  64. 64. Three dogs strive for a bone, and the fourth runs away with it... The advent of Link Analysis pushed for a third (winning) tribe: spectral indices
  65. 65. Three dogs strive for a bone, and the fourth runs away with it... The advent of Link Analysis pushed for a third (winning) tribe: spectral indices Some of the geometric indices have also a spectral (equivalent) definition
  66. 66. 1) The degree tribe
  67. 67. 1) The degree tribe (In-)Degree centrality: the number of incoming links cdeg(x) = d (x)
  68. 68. 1) The degree tribe (In-)Degree centrality: the number of incoming links Careful: when dealing with directed networks, some indices present two variants (e.g., in-degree vs. out- degree), the ones based on incoming paths being more interesting cdeg(x) = d (x)
  69. 69. 2) The path tribe
  70. 70. 2) The path tribe Betweenness centrality (Anthonisse 1971): cbetw(x) = X y,z6=x yz(x) yz
  71. 71. 2) The path tribe Betweenness centrality (Anthonisse 1971): Fraction of shortest paths from y to z passing through x cbetw(x) = X y,z6=x yz(x) yz
  72. 72. 2) The path tribe Betweenness centrality (Anthonisse 1971): cbetw(x) = X y,z6=x yz(x) yz
  73. 73. 2) The path tribe Betweenness centrality (Anthonisse 1971): Katz centrality (Katz 1953): cKatz(x) = 1X t=0 ↵t ⇧x(t) cbetw(x) = X y,z6=x yz(x) yz
  74. 74. 2) The path tribe Betweenness centrality (Anthonisse 1971): Katz centrality (Katz 1953): # of paths of length t ending in x cKatz(x) = 1X t=0 ↵t ⇧x(t) cbetw(x) = X y,z6=x yz(x) yz
  75. 75. 2) The path tribe Betweenness centrality (Anthonisse 1971): Katz centrality (Katz 1953): cKatz(x) = 1X t=0 ↵t ⇧x(t) cbetw(x) = X y,z6=x yz(x) yz
  76. 76. 3) The distance tribe
  77. 77. 3) The distance tribe Closeness centrality (Bavelas 1950): cclos(x) = 1 P y d(y, x)
  78. 78. 3) The distance tribe Closeness centrality (Bavelas 1950): Distance from y to x cclos(x) = 1 P y d(y, x)
  79. 79. 3) The distance tribe Closeness centrality (Bavelas 1950): cclos(x) = 1 P y d(y, x)
  80. 80. 3) The distance tribe Closeness centrality (Bavelas 1950): Lin centrality (Lin 1976): cclos(x) = 1 P y d(y, x) cLin(x) = canReach(x)2 P y d(y, x)
  81. 81. 3) The distance tribe Closeness centrality (Bavelas 1950): Lin centrality (Lin 1976): cclos(x) = 1 P y d(y, x) cLin(x) = canReach(x)2 P y d(y, x)
  82. 82. 3) The distance tribe Closeness centrality (Bavelas 1950): Lin centrality (Lin 1976): The summation is over all y such that d(y,x)<∞ cclos(x) = 1 P y d(y, x) cLin(x) = canReach(x)2 P y d(y, x)
  83. 83. 3) The distance tribe Closeness centrality (Bavelas 1950): Lin centrality (Lin 1976): The summation is over all y such that d(y,x)<∞ cclos(x) = 1 P y d(y, x) cLin(x) = canReach(x)2 P y d(y, x) Completely neglected by the literature
  84. 84. 3) The distance tribe a new member
  85. 85. 3) The distance tribe a new member Give a warm welcome to Harmonic centrality: charm(x) = X y6=x 1 d(y, x)
  86. 86. 3) The distance tribe a new member Give a warm welcome to Harmonic centrality: The denormalized reciprocal of the harmonic mean of a! distances (even ∞) charm(x) = X y6=x 1 d(y, x)
  87. 87. 3) The distance tribe a new member Give a warm welcome to Harmonic centrality: The denormalized reciprocal of the harmonic mean of a! distances (even ∞) Inspired by the use the the harmonic mean in (Marchiori, Latora 2000) charm(x) = X y6=x 1 d(y, x)
  88. 88. 3) The distance tribe a new member Give a warm welcome to Harmonic centrality: The denormalized reciprocal of the harmonic mean of a! distances (even ∞) Inspired by the use the the harmonic mean in (Marchiori, Latora 2000) Probably already appeared somewhere (e.g., quoted for undirected graphs in Tore Opsahl’s blog) charm(x) = X y6=x 1 d(y, x)
  89. 89. 4) The spectral tribe
  90. 90. 4) The spectral tribe All based on the eigenstructure of some graph-related matrix
  91. 91. PageRank (Brin, Page, Motwani, Winograd 1999)
  92. 92. PageRank (Brin, Page, Motwani, Winograd 1999) The idea is to start from the basic equation... cpr(x) = X y!x cpr(y) d+(y)
  93. 93. PageRank (Brin, Page, Motwani, Winograd 1999) The idea is to start from the basic equation... ...with an adjustment to make it have a unique solution (and more)... cpr(x) = X y!x cpr(y) d+(y) cpr(x) = ↵ X y!x cpr(y) d+(y) + 1 ↵ n
  94. 94. PageRank (Brin, Page, Motwani, Winograd 1999) The idea is to start from the basic equation... ...with an adjustment to make it have a unique solution (and more)... cpr(x) = X y!x cpr(y) d+(y) cpr(x) = ↵ X y!x cpr(y) d+(y) + 1 ↵ n
  95. 95. PageRank (Brin, Page, Motwani, Winograd 1999) The idea is to start from the basic equation... ...with an adjustment to make it have a unique solution (and more)... It is the dominant eigenvector of cpr(x) = X y!x cpr(y) d+(y) cpr(x) = ↵ X y!x cpr(y) d+(y) + 1 ↵ n ↵Gr + (1 ↵)1T 1/n
  96. 96. PageRank (Brin, Page, Motwani, Winograd 1999) The idea is to start from the basic equation... ...with an adjustment to make it have a unique solution (and more)... It is the dominant eigenvector of Katz is the dominant eigenvector of cpr(x) = X y!x cpr(y) d+(y) cpr(x) = ↵ X y!x cpr(y) d+(y) + 1 ↵ n ↵Gr + (1 ↵)1T 1/n ↵G + (1 ↵)eT 1/n
  97. 97. Seeley index (Seeley 1949)
  98. 98. Seeley index (Seeley 1949) It is essentially like PageRank with no damping factor; equivalently, the stable state of the natural random walk on G: cSeeley(x) = ✓ lim t!1 1 n (Gr)t ◆ x
  99. 99. Seeley index (Seeley 1949) It is essentially like PageRank with no damping factor; equivalently, the stable state of the natural random walk on G: cSeeley(x) = ✓ lim t!1 1 n (Gr)t ◆ x
  100. 100. Seeley index (Seeley 1949) It is essentially like PageRank with no damping factor; equivalently, the stable state of the natural random walk on G: It is obtained as the limit of PageRank when the damping goes to 1 cSeeley(x) = ✓ lim t!1 1 n (Gr)t ◆ x
  101. 101. Seeley index (Seeley 1949) It is essentially like PageRank with no damping factor; equivalently, the stable state of the natural random walk on G: It is obtained as the limit of PageRank when the damping goes to 1 In general it is a dominant eigenvector of cSeeley(x) = ✓ lim t!1 1 n (Gr)t ◆ x Gr
  102. 102. HITS (Kleinberg 1997)
  103. 103. HITS (Kleinberg 1997) The idea is to start from the system: cHauth(x) = X y!x cHhub(y) cHhub(x) = X x!y cHauth(y)
  104. 104. HITS (Kleinberg 1997) The idea is to start from the system: HITS centrality is defined to be the “authoritativeness” score cHauth(x) = X y!x cHhub(y) cHhub(x) = X x!y cHauth(y)
  105. 105. HITS (Kleinberg 1997) The idea is to start from the system: HITS centrality is defined to be the “authoritativeness” score It is a dominant eigenvector of cHauth(x) = X y!x cHhub(y) cHhub(x) = X x!y cHauth(y) GT G
  106. 106. HITS (, SALSA etc.)
  107. 107. HITS (, SALSA etc.) WARNING: These measures were proposed exactly for ranking results in hyperlinked collections
  108. 108. HITS (, SALSA etc.) WARNING: These measures were proposed exactly for ranking results in hyperlinked collections Should be applied not to the whole graph, but to a suitable subgraph derived from the query
  109. 109. HITS (, SALSA etc.) WARNING: These measures were proposed exactly for ranking results in hyperlinked collections Should be applied not to the whole graph, but to a suitable subgraph derived from the query How the subgraph is derived is very relevant for effectiveness (Najork, Gollapudi, Panighray 2009)
  110. 110. HITS (, SALSA etc.) WARNING: These measures were proposed exactly for ranking results in hyperlinked collections Should be applied not to the whole graph, but to a suitable subgraph derived from the query How the subgraph is derived is very relevant for effectiveness (Najork, Gollapudi, Panighray 2009) Not really the central point here, though...
  111. 111. SALSA (Lempel, Moran 2001)
  112. 112. SALSA (Lempel, Moran 2001) The idea is to start from the system: cSauth(x) = X y!x cShub(y) d+(y) cShub(x) = X x!y cSauth(y) d (y)
  113. 113. SALSA (Lempel, Moran 2001) The idea is to start from the system: SALSA centrality is defined to be the “authoritativeness” score cSauth(x) = X y!x cShub(y) d+(y) cShub(x) = X x!y cSauth(y) d (y)
  114. 114. SALSA (Lempel, Moran 2001) The idea is to start from the system: SALSA centrality is defined to be the “authoritativeness” score It is a dominant eigenvector of cSauth(x) = X y!x cShub(y) d+(y) cShub(x) = X x!y cSauth(y) d (y) GT c Gr
  115. 115. How? How to assess centrality
  116. 116. How? How to assess centrality Axiomatic approach
  117. 117. How? How to assess centrality Axiomatic approach Ground-truth approach
  118. 118. How? How to assess centrality Axiomatic approach Ground-truth approach IR approach
  119. 119. How? How to assess centrality Axiomatic approach Ground-truth approach IR approach Computational feasibility approach
  120. 120. Axiomatic lens
  121. 121. Axiomatic lens start from some minimal mathematical requirements
  122. 122. Axiomatic Sensitivity to size
  123. 123. Axiomatic Sensitivity to size k p Two disjoint (or very far) components of a single network
  124. 124. Axiomatic Sensitivity to size When k or p goes to ∞, the nodes of the corresponding subnetwork must become more important k p Two disjoint (or very far) components of a single network
  125. 125. Axiomatic Sensitivity to density
  126. 126. Axiomatic Sensitivity to density The blue and the red node have the same importance (the two rings have the same size!)
  127. 127. Axiomatic Sensitivity to density The blue and the red node have the same importance (the two rings have the same size!)
  128. 128. Axiomatic Sensitivity to density Densifying the left-hand side, we expect the red node to become more important than the blue node
  129. 129. Axiomatic Monotonicity G x y G’ x y
  130. 130. Axiomatic Monotonicity Adding an arc increases the importance of the target node G x y G’ x y
  131. 131. An axiomatic slaughter
  132. 132. An axiomatic slaughter
  133. 133. An axiomatic slaughter Size Density Monot. Degree only k yes yes Betweennes s only p no no Katz only k yes yes Closeness no no no Lin only k no no Harmonic yes yes yes PageRank no yes yes Seeley no yes no HITS only k yes no SALSA no yes no
  134. 134. An axiomatic slaughter Size Density Monot. Degree only k yes yes Betweennes s only p no no Katz only k yes yes Closeness no no no Lin only k no no Harmonic yes yes yes PageRank no yes yes Seeley no yes no HITS only k yes no SALSA no yes no
  135. 135. An axiomatic slaughter Size Density Monot. Degree only k yes yes Betweennes s only p no no Katz only k yes yes Closeness no no no Lin only k no no Harmonic yes yes yes PageRank no yes yes Seeley no yes no HITS only k yes no SALSA no yes no
  136. 136. An axiomatic slaughter Size Density Monot. Degree only k yes yes Betweennes s only p no no Katz only k yes yes Closeness no no no Lin only k no no Harmonic yes yes yes PageRank no yes yes Seeley no yes no HITS only k yes no SALSA no yes no
  137. 137. Ground-truth lens (mostly: anecdoctic / comparative)
  138. 138. Ground-truth lens (mostly: anecdoctic / comparative) check them against real (social?) networks on which you have some ground truth about importance/centrality/...
  139. 139. Hollywood: PageRank Ron Jeremy Adolf Hitler Lloyd Kaufman George W. Bush Ronald Reagan Bill Clinton Martin Sheen Debbie Rochon
  140. 140. Hollywood: PageRank Ron Jeremy Adolf Hitler Lloyd Kaufman George W. Bush Ronald Reagan Bill Clinton Martin Sheen Debbie Rochon
  141. 141. Hollywood: Degree William Shatner Bess Flowers Martin Sheen Ronald Reagan George Clooney Samuel Jackson Robin Williams Tom Hanks
  142. 142. Hollywood: Degree William Shatner Bess Flowers Martin Sheen Ronald Reagan George Clooney Samuel Jackson Robin Williams Tom Hanks
  143. 143. Hollywood: Betweenness Adolf Hitler Lloyd Kaufman Ron Jeremy Tony Robinson Olu Jacobs Max von Sydow Udo Kier George W. Bush
  144. 144. Hollywood: Betweenness Adolf Hitler Lloyd Kaufman Ron Jeremy Tony Robinson Olu Jacobs Max von Sydow Udo Kier George W. Bush
  145. 145. Hollywood: Katz William Shatner Martin Sheen George Clooney Robin WilliamsTom Hanks Ronald Reagan Bruce Willis Samuel Jackson
  146. 146. Hollywood: Katz William Shatner Martin Sheen George Clooney Robin WilliamsTom Hanks Ronald Reagan Bruce Willis Samuel Jackson
  147. 147. Hollywood: Closeness Lina Tjeng Ryan VillapotoAnh Loan Nguyen Thi Chad Reed Bjorn van Wenum J.P. Ramackers Herbert Sydney R.D. Nicholson
  148. 148. Hollywood: Closeness Lina Tjeng Ryan VillapotoAnh Loan Nguyen Thi Chad Reed Bjorn van Wenum J.P. Ramackers Herbert Sydney R.D. Nicholson
  149. 149. Hollywood: Closeness Lina Tjeng Ryan VillapotoAnh Loan Nguyen Thi Chad Reed Bjorn van Wenum J.P. Ramackers Herbert Sydney R.D. Nicholson Isolatednodeshavelargestcentrality
  150. 150. Hollywood: Lin George Clooney Samuel Jackson Martin Sheen Dennis Hopper Antonio Banderas Madonna Michael Douglas Tom Cruise
  151. 151. Hollywood: HITS William Shatner Robin Williams Bruce WillisTom Hanks Michael Douglas Cameron Diaz Harrison Ford Pierce Brosnan
  152. 152. Hollywood: Harmonic George Clooney Samuel Jackson Sharon Stone Tom Hanks Martin Sheen Dennis Hopper Antonio Banderas Madonna
  153. 153. What about the web? .uk Top Ten
  154. 154. What about the web? .uk Top Ten
  155. 155. What about the web? .uk Top Ten PageRank Katz Lin Harmonic http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.kelkoo.co.uk/ http://www.bbc.co.uk/accessibility/ http://news.bbc.co.uk/ http://www.names.co.uk/ http://www.kelkoo.co.uk/b/a/ kc_top_searches_charts.html http://news.bbc.co.uk/ http://www.dti.gov.uk/ http://www.names.co.uk/hosting.html http://www.kelkoo.co.uk/b/a/ co_2765_128501-company-information- pages.html http://www.dti.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.names.co.uk/email.html http://www.kelkoo.co.uk/b/a/sm_site- map.html?displayType=alpha http://www.google.co.uk/ http://www.google.co.uk/ http://www.names.co.uk/ controlpanel.html http://www.kelkoo.co.uk/b/a/ co_5199_128501-how-to-use- kelkoo.html http://www.guardian.co.uk/ http://www.bbc.co.uk/accessibility/ http://www.scdc.org.uk/ http://www.kelkoo.co.uk/b/a/ co_2120_128501-shopping-guides- price-comparison-on-kelkoo-uk.html http://www.homeoffice.gov.uk/ http://www.homeoffice.gov.uk/ http://www.freelyricsearch.co.uk/ index.html http://www.ebay.co.uk/ http://www.statistics.gov.uk/ http://www.statistics.gov.uk/ http://www.247partypeople.co.uk/ login.asp http://www.top50scrappers.co.uk/ http://www.bbc.co.uk/privacy/ http://www.bbc.co.uk/privacy/ http://www.becs.co.uk/catalog/ cookie_usage.php http://cgi1.ebay.co.uk/aw-cgi/ eBayISAPI.dll?TimeShow http://www.bbc.co.uk/info/ http://www.bbc.co.uk/info/
  156. 156. What about the web? .uk Top Ten PageRank Katz Lin Harmonic http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.kelkoo.co.uk/ http://www.bbc.co.uk/accessibility/ http://news.bbc.co.uk/ http://www.names.co.uk/ http://www.kelkoo.co.uk/b/a/ kc_top_searches_charts.html http://news.bbc.co.uk/ http://www.dti.gov.uk/ http://www.names.co.uk/hosting.html http://www.kelkoo.co.uk/b/a/ co_2765_128501-company-information- pages.html http://www.dti.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.names.co.uk/email.html http://www.kelkoo.co.uk/b/a/sm_site- map.html?displayType=alpha http://www.google.co.uk/ http://www.google.co.uk/ http://www.names.co.uk/ controlpanel.html http://www.kelkoo.co.uk/b/a/ co_5199_128501-how-to-use- kelkoo.html http://www.guardian.co.uk/ http://www.bbc.co.uk/accessibility/ http://www.scdc.org.uk/ http://www.kelkoo.co.uk/b/a/ co_2120_128501-shopping-guides- price-comparison-on-kelkoo-uk.html http://www.homeoffice.gov.uk/ http://www.homeoffice.gov.uk/ http://www.freelyricsearch.co.uk/ index.html http://www.ebay.co.uk/ http://www.statistics.gov.uk/ http://www.statistics.gov.uk/ http://www.247partypeople.co.uk/ login.asp http://www.top50scrappers.co.uk/ http://www.bbc.co.uk/privacy/ http://www.bbc.co.uk/privacy/ http://www.becs.co.uk/catalog/ cookie_usage.php http://cgi1.ebay.co.uk/aw-cgi/ eBayISAPI.dll?TimeShow http://www.bbc.co.uk/info/ http://www.bbc.co.uk/info/
  157. 157. What about the web? .uk Top Ten PageRank Katz Lin Harmonic http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.kelkoo.co.uk/ http://www.bbc.co.uk/accessibility/ http://news.bbc.co.uk/ http://www.names.co.uk/ http://www.kelkoo.co.uk/b/a/ kc_top_searches_charts.html http://news.bbc.co.uk/ http://www.dti.gov.uk/ http://www.names.co.uk/hosting.html http://www.kelkoo.co.uk/b/a/ co_2765_128501-company-information- pages.html http://www.dti.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.names.co.uk/email.html http://www.kelkoo.co.uk/b/a/sm_site- map.html?displayType=alpha http://www.google.co.uk/ http://www.google.co.uk/ http://www.names.co.uk/ controlpanel.html http://www.kelkoo.co.uk/b/a/ co_5199_128501-how-to-use- kelkoo.html http://www.guardian.co.uk/ http://www.bbc.co.uk/accessibility/ http://www.scdc.org.uk/ http://www.kelkoo.co.uk/b/a/ co_2120_128501-shopping-guides- price-comparison-on-kelkoo-uk.html http://www.homeoffice.gov.uk/ http://www.homeoffice.gov.uk/ http://www.freelyricsearch.co.uk/ index.html http://www.ebay.co.uk/ http://www.statistics.gov.uk/ http://www.statistics.gov.uk/ http://www.247partypeople.co.uk/ login.asp http://www.top50scrappers.co.uk/ http://www.bbc.co.uk/privacy/ http://www.bbc.co.uk/privacy/ http://www.becs.co.uk/catalog/ cookie_usage.php http://cgi1.ebay.co.uk/aw-cgi/ eBayISAPI.dll?TimeShow http://www.bbc.co.uk/info/ http://www.bbc.co.uk/info/
  158. 158. What about the web? .uk Top Ten PageRank Katz Lin Harmonic http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.kelkoo.co.uk/ http://www.bbc.co.uk/accessibility/ http://news.bbc.co.uk/ http://www.names.co.uk/ http://www.kelkoo.co.uk/b/a/ kc_top_searches_charts.html http://news.bbc.co.uk/ http://www.dti.gov.uk/ http://www.names.co.uk/hosting.html http://www.kelkoo.co.uk/b/a/ co_2765_128501-company-information- pages.html http://www.dti.gov.uk/ http://www.direct.gov.uk/en/index.htm http://www.names.co.uk/email.html http://www.kelkoo.co.uk/b/a/sm_site- map.html?displayType=alpha http://www.google.co.uk/ http://www.google.co.uk/ http://www.names.co.uk/ controlpanel.html http://www.kelkoo.co.uk/b/a/ co_5199_128501-how-to-use- kelkoo.html http://www.guardian.co.uk/ http://www.bbc.co.uk/accessibility/ http://www.scdc.org.uk/ http://www.kelkoo.co.uk/b/a/ co_2120_128501-shopping-guides- price-comparison-on-kelkoo-uk.html http://www.homeoffice.gov.uk/ http://www.homeoffice.gov.uk/ http://www.freelyricsearch.co.uk/ index.html http://www.ebay.co.uk/ http://www.statistics.gov.uk/ http://www.statistics.gov.uk/ http://www.247partypeople.co.uk/ login.asp http://www.top50scrappers.co.uk/ http://www.bbc.co.uk/privacy/ http://www.bbc.co.uk/privacy/ http://www.becs.co.uk/catalog/ cookie_usage.php http://cgi1.ebay.co.uk/aw-cgi/ eBayISAPI.dll?TimeShow http://www.bbc.co.uk/info/ http://www.bbc.co.uk/info/
  159. 159. How do they compare?
  160. 160. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ
  161. 161. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ most geometric indices and HITS are rather correlated to one another
  162. 162. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ
  163. 163. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ Katz, degree and SALSA are also highly correlated
  164. 164. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ
  165. 165. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ Betweenness does not correlate to anything
  166. 166. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ
  167. 167. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS between PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9709 0.9287 0.8627 0.9005 0.4357 0.5526 0.5512 0.5170 0.5034 0.3699 0.4225 0.5074 Katz 1/4 0.9709 1.0000 0.9609 0.8957 0.8719 0.4638 0.5816 0.5801 0.5476 0.5026 0.3448 0.3964 0.4801 Katz 1/2 0.9287 0.9609 1.0000 0.9369 0.8291 0.4965 0.6157 0.6139 0.5849 0.4952 0.3108 0.3605 0.4416 Katz 3/4 0.8627 0.8957 0.9369 1.0000 0.7630 0.5488 0.6697 0.6676 0.6478 0.4811 0.2633 0.3098 0.3865 SALSA 0.9005 0.8719 0.8291 0.7630 1.0000 0.5371 0.4519 0.4504 0.4185 0.4692 0.4496 0.5042 0.5924 closeness 0.4357 0.4638 0.4965 0.5488 0.5371 1.0000 0.8503 0.8508 0.7366 0.3293 0.1529 0.1813 0.2319 harmonic 0.5526 0.5816 0.6157 0.6697 0.4519 0.8503 1.0000 0.9925 0.8694 0.3929 0.0752 0.1041 0.1549 Lin 0.5512 0.5801 0.6139 0.6676 0.4504 0.8508 0.9925 1.0000 0.8680 0.3916 0.0753 0.1041 0.1546 HITS 0.5170 0.5476 0.5849 0.6478 0.4185 0.7366 0.8694 0.8680 1.0000 0.3645 0.0518 0.0780 0.1249 between 0.5034 0.5026 0.4952 0.4811 0.4692 0.3293 0.3929 0.3916 0.3696 1.0000 0.4852 0.4909 0.4923 PR 1/4 0.3699 0.3448 0.3108 0.2633 0.4496 0.1529 0.0752 0.0753 0.0518 0.4852 1.0000 0.9317 0.8276 PR 1/2 0.4225 0.3964 0.3605 0.3098 0.5042 0.1813 0.1041 0.1041 0.0780 0.4909 0.9317 1.0000 0.8952 PR 3/4 0.5074 0.4801 0.4416 0.3865 0.5924 0.2319 0.1549 0.1546 0.1249 0.4923 0.8276 0.8952 1.0000 Table 1: Hollywood 1 Hollywood Kendall’s τ PageRank stands alone
  168. 168. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ
  169. 169. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ
  170. 170. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ Betweenness could not be computed because of graph size (106M nodes)
  171. 171. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ
  172. 172. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ The same correlations as with Hollywood, but even more emphasized
  173. 173. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ
  174. 174. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ Exception: HITS used to be correlated with the geometric indices, while now it is alone
  175. 175. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ
  176. 176. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9053 0.9024 0.9000 0.9114 0.1950 0.2060 0.2060 0.2853 0.6449 0.6161 0.5784 Katz 1/4 0.9053 1.0000 0.9957 0.9922 0.8141 0.2059 0.2268 0.2265 0.2773 0.5917 0.5820 0.5595 Katz 1/2 0.9024 0.9957 1.0000 0.9966 0.8112 0.2078 0.2289 0.2286 0.2776 0.5914 0.5827 0.5611 Katz 3/4 0.9000 0.9922 0.9966 1.0000 0.8089 0.2094 0.2307 0.2303 0.2778 0.5911 0.5832 0.5622 SALSA 0.9114 0.8141 0.8112 0.8089 1.0000 0.1782 0.1617 0.1619 0.1917 0.6445 0.6146 0.5747 closeness 0.1950 0.2059 0.2078 0.2094 0.1782 1.0000 0.8592 0.8566 0.3817 0.1518 0.1746 0.2004 harmonic 0.2060 0.2268 0.2289 0.2307 0.1617 0.8592 1.0000 0.9694 0.4253 0.1503 0.1770 0.2072 Lin 0.2060 0.2265 0.2286 0.2303 0.1619 0.8566 0.9694 1.0000 0.4272 0.1503 0.1768 0.2069 HITS 0.2853 0.2773 0.2776 0.2778 0.1917 0.3817 0.4253 0.4272 1.0000 0.1529 0.1484 0.1415 PR 1/4 0.6449 0.5917 0.5914 0.5911 0.6445 0.1518 0.1503 0.1503 0.1529 1.0000 0.9182 0.8289 PR 1/2 0.6161 0.5820 0.5827 0.5832 0.6146 0.1746 0.1770 0.1768 0.1484 0.9182 1.0000 0.9088 PR 3/4 0.5784 0.5595 0.5611 0.5622 0.5747 0.2004 0.2072 0.2069 0.1415 0.8289 0.9088 1.0000 Table 1: .uk 1 .uk (May 2007 snapshot) Kendall’s τ A larger correlation between PageRank and Katz & degree
  177. 177. Orkut (2007 snapshot) Kendall’s τ
  178. 178. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9522 0.8982 0.8242 1.0000 0.5521 0.5391 0.5513 0.3508 0.7265 0.7596 0.8016 Katz 1/4 0.9522 1.0000 0.9489 0.8750 0.9522 0.5972 0.5875 0.5967 0.3984 0.6868 0.7179 0.7577 Katz 1/2 0.8982 0.9489 1.0000 0.9275 0.8982 0.6382 0.6338 0.6380 0.4491 0.6400 0.6690 0.7067 Katz 3/4 0.8242 0.8750 0.9275 1.0000 0.8242 0.6839 0.6910 0.6842 0.5213 0.5742 0.6005 0.6355 SALSA 1.0000 0.9522 0.8982 0.8242 1.0000 0.5521 0.5391 0.5513 0.3505 0.7265 0.7596 0.8016 closeness 0.5521 0.5972 0.6382 0.6839 0.5521 1.0000 0.9458 0.9830 0.6539 0.3862 0.4040 0.4268 harmonic 0.5391 0.5875 0.6338 0.6910 0.5391 0.9458 1.0000 0.9471 0.7090 0.3612 0.3777 0.3992 Lin 0.5513 0.5967 0.6380 0.6842 0.5513 0.9830 0.9471 1.0000 0.6546 0.3852 0.4030 0.4257 HITS 0.3508 0.3984 0.4491 0.5213 0.3505 0.6539 0.7090 0.6546 1.0000 0.1689 0.1778 0.1917 PR 1/4 0.7265 0.6868 0.6400 0.5742 0.7265 0.3862 0.3612 0.3852 0.1689 1.0000 0.9520 0.8889 PR 1/2 0.7596 0.7179 0.6690 0.6005 0.7596 0.4040 0.3777 0.4030 0.1778 0.9520 1.0000 0.9363 PR 3/4 0.8016 0.7577 0.7067 0.6355 0.8016 0.4268 0.3992 0.4257 0.1917 0.8889 0.9363 1.0000 Table 1: Orkut 1 Orkut (2007 snapshot) Kendall’s τ
  179. 179. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9522 0.8982 0.8242 1.0000 0.5521 0.5391 0.5513 0.3508 0.7265 0.7596 0.8016 Katz 1/4 0.9522 1.0000 0.9489 0.8750 0.9522 0.5972 0.5875 0.5967 0.3984 0.6868 0.7179 0.7577 Katz 1/2 0.8982 0.9489 1.0000 0.9275 0.8982 0.6382 0.6338 0.6380 0.4491 0.6400 0.6690 0.7067 Katz 3/4 0.8242 0.8750 0.9275 1.0000 0.8242 0.6839 0.6910 0.6842 0.5213 0.5742 0.6005 0.6355 SALSA 1.0000 0.9522 0.8982 0.8242 1.0000 0.5521 0.5391 0.5513 0.3505 0.7265 0.7596 0.8016 closeness 0.5521 0.5972 0.6382 0.6839 0.5521 1.0000 0.9458 0.9830 0.6539 0.3862 0.4040 0.4268 harmonic 0.5391 0.5875 0.6338 0.6910 0.5391 0.9458 1.0000 0.9471 0.7090 0.3612 0.3777 0.3992 Lin 0.5513 0.5967 0.6380 0.6842 0.5513 0.9830 0.9471 1.0000 0.6546 0.3852 0.4030 0.4257 HITS 0.3508 0.3984 0.4491 0.5213 0.3505 0.6539 0.7090 0.6546 1.0000 0.1689 0.1778 0.1917 PR 1/4 0.7265 0.6868 0.6400 0.5742 0.7265 0.3862 0.3612 0.3852 0.1689 1.0000 0.9520 0.8889 PR 1/2 0.7596 0.7179 0.6690 0.6005 0.7596 0.4040 0.3777 0.4030 0.1778 0.9520 1.0000 0.9363 PR 3/4 0.8016 0.7577 0.7067 0.6355 0.8016 0.4268 0.3992 0.4257 0.1917 0.8889 0.9363 1.0000 Table 1: Orkut 1 Orkut (2007 snapshot) Kendall’s τ The same correlation as in Hollywood, even if this time SALSA is also pretty correlated with PageRank as well
  180. 180. degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9522 0.8982 0.8242 1.0000 0.5521 0.5391 0.5513 0.3508 0.7265 0.7596 0.8016 Katz 1/4 0.9522 1.0000 0.9489 0.8750 0.9522 0.5972 0.5875 0.5967 0.3984 0.6868 0.7179 0.7577 Katz 1/2 0.8982 0.9489 1.0000 0.9275 0.8982 0.6382 0.6338 0.6380 0.4491 0.6400 0.6690 0.7067 Katz 3/4 0.8242 0.8750 0.9275 1.0000 0.8242 0.6839 0.6910 0.6842 0.5213 0.5742 0.6005 0.6355 SALSA 1.0000 0.9522 0.8982 0.8242 1.0000 0.5521 0.5391 0.5513 0.3505 0.7265 0.7596 0.8016 closeness 0.5521 0.5972 0.6382 0.6839 0.5521 1.0000 0.9458 0.9830 0.6539 0.3862 0.4040 0.4268 harmonic 0.5391 0.5875 0.6338 0.6910 0.5391 0.9458 1.0000 0.9471 0.7090 0.3612 0.3777 0.3992 Lin 0.5513 0.5967 0.6380 0.6842 0.5513 0.9830 0.9471 1.0000 0.6546 0.3852 0.4030 0.4257 HITS 0.3508 0.3984 0.4491 0.5213 0.3505 0.6539 0.7090 0.6546 1.0000 0.1689 0.1778 0.1917 PR 1/4 0.7265 0.6868 0.6400 0.5742 0.7265 0.3862 0.3612 0.3852 0.1689 1.0000 0.9520 0.8889 PR 1/2 0.7596 0.7179 0.6690 0.6005 0.7596 0.4040 0.3777 0.4030 0.1778 0.9520 1.0000 0.9363 PR 3/4 0.8016 0.7577 0.7067 0.6355 0.8016 0.4268 0.3992 0.4257 0.1917 0.8889 0.9363 1.0000 Table 1: Orkut 1 Orkut (2007 snapshot) Kendall’s τ
  181. 181. IR lens (using TREC .gov2)
  182. 182. IR lens (using TREC .gov2) use centrality (in isolation or combined with textual features) to rerank query results and see how good (bad) they do
  183. 183. TREC .gov2
  184. 184. TREC .gov2 150 queries (query title words, in AND; with stemming, no stopword elimination)
  185. 185. TREC .gov2 150 queries (query title words, in AND; with stemming, no stopword elimination) Generated the result graph using the method described by Najork et al. 2009 (a variant of Kleinberg’s HITS graph, taking a in-links and b out-links)
  186. 186. TREC .gov2 150 queries (query title words, in AND; with stemming, no stopword elimination) Generated the result graph using the method described by Najork et al. 2009 (a variant of Kleinberg’s HITS graph, taking a in-links and b out-links) Considered many combinations: here I present only the cases a=b=0 (i.e., subgraph induced by the result set)
  187. 187. TREC .gov2 150 queries (query title words, in AND; with stemming, no stopword elimination) Generated the result graph using the method described by Najork et al. 2009 (a variant of Kleinberg’s HITS graph, taking a in-links and b out-links) Considered many combinations: here I present only the cases a=b=0 (i.e., subgraph induced by the result set) With or without intra-host links
  188. 188. P@10 and NDCG@10
  189. 189. P@10 and NDCG@10
  190. 190. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Betweenness 0.0584 0.0595 0.0577 0.0588 Closeness 0.1101 0.1061 0.1121 0.1168 PageRank (best) 0.1107 0.1078 0.1295 0.1347 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 Katz (best) 0.1242 0.1228 0.1262 0.1297 Lin 0.1295 0.1308 0.1248 0.1286 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293
  191. 191. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Betweenness 0.0584 0.0595 0.0577 0.0588 Closeness 0.1101 0.1061 0.1121 0.1168 PageRank (best) 0.1107 0.1078 0.1295 0.1347 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 Katz (best) 0.1242 0.1228 0.1262 0.1297 Lin 0.1295 0.1308 0.1248 0.1286 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293
  192. 192. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Betweenness 0.0584 0.0595 0.0577 0.0588 Closeness 0.1101 0.1061 0.1121 0.1168 PageRank (best) 0.1107 0.1078 0.1295 0.1347 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 Katz (best) 0.1242 0.1228 0.1262 0.1297 Lin 0.1295 0.1308 0.1248 0.1286 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293
  193. 193. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Betweenness 0.0584 0.0595 0.0577 0.0588 Closeness 0.1101 0.1061 0.1121 0.1168 PageRank (best) 0.1107 0.1078 0.1295 0.1347 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 Katz (best) 0.1242 0.1228 0.1262 0.1297 Lin 0.1295 0.1308 0.1248 0.1286 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293
  194. 194. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Betweenness 0.0584 0.0595 0.0577 0.0588 Closeness 0.1101 0.1061 0.1121 0.1168 PageRank (best) 0.1107 0.1078 0.1295 0.1347 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 Katz (best) 0.1242 0.1228 0.1262 0.1297 Lin 0.1295 0.1308 0.1248 0.1286 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293
  195. 195. Intra-host links?
  196. 196. Intra-host links? Keep them or throw them away?
  197. 197. Intra-host links? Keep them or throw them away? Most indices get better if you throw them away...
  198. 198. Intra-host links? Keep them or throw them away? Most indices get better if you throw them away... Throwing such links away injects a lot of information, but apparently harmonic doesn’t need it!
  199. 199. Intra-host links? Keep them or throw them away? Most indices get better if you throw them away... Throwing such links away injects a lot of information, but apparently harmonic doesn’t need it! ...but harmonic is better (and best of all) with the whole thing!
  200. 200. .uk (May 2007 snapshot) Kendall’s τ with no intra-host links degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9995 0.9995 0.9995 0.9965 0.9883 0.9984 0.9984 0.8767 0.9956 0.9956 0.9955 Katz 1/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 1/2 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 3/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 SALSA 0.9965 0.9960 0.9960 0.9960 1.0000 0.9904 0.9950 0.9950 0.8718 0.9959 0.9959 0.9958 closeness 0.9883 0.9870 0.9870 0.9870 0.9904 1.0000 0.9859 0.9876 0.8714 0.9894 0.9893 0.9892 harmonic 0.9984 0.9986 0.9986 0.9986 0.9950 0.9859 1.0000 0.9984 0.8759 0.9944 0.9945 0.9946 Lin 0.9984 0.9978 0.9978 0.9978 0.9950 0.9876 0.9984 1.0000 0.8759 0.9945 0.9946 0.9946 HITS 0.8767 0.8763 0.8763 0.8763 0.8718 0.8714 0.8759 0.8759 1.0000 0.8727 0.8727 0.8727 PR 1/4 0.9956 0.9952 0.9952 0.9952 0.9959 0.9894 0.9944 0.9945 0.8727 1.0000 0.9998 0.9997 PR 1/2 0.9956 0.9953 0.9953 0.9953 0.9959 0.9893 0.9945 0.9946 0.8727 0.9998 1.0000 0.9999 PR 3/4 0.9955 0.9953 0.9953 0.9953 0.9958 0.9892 0.9946 0.9946 0.8727 0.9997 0.9999 1.0000 Table 1: .uk-nn 1
  201. 201. .uk (May 2007 snapshot) Kendall’s τ with no intra-host links Everything becomes correlated. Why??? degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9995 0.9995 0.9995 0.9965 0.9883 0.9984 0.9984 0.8767 0.9956 0.9956 0.9955 Katz 1/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 1/2 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 3/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 SALSA 0.9965 0.9960 0.9960 0.9960 1.0000 0.9904 0.9950 0.9950 0.8718 0.9959 0.9959 0.9958 closeness 0.9883 0.9870 0.9870 0.9870 0.9904 1.0000 0.9859 0.9876 0.8714 0.9894 0.9893 0.9892 harmonic 0.9984 0.9986 0.9986 0.9986 0.9950 0.9859 1.0000 0.9984 0.8759 0.9944 0.9945 0.9946 Lin 0.9984 0.9978 0.9978 0.9978 0.9950 0.9876 0.9984 1.0000 0.8759 0.9945 0.9946 0.9946 HITS 0.8767 0.8763 0.8763 0.8763 0.8718 0.8714 0.8759 0.8759 1.0000 0.8727 0.8727 0.8727 PR 1/4 0.9956 0.9952 0.9952 0.9952 0.9959 0.9894 0.9944 0.9945 0.8727 1.0000 0.9998 0.9997 PR 1/2 0.9956 0.9953 0.9953 0.9953 0.9959 0.9893 0.9945 0.9946 0.8727 0.9998 1.0000 0.9999 PR 3/4 0.9955 0.9953 0.9953 0.9953 0.9958 0.9892 0.9946 0.9946 0.8727 0.9997 0.9999 1.0000 Table 1: .uk-nn 1
  202. 202. .uk (May 2007 snapshot) Kendall’s τ with no intra-host links degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9995 0.9995 0.9995 0.9965 0.9883 0.9984 0.9984 0.8767 0.9956 0.9956 0.9955 Katz 1/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 1/2 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 3/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 SALSA 0.9965 0.9960 0.9960 0.9960 1.0000 0.9904 0.9950 0.9950 0.8718 0.9959 0.9959 0.9958 closeness 0.9883 0.9870 0.9870 0.9870 0.9904 1.0000 0.9859 0.9876 0.8714 0.9894 0.9893 0.9892 harmonic 0.9984 0.9986 0.9986 0.9986 0.9950 0.9859 1.0000 0.9984 0.8759 0.9944 0.9945 0.9946 Lin 0.9984 0.9978 0.9978 0.9978 0.9950 0.9876 0.9984 1.0000 0.8759 0.9945 0.9946 0.9946 HITS 0.8767 0.8763 0.8763 0.8763 0.8718 0.8714 0.8759 0.8759 1.0000 0.8727 0.8727 0.8727 PR 1/4 0.9956 0.9952 0.9952 0.9952 0.9959 0.9894 0.9944 0.9945 0.8727 1.0000 0.9998 0.9997 PR 1/2 0.9956 0.9953 0.9953 0.9953 0.9959 0.9893 0.9945 0.9946 0.8727 0.9998 1.0000 0.9999 PR 3/4 0.9955 0.9953 0.9953 0.9953 0.9958 0.9892 0.9946 0.9946 0.8727 0.9997 0.9999 1.0000 Table 1: .uk-nn 1
  203. 203. .uk (May 2007 snapshot) Kendall’s τ with no intra-host links Because most nodes have degree 0 (and hence they also have all the other scores at a tie) degree Katz 1/4 Katz 1/2 Katz 3/4 SALSA closeness harmonic Lin HITS PR 1/4 PR 1/2 PR 3/4 degree 1.0000 0.9995 0.9995 0.9995 0.9965 0.9883 0.9984 0.9984 0.8767 0.9956 0.9956 0.9955 Katz 1/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 1/2 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 Katz 3/4 0.9995 1.0000 1.0000 1.0000 0.9960 0.9870 0.9986 0.9978 0.8763 0.9952 0.9953 0.9953 SALSA 0.9965 0.9960 0.9960 0.9960 1.0000 0.9904 0.9950 0.9950 0.8718 0.9959 0.9959 0.9958 closeness 0.9883 0.9870 0.9870 0.9870 0.9904 1.0000 0.9859 0.9876 0.8714 0.9894 0.9893 0.9892 harmonic 0.9984 0.9986 0.9986 0.9986 0.9950 0.9859 1.0000 0.9984 0.8759 0.9944 0.9945 0.9946 Lin 0.9984 0.9978 0.9978 0.9978 0.9950 0.9876 0.9984 1.0000 0.8759 0.9945 0.9946 0.9946 HITS 0.8767 0.8763 0.8763 0.8763 0.8718 0.8714 0.8759 0.8759 1.0000 0.8727 0.8727 0.8727 PR 1/4 0.9956 0.9952 0.9952 0.9952 0.9959 0.9894 0.9944 0.9945 0.8727 1.0000 0.9998 0.9997 PR 1/2 0.9956 0.9953 0.9953 0.9953 0.9959 0.9893 0.9945 0.9946 0.8727 0.9998 1.0000 0.9999 PR 3/4 0.9955 0.9953 0.9953 0.9953 0.9958 0.9892 0.9946 0.9946 0.8727 0.9997 0.9999 1.0000 Table 1: .uk-nn 1
  204. 204. P@10 and NDCG@10
  205. 205. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  206. 206. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  207. 207. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842 d (x) · canReach(x)
  208. 208. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  209. 209. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  210. 210. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842 d (x) · connected(x)
  211. 211. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  212. 212. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  213. 213. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  214. 214. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842
  215. 215. P@10 and NDCG@10 All linksAll links Inter-host links onlyInter-host links only P@10 NDCG@10 P@10 NDCG@10 Weighted degree 0.1356 0.1373 0.1262 0.1295 SALSinA 0.1349 0.1357 0.1255 0.1318 Degree 0.1208 0.1091 0.1248 0.1283 SALSA 0.1221 0.1194 0.1282 0.1384 HITS 0.1349 0.1364 0.1107 0.1179 Harmonic 0.1430 0.1449 0.1262 0.1293 BM25 0.5644 0.5842 0.5644 0.5842 TODO Study features in combination
  216. 216. Computational feasibility lens
  217. 217. Computational feasibility lens which indices are computable effectively on large networks; consider also parallelizability / distributability...
  218. 218. Best algorithms so far
  219. 219. Best algorithms so far
  220. 220. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  221. 221. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  222. 222. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  223. 223. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  224. 224. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  225. 225. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  226. 226. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  227. 227. Best algorithms so far How? Progr. Degree Trivial O(1) - - +++ Betweennes s O(nm) [Brandes 2001] - no - Katz Iterative (Gauss-Seidel) Fast convergence yes + PageRank Iterative (PM, GS, Jacobi...) Fast convergence yes +++ Seeley Iterative (PM) Slow convergence no - HITS Iterative (PM) Slow convergence no - SALSA Direct O(n+) - no +++
  228. 228. Geometric
  229. 229. Geometric What about geometic measures, like closeness, Lin and harmonic?
  230. 230. Computing harmonic
  231. 231. Computing harmonic Let us take harmonic as an example
  232. 232. Computing harmonic Let us take harmonic as an example But how easy is it to compute? charm(x) = X y6=x 1 d(y, x)
  233. 233. Computing harmonic Let us take harmonic as an example But how easy is it to compute? ...or... charm(x) = X y6=x 1 d(y, x) charm(x) = 1X t=1 |Bt(x)| |Bt 1(x)| t
  234. 234. Computing harmonic Let us take harmonic as an example But how easy is it to compute? ...or... charm(x) = X y6=x 1 d(y, x) charm(x) = 1X t=1 |Bt(x)| |Bt 1(x)| t Ball of radius t about x
  235. 235. Computing harmonic Let us take harmonic as an example But how easy is it to compute? ...or... charm(x) = X y6=x 1 d(y, x) charm(x) = 1X t=1 |Bt(x)| |Bt 1(x)| t
  236. 236. Computing by diffusion
  237. 237. Computing by diffusion Clearly B0(x) = {x}
  238. 238. Computing by diffusion Clearly Moreover B0(x) = {x} Bt+1(x) = {x} [ [ x!y Bt(y)
  239. 239. Computing by diffusion Clearly Moreover So one needs just one single sequential scan of the graph to compute the balls at the next iteration B0(x) = {x} Bt+1(x) = {x} [ [ x!y Bt(y)
  240. 240. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  241. 241. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  242. 242. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  243. 243. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  244. 244. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  245. 245. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  246. 246. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  247. 247. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  248. 248. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  249. 249. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  250. 250. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  251. 251. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺
  252. 252. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺☺ ☺☺ ☺ ☺
  253. 253. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺☺ ☺ ☺ ☺ ☺
  254. 254. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺☺ ☺ ☺ ☺ ☺
  255. 255. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺☺ ☺ ☺☺ ☺
  256. 256. A round of updates ☺ ☺ ☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺☺ ☺ ☺☺ ☺ ☺☺ ☺
  257. 257. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺ ☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺ ☺☺ ☺☺☺
  258. 258. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺ ☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺ ☺☺ ☺☺☺
  259. 259. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺ ☺☺ ☺☺☺
  260. 260. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺☺☺☺ ☺ ☺ ☺☺ ☺ ☺☺ ☺☺☺
  261. 261. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺☺☺☺ ☺ ☺ ☺☺ ☺ ☺☺ ☺☺☺
  262. 262. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺☺☺☺ ☺ ☺☺☺ ☺ ☺☺ ☺☺☺
  263. 263. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺☺☺☺ ☺ ☺☺☺ ☺ ☺☺ ☺☺☺
  264. 264. Another round... ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺ ☺ ☺☺ ☺ ☺☺☺ ☺ ☺ ☺☺ ☺☺☺☺ ☺ ☺☺☺ ☺ ☺☺☺☺☺
  265. 265. Easy but expensive
  266. 266. Easy but expensive Each set uses linear space; overall quadratic
  267. 267. Easy but expensive Each set uses linear space; overall quadratic Impossible!
  268. 268. Easy but expensive Each set uses linear space; overall quadratic Impossible! But what if we use approximate sets?
  269. 269. Easy but expensive Each set uses linear space; overall quadratic Impossible! But what if we use approximate sets? Idea: use probabilistic counters, which represent sets but answer just to “size?” questions
  270. 270. Easy but expensive Each set uses linear space; overall quadratic Impossible! But what if we use approximate sets? Idea: use probabilistic counters, which represent sets but answer just to “size?” questions Very small! With 40 bits you can count up to 4 billion with a standard deviation of 6%
  271. 271. Use the Force
  272. 272. Use the Force HyperBall (http://webgraph.dsi.unimi.it/) does the job
  273. 273. Use the Force HyperBall (http://webgraph.dsi.unimi.it/) does the job Fully exploits multicore architectures; uses broadword microparallelization
  274. 274. Use the Force HyperBall (http://webgraph.dsi.unimi.it/) does the job Fully exploits multicore architectures; uses broadword microparallelization Works like a charm on networks with hundreds of millions of nodes
  275. 275. What is HyperBall
  276. 276. What is HyperBall It uses the diffusion idea to compute (at the same time):
  277. 277. What is HyperBall It uses the diffusion idea to compute (at the same time): Lin + Closeness + Harmonic + Number of reachable nodes...
  278. 278. What is HyperBall It uses the diffusion idea to compute (at the same time): Lin + Closeness + Harmonic + Number of reachable nodes... It employs Flajolet’s HyperLogLog counters to store sets
  279. 279. What is HyperBall It uses the diffusion idea to compute (at the same time): Lin + Closeness + Harmonic + Number of reachable nodes... It employs Flajolet’s HyperLogLog counters to store sets More on HyperBall in the next few slides...
  280. 280. Main trick
  281. 281. Main trick Choose an approximate set such that unions can be computed quickly
  282. 282. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n+c space)
  283. 283. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n+c space) We use HyperLogLog counters [Flajolet et al.,2007] (loglog n space)
  284. 284. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n+c space) We use HyperLogLog counters [Flajolet et al.,2007] (loglog n space) MF counters can be combined with an OR
  285. 285. Main trick Choose an approximate set such that unions can be computed quickly ANF [Palmer et al., KDD ’02] uses Martin–Flajolet (MF) counters (log n+c space) We use HyperLogLog counters [Flajolet et al.,2007] (loglog n space) MF counters can be combined with an OR We use broadword programming to combine HyperLogLog counters quickly!
  286. 286. HyperLogLog counters
  287. 287. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements
  288. 288. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function
  289. 289. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function We keep track of the maximum m (log log n bits!)
  290. 290. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function We keep track of the maximum m (log log n bits!) The number of distinct elements ∝ 2m
  291. 291. HyperLogLog counters Instead of actually counting, we observe a statistical feature of a set (think stream) of elements The feature: the number of trailing zeroes of the value of a very good hash function We keep track of the maximum m (log log n bits!) The number of distinct elements ∝ 2m Important: the counter of stream AB is simply the maximum of the counters of A and B!
  292. 292. Other ideas
  293. 293. Other ideas We keep track of modifications: we do not maximize with unmodified counters
  294. 294. Other ideas We keep track of modifications: we do not maximize with unmodified counters Systolic computation: each modified set signals back to predecessors that something is going to happen (much fewer updates!)
  295. 295. Other ideas We keep track of modifications: we do not maximize with unmodified counters Systolic computation: each modified set signals back to predecessors that something is going to happen (much fewer updates!) Multicore exploitation by decomposition: a task is updating just 1000 counters (almost linear scaling)
  296. 296. Footprint
  297. 297. Footprint Scalability: a minimum of 20 bytes per node
  298. 298. Footprint Scalability: a minimum of 20 bytes per node On a 2TiB machine, 100 billion nodes
  299. 299. Footprint Scalability: a minimum of 20 bytes per node On a 2TiB machine, 100 billion nodes Graph structure is accessed by memory-mapping in a compressed form (WebGraph)
  300. 300. Footprint Scalability: a minimum of 20 bytes per node On a 2TiB machine, 100 billion nodes Graph structure is accessed by memory-mapping in a compressed form (WebGraph) Pointers to the graph are stored using succinct lists (Elias-Fano representation)
  301. 301. Performance
  302. 302. Performance On a 177K nodes
  303. 303. Performance On a 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011]
  304. 304. Performance On a 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011] HyperBall on this laptop: 70s per iteration
  305. 305. Performance On a 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011] HyperBall on this laptop: 70s per iteration On a 32-core workstation: 23s per iteration
  306. 306. Performance On a 177K nodes Hadoop: 2875s per iteration [Kang, Papadimitriou, Sun and H. Tong, 2011] HyperBall on this laptop: 70s per iteration On a 32-core workstation: 23s per iteration On ClueWeb09 (4.8G nodes, 8G arcs) on a 40-core workstation: 141m (avg. 40s per iteration)
  307. 307. Convergence 5 15 25 35 45 55 65 75 85 95 0.0000.0020.0040.006 Harmonic centrality # runs Relativeerror
  308. 308. And when the Force is not enough?
  309. 309. And when the Force is not enough? Diffusion processes are easily distributable
  310. 310. And when the Force is not enough? Diffusion processes are easily distributable A Pregel-like implementation of HyperANF is underway (by Sebastian Schelter, TU Berlin)
  311. 311. Did you see the light?
  312. 312. Did you see the light? No? Me neither... But we have a path
  313. 313. Did you see the light? No? Me neither... But we have a path Some things have been ruled out, at least
  314. 314. Did you see the light? No? Me neither... But we have a path Some things have been ruled out, at least Some others, that were neglected, have found revenge
  315. 315. Did you see the light? No? Me neither... But we have a path Some things have been ruled out, at least Some others, that were neglected, have found revenge Some new ones seem to be promising
  316. 316. Questions?

×