PageRank on an evolving graph - Yanzhao Yang : NOTES

2013/2/12
PAGERANK ON AN EVOLVING GRAPH
Bahman Bahmani (Stanford)
Ravi Kumar (Google)
Mohammad Mahdian (Google)
Eli Upfal (Brown)
Presented by
Yanzhao Yang

Evolving Graph (Web Graph)
 The directed links between web pages
 Used for computing the PageRank of WWW pages [4]

PageRank
 Classic link analysis algorithm based on the web graph
 A page that is linked to by many pages receives a high rank itself; otherwise, it receives a low rank.
 The rank value indicates the importance of a particular page. [5]
 Very effective measure of reputation for both web graphs and social networks.

Problem
 The traditional algorithmic paradigm is inadequate for evolving data

Traditional Paradigm
 Stationary dataset input: inadequate for current social networks
 It is necessary for the algorithm to probe the input continually and produce solutions at any point in time that are close to the correct solution for the then-current input.
[Diagram: Data -> Algorithm -> Output]

Motivating examples
 Web pages
 Millions of hyperlinks modified each day
 Which portions of the web should a crawler focus on most?
 Social networks
 Millions of social links modified each day
 Which users should a third-party site track in order to recompute, e.g., global reputation?

Motivating examples
 In fact, PageRank may always be imprecise.
e.g. Learn about changes -> crawling the web -> limit of crawling capacity -> stale image of graph -> graph structure -> PageRank

Objective Algorithm
 Design an algorithm that decides which pages to crawl and computes the PageRank using the obtained information
 Maintains a good approximation of the true PageRank values of the underlying evolving graph
 Which pages to crawl
 The error is bounded at any point in time

PageRank algorithm categories
 Linear algebraic methods [3]
- Power iteration speed-up.
E.g., web graph.
 Monte Carlo methods [6]
- Efficient and highly scalable.
E.g., data streaming and MapReduce.
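
A minimal sketch of the Monte Carlo idea, assuming the "complete path" style of estimator: start a fixed number of random walks from every node, end each walk with probability eps at every step, and read the PageRank estimate off the visit counts. The function name, the parameter defaults, and the uniform jump from dangling nodes are illustrative assumptions, not details taken from the slides.

```python
import random
from collections import Counter


def monte_carlo_pagerank(out_edges, eps=0.15, walks_per_node=200, seed=0):
    """Estimate PageRank from visit counts of terminating random walks.

    out_edges maps every node to the list of its out-neighbours.
    """
    rng = random.Random(seed)
    nodes = list(out_edges)
    visits = Counter()
    for start in nodes:
        for _ in range(walks_per_node):
            u = start
            while True:
                visits[u] += 1
                # With probability eps the walk stops here (a restart).
                if rng.random() < eps:
                    break
                nbrs = out_edges[u]
                # Assumption: a dangling node jumps to a uniform random node.
                u = rng.choice(nbrs) if nbrs else rng.choice(nodes)
    total = sum(visits.values())
    return {u: visits[u] / total for u in nodes}


if __name__ == "__main__":
    graph = {0: [1], 1: [0], 2: [0], 3: [0]}
    print(monte_carlo_pagerank(graph))
```

The estimate is noisy for a small number of walks; the appeal of the approach is that each walk is independent, which is why it fits streaming and MapReduce settings.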

Evolving graph model
 A sequence of directed graphs over time
 G_t = (V, E_t) = graph at time t
 Nodes do not change (for simplicity)
 Assume |E_{t+1} - E_t| is small
 Choose t fine enough
 No change model assumed
 At time t, the algorithm can probe a node u to get N(u), i.e., all edges in E_t of the form (u, v)
 No constraints on running time or storage space
 The probing strategy focuses on which node to probe each time
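
The model above can be sketched as a hidden edge set that drifts over time, plus a probe operation that reveals the out-neighbours of one node. This is only an illustration: the class name, the `step` method, and in particular the random rewiring used to change the graph are assumptions (the slides deliberately assume no change model), and the sketch requires at least two nodes.

```python
import random


class EvolvingGraph:
    """Hidden directed graph G_t = (V, E_t) over a fixed node set 0..n-1."""

    def __init__(self, n, edges, seed=0):
        self.n = n                      # assumed n >= 2
        self.edges = set(edges)         # E_t as (u, v) pairs
        self.rng = random.Random(seed)

    def step(self, changes=1):
        """Advance time by rewiring a few edges (hypothetical change process)."""
        for _ in range(changes):
            if self.edges:
                self.edges.discard(self.rng.choice(sorted(self.edges)))
            u = self.rng.randrange(self.n)
            v = self.rng.randrange(self.n - 1)
            if v >= u:
                v += 1                  # skip u so no self-loop is created
            self.edges.add((u, v))

    def probe(self, u):
        """Return N(u): the heads of all edges (u, v) currently in E_t."""
        return sorted(v for (x, v) in self.edges if x == u)


if __name__ == "__main__":
    g = EvolvingGraph(3, [(0, 1), (0, 2), (1, 2)])
    image = {u: g.probe(u) for u in range(3)}   # the algorithm's image
    g.step()
    image[0] = g.probe(0)                       # one probe per time step
    print(image)
```

Only the probed node is refreshed in the image, so the image can be stale everywhere else; choosing which node to refresh is exactly the probing strategy.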

PageRank on evolving graphs
 Teleport probability ε
 Probability of jumping to a random node
 Stationary distribution of a random walk:
- with probability ε: move to a node chosen uniformly at random
- with probability 1-ε: choose one of the outgoing edges of the current node uniformly at random and move to the head of that edge
 $\pi_u^t$ is the PageRank of node u in $G_t$
 $\mathrm{in}_u^t$ is the in-degree of node u
 $\mathrm{out}_u^t$ is the out-degree of node u
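
The stationary distribution of this walk satisfies pi_u = eps/n + (1 - eps) * sum over edges (v, u) of pi_v / out_v, and it can be approximated by power iteration. The following is a minimal sketch under stated assumptions: the function name, the default eps = 0.15, the fixed iteration count, and the uniform redistribution of rank from dangling nodes are illustrative choices, not taken from the slides.

```python
def pagerank(out_edges, eps=0.15, iters=100):
    """Approximate PageRank with teleport probability eps by power iteration.

    out_edges maps every node to the list of its out-neighbours.
    """
    nodes = list(out_edges)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Teleport step: every node receives eps / n.
        nxt = {u: eps / n for u in nodes}
        for u in nodes:
            nbrs = out_edges[u]
            if nbrs:
                # Follow an outgoing edge chosen uniformly at random.
                share = (1.0 - eps) * rank[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # Assumption: a dangling node spreads its rank uniformly.
                for v in nodes:
                    nxt[v] += (1.0 - eps) * rank[u] / n
        rank = nxt
    return rank


if __name__ == "__main__":
    print(pagerank({0: [1], 1: [0], 2: [0], 3: [0]}))
```

In the evolving setting the same routine would be run on the algorithm's image of the graph rather than on the true graph.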

Baseline probing methods
 Random probing (randomized)
Probe a node v chosen uniformly at random at each time step
 Round-robin probing (deterministic)
Cycle through all nodes and probe each in a round-robin manner
We can vary the ratio of change rate to probing rate
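
Both baselines reduce to a rule for picking the next node to probe. A small sketch, with hypothetical generator names, that yields one node per time step:

```python
import itertools
import random


def random_probing(nodes, seed=0):
    """Yield a node chosen uniformly at random at each time step."""
    rng = random.Random(seed)
    while True:
        yield rng.choice(nodes)


def round_robin_probing(nodes):
    """Yield the nodes in a fixed cyclic order, one per time step."""
    yield from itertools.cycle(nodes)


if __name__ == "__main__":
    nodes = [0, 1, 2]
    print(list(itertools.islice(round_robin_probing(nodes), 5)))
    print(list(itertools.islice(random_probing(nodes), 5)))
```

Neither rule looks at the graph, which is what makes them useful reference points for the PageRank-aware strategies.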

Proportional Probing
 At each step t, pick a node v to probe with probability proportional to the PageRank of v in the algorithm's current image of the graph.
 The output is the PageRank vector of the current image.
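
The selection step can be sketched as weighted sampling. The function name and the `image_rank` argument (a dict holding the PageRank of every node in the current image) are assumptions made for this illustration; computing that vector is left to whatever PageRank routine is in use.

```python
import random


def proportional_probe(image_rank, rng):
    """Pick a node with probability proportional to its PageRank in the image."""
    nodes = list(image_rank)
    weights = [image_rank[u] for u in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    image_rank = {"a": 0.6, "b": 0.3, "c": 0.1}
    print([proportional_probe(image_rank, rng) for _ in range(10)])
```

After each probe the image changes, so the weights would be recomputed before the next draw.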

Priority Probing
for all nodes u do
    Priority_u ← 0
for every time step t do
    [remainder of the pseudocode not recoverable from the slide]
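
Only the initialisation and the outer loop of the pseudocode survive on this slide. The sketch below assumes the loop body follows the Priority Probing scheme as described in the Bahmani et al. paper: probe the node with the highest priority, reset its priority to zero, and add each other node's current PageRank to its priority. Treat that body as an assumption; `current_rank` is a hypothetical callable standing in for "PageRank of the current image".

```python
def priority_probing(nodes, current_rank, steps):
    """Return the sequence of nodes probed over `steps` time steps.

    current_rank() must return a dict with the PageRank of every node in the
    algorithm's current image (assumed interface for this sketch).
    """
    priority = {u: 0.0 for u in nodes}
    probed = []
    for _ in range(steps):
        # Probe the node with the highest accumulated priority
        # (ties go to the earliest node in `nodes`).
        v = max(nodes, key=lambda u: priority[u])
        probed.append(v)
        rank = current_rank()
        priority[v] = 0.0
        # Every other node accumulates priority at the rate of its PageRank.
        for u in nodes:
            if u != v:
                priority[u] += rank[u]
    return probed


if __name__ == "__main__":
    rank = {0: 0.5, 1: 0.25, 2: 0.25}
    print(priority_probing([0, 1, 2], lambda: rank, 6))
```

With a fixed rank vector, a node with rank 0.5 is probed in half of the steps, so the scheme behaves like a deterministic counterpart of proportional probing.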

Experiment
 Dataset
 AS (Autonomous Systems, graph of routers)
 CAIDA (communication patterns of the routers)
 RAND (generated randomly)

Experiment
 Random Probing serves as a baseline for Proportional Probing
 Round-Robin serves as a baseline for Priority Probing
 Hybrid algorithm between Proportional Probing and Round-Robin Probing is parametrized by $\beta \in [0, 1]$
 Metric
- $L_\infty$ metric: $L_\infty(\pi^t, \phi^t) = \max_{u \in V} |\pi_u^t - \phi_u^t|$
- $L_1$ metric: $L_1(\pi^t, \phi^t) = \sum_{u \in V} |\pi_u^t - \phi_u^t|$
where $\pi_u^t$ is the true PageRank of node u in $G_t$ and $\phi_u^t$ is the value output by the algorithm at time t.
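
The two error metrics compare the true PageRank vector with the algorithm's output node by node. A short sketch, with hypothetical function names and both vectors represented as dicts over the same node set:

```python
def l1_error(true_rank, est_rank):
    """Sum over all nodes of the absolute difference between the two vectors."""
    return sum(abs(true_rank[u] - est_rank[u]) for u in true_rank)


def linf_error(true_rank, est_rank):
    """Largest absolute difference over all nodes."""
    return max(abs(true_rank[u] - est_rank[u]) for u in true_rank)


if __name__ == "__main__":
    true_rank = {0: 0.5, 1: 0.25, 2: 0.25}
    est_rank = {0: 0.25, 1: 0.5, 2: 0.25}
    print(l1_error(true_rank, est_rank), linf_error(true_rank, est_rank))
```

The L1 value reflects the total error spread over the graph, while the L-infinity value is driven by the single worst node.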

Results (AS & CAIDA)
 Proportional Probing is better than Random Probing
 Priority Probing is better than Round-Robin Probing
 The algorithms perform better when they probe more frequently

AS graph (L1 errors)

AS graph (L∞ errors)

CAIDA graph (L1 errors)

CAIDA graph (L∞ errors)

Effect of probing rate α

Algorithm's image vs truth (1)

Algorithm's image vs truth (2)

Hybrid Algorithm (L1 & L∞)

Hybrid Algorithm (β=0.1 or 0.9)

Analysis (1)

Analysis (2)

Conclusion
 Obtain simple, effective algorithms
 Evaluate the algorithms empirically on real and randomly generated datasets.
 Prove theoretical results in a simplified model
 Analyze the theoretical error bounds of the algorithm
 Challenge: extend our theoretical analysis to other models of graph evolution.

References
 1. S. Brin, L. Page. Computer Networks and ISDN Systems 30, 107 (1998).
 2. Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. WWW '03. http://doi.acm.org/10.1145/775152.775191
 3. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web.
 4. http://en.wikipedia.org/wiki/Webgraph
 5. http://en.wikipedia.org/wiki/PageRank#cite_note-1
 6. K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM J. Numer. Anal., 45(2):890-904, 2007.
