2013/2/12
PAGERANK ON AN EVOLVING GRAPH
Bahman Bahmani (Stanford)
Ravi Kumar (Google)
Mohammad Mahdian (Google)
Eli Upfal (Brown)
Presented by Yanzhao Yang

Evolving Graph (Web Graph)
• The directed links between web pages
• Used for computing the PageRank of WWW pages [4]

PageRank
• Classic link analysis algorithm based on the web graph
• A page that is linked to by many pages receives a high rank itself; otherwise, it receives a low rank.
• The rank value indicates the importance of a particular page. [5]
• A very effective measure of reputation for both web graphs and social networks.

Example
[Figure: PageRank example]

Problem
• The traditional algorithmic paradigm is inadequate for evolving data

Traditional Paradigm
• A stationary dataset as input is inadequate for current social networks
• It is necessary for the algorithm to probe the input continually and produce solutions at any point in time that are close to the correct solution for the then-current input.
[Diagram: Data -> Algorithm -> Output]

Motivating examples
• Web pages
  - Millions of hyperlinks modified each day
  - Which portions of the web should a crawler focus on most?
• Social networks
  - Millions of social links modified each day
  - Which users should a third-party site track in order to recompute, e.g., global reputation?

Motivating examples
• In fact, PageRank may always be imprecise.
  E.g., learn about changes -> crawl the web -> limited crawling capacity -> stale image of the graph -> graph structure -> PageRank

Objective Algorithm
• Design an algorithm that decides which pages to crawl and computes the PageRank using the obtained information
• Maintains a good approximation of the true PageRank values of the underlying evolving graph
• Which pages to crawl
• The error is bounded at any point in time

PageRank algorithm categories
• Linear algebraic methods [3]
  - Power iteration speed-up
  - E.g., web graph
• Monte Carlo methods [6]
  - Efficient and highly scalable
  - E.g., data streaming and MapReduce
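
The Monte Carlo category can be illustrated with a minimal sketch. It estimates PageRank by running short random walks from every node, ending each walk with probability eps at every step, and normalizing the visit counts. The function name, the toy graph, the parameter values, and the choice to end a walk at a node without outgoing edges are all assumptions made for illustration, not details taken from the slides.

```python
import random

def monte_carlo_pagerank(out_edges, eps=0.15, walks_per_node=200, seed=0):
    """Estimate PageRank from visit counts of terminating random walks."""
    rng = random.Random(seed)
    visits = {u: 0 for u in out_edges}
    for start in out_edges:
        for _ in range(walks_per_node):
            u = start
            while True:
                visits[u] += 1
                # End the walk with probability eps, or when u has no
                # outgoing edge (an assumption for this sketch).
                if rng.random() < eps or not out_edges[u]:
                    break
                u = rng.choice(out_edges[u])
    total = sum(visits.values())
    return {u: count / total for u, count in visits.items()}

# Hypothetical toy graph: node -> list of nodes it links to.
graph = {0: [1, 2], 1: [2], 2: [0]}
estimate = monte_carlo_pagerank(graph)
```

Each walk is independent of the others, which is one reason this family of methods is often described as easy to parallelize.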

Evolving graph model
• A sequence of directed graphs over time
• $G_t = (V, E_t)$ = graph at time t
• Nodes do not change (for simplicity)
• Assume $|E_{t+1} - E_t|$ is small
  - Choose t fine enough
• No specific model of change is assumed
• At time t, the algorithm can probe a node u to get N(u), i.e., all edges in $E_t$ of the form (u, v)
• No constraints on running time or storage space
• The probing strategy focuses on which node to probe each time
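
The probing model can be sketched as a hidden graph that the algorithm only observes through single-node probes, plus a local image that goes stale between probes. The class names `EvolvingGraph` and `Image` and the `change_edge` helper are hypothetical names introduced for this illustration.

```python
class EvolvingGraph:
    """Hidden true graph G_t; the algorithm sees it only through probe()."""

    def __init__(self, out_edges):
        self._out = {u: set(vs) for u, vs in out_edges.items()}

    def change_edge(self, u, old_v, new_v):
        """One small change between time steps: rewire edge (u, old_v)."""
        self._out[u].discard(old_v)
        self._out[u].add(new_v)

    def probe(self, u):
        """Return N(u): heads of all current edges of the form (u, v)."""
        return set(self._out[u])


class Image:
    """The algorithm's possibly stale copy of the graph."""

    def __init__(self, nodes):
        self.out = {u: set() for u in nodes}

    def refresh(self, graph, u):
        """Probe node u and overwrite its out-edges in the image."""
        self.out[u] = graph.probe(u)


truth = EvolvingGraph({0: [1], 1: [2], 2: [0]})
image = Image([0, 1, 2])
for u in (0, 1, 2):
    image.refresh(truth, u)

truth.change_edge(0, 1, 2)   # the graph evolves
stale = image.out[0]         # the image still holds the old edge
image.refresh(truth, 0)      # probing node 0 repairs the image
fresh = image.out[0]
```

The fixed node set mirrors the simplifying assumption on the slide; only edges change.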

PageRank on evolving graphs
• Teleport probability ε
  - Probability of jumping to a random node
• Stationary distribution of the random walk:
  - with probability ε: move to a node chosen uniformly at random
  - with probability 1-ε: choose one of the outgoing edges of the current node uniformly at random and move to the head of that edge
• $\pi_u^t$ is the PageRank of node u in $G_t$
• $\mathrm{in}_u^t$ is the in-degree of node u
• $\mathrm{out}_u^t$ is the out-degree of node u
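
The stationary distribution of this walk satisfies the standard PageRank fixed point $\pi_u^t = \varepsilon/n + (1-\varepsilon)\sum_{(w,u)\in E_t} \pi_w^t/\mathrm{out}_w^t$ with $n = |V|$. A minimal power-iteration sketch of that equation follows. The function name, the iteration count, and the treatment of a node with no outgoing edges (it jumps uniformly) are assumptions for illustration.

```python
def pagerank(out_edges, eps=0.15, iterations=100):
    """Approximate PageRank by repeatedly applying the random-walk step."""
    nodes = list(out_edges)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Teleport term: every node receives eps / n.
        nxt = {u: eps / n for u in nodes}
        for u in nodes:
            # Assumption: a node without out-edges spreads its rank uniformly.
            targets = out_edges[u] or nodes
            share = (1.0 - eps) * rank[u] / len(targets)
            for v in targets:
                nxt[v] += share
        rank = nxt
    return rank

# Hypothetical toy graph: node -> list of nodes it links to.
graph = {0: [1, 2], 1: [2], 2: [0]}
rank = pagerank(graph)
```

On this toy graph node 1 ends up with the lowest rank, since its only incoming link is half of node 0's out-links.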

Baseline probing methods
• Random probing (randomized)
  Probe a node v chosen uniformly at random at each time step
• Round-robin probing (deterministic)
  Cycle through all nodes and probe each in a round-robin manner
We can vary the ratio of change rate to probing rate
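
The two baselines can be sketched as generators that yield the node to probe at each time step. The function names and the four-node example are assumptions for illustration.

```python
import random
from itertools import cycle, islice

def random_probing(nodes, seed=0):
    """Yield a node chosen uniformly at random at each time step."""
    rng = random.Random(seed)
    while True:
        yield rng.choice(nodes)

def round_robin_probing(nodes):
    """Yield nodes in a fixed cyclic order, one per time step."""
    yield from cycle(nodes)

nodes = [0, 1, 2, 3]
random_schedule = list(islice(random_probing(nodes), 8))
round_robin_schedule = list(islice(round_robin_probing(nodes), 8))
```

Round-robin guarantees that every node is probed once every |V| steps, while random probing gives no such per-node guarantee.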

Proportional Probing
• At each step t, pick a node v to probe with probability proportional to the PageRank of v in the algorithm's current image of the graph.
• The output is the PageRank vector of the current image.
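
A minimal sketch of one proportional probing step, under stated assumptions: the image and the true graph are plain dictionaries, the helper `pagerank` is a simple power iteration, and a node without outgoing edges jumps uniformly. All names here are hypothetical and chosen for illustration.

```python
import random

def pagerank(out_edges, eps=0.15, iterations=100):
    """Power iteration; a node without out-edges jumps uniformly (assumption)."""
    nodes = list(out_edges)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        nxt = {u: eps / n for u in nodes}
        for u in nodes:
            targets = out_edges[u] or nodes
            share = (1.0 - eps) * rank[u] / len(targets)
            for v in targets:
                nxt[v] += share
        rank = nxt
    return rank

def proportional_probing_step(truth, image, rng, eps=0.15):
    """Probe one node, sampled by its PageRank in the current image."""
    ranks = pagerank(image, eps)
    nodes = list(ranks)
    v = rng.choices(nodes, weights=[ranks[u] for u in nodes], k=1)[0]
    image[v] = list(truth[v])        # probe v: refresh its out-edges
    return v, pagerank(image, eps)   # output: PageRank of the updated image

truth = {0: [2], 1: [2], 2: [0]}
image = {0: [1], 1: [2], 2: [0]}     # stale: edge (0, 1) has become (0, 2)
rng = random.Random(0)
probed = []
for _ in range(50):
    v, output = proportional_probing_step(truth, image, rng)
    probed.append(v)
```

Nodes with a high PageRank in the image are probed more often, so changes near important nodes tend to be noticed sooner.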

Priority Probing
for all nodes u do
    Priority_u ← 0
for every time step t do
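
The slide text preserves only the initialization and the outer loop of Priority Probing. The sketch below fills in the loop body under an assumed rule: probe the node with the highest priority, reset its priority to zero, and raise every other node's priority by its PageRank in the current image. That rule, the `probe` callback, and the helper names are assumptions for illustration and should be checked against the original paper.

```python
def pagerank(out_edges, eps=0.15, iterations=100):
    """Power iteration; a node without out-edges jumps uniformly (assumption)."""
    nodes = list(out_edges)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        nxt = {u: eps / n for u in nodes}
        for u in nodes:
            targets = out_edges[u] or nodes
            share = (1.0 - eps) * rank[u] / len(targets)
            for v in targets:
                nxt[v] += share
        rank = nxt
    return rank

def priority_probing(probe, image, steps, eps=0.15):
    """Deterministic probing driven by accumulated PageRank priorities."""
    priority = {u: 0.0 for u in image}        # for all nodes u: Priority_u <- 0
    order, ranks = [], {}
    for _ in range(steps):                    # for every time step t
        v = max(priority, key=priority.get)   # highest priority, first on ties
        image[v] = list(probe(v))             # probe v and update the image
        ranks = pagerank(image, eps)
        priority[v] = 0.0                     # assumed rule: reset probed node
        for u in image:
            if u != v:
                priority[u] += ranks[u]       # assumed rule: accumulate rank
        order.append(v)
    return order, ranks

truth = {0: [1, 2], 1: [2], 2: [0]}
image = {u: list(vs) for u, vs in truth.items()}
order, ranks = priority_probing(lambda u: truth[u], image, steps=6)
```

Because priorities only grow until a node is probed, every node is eventually selected, which is the deterministic counterpart of sampling proportionally to PageRank.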

Experiment
• Dataset
  - AS (Autonomous Systems, graph of routers)
  - CAIDA (communication patterns of the routers)
  - RAND (generated randomly)

Experiment
• Random Probing serves as a baseline for Proportional Probing
• Round-Robin serves as a baseline for Priority Probing
• The hybrid algorithm between Proportional Probing and Round-Robin Probing is parametrized by $\beta \in [0, 1]$
• Metric, comparing the true PageRank $\pi^t$ of $G_t$ with the PageRank $\phi^t$ of the algorithm's image:
  - $L_\infty$ metric: $L_\infty(\pi^t, \phi^t) = \max_{u \in V} |\pi_u^t - \phi_u^t|$
  - $L_1$ metric: $L_1(\pi^t, \phi^t) = \sum_{u \in V} |\pi_u^t - \phi_u^t|$
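
The two error metrics can be computed directly from a pair of PageRank vectors. In this sketch the vectors are dictionaries keyed by node, and the numeric values are made up for illustration.

```python
def l1_error(true_rank, image_rank):
    """Sum over all nodes of the absolute PageRank difference."""
    return sum(abs(true_rank[u] - image_rank[u]) for u in true_rank)

def linf_error(true_rank, image_rank):
    """Largest absolute PageRank difference over all nodes."""
    return max(abs(true_rank[u] - image_rank[u]) for u in true_rank)

# Hypothetical PageRank vectors for a three-node graph.
true_rank = {"a": 0.5, "b": 0.3, "c": 0.2}
image_rank = {"a": 0.4, "b": 0.3, "c": 0.3}

l1 = l1_error(true_rank, image_rank)
linf = linf_error(true_rank, image_rank)
```

The $L_1$ value reflects the total error across the graph, while the $L_\infty$ value reflects the single worst node.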

Results (AS & CAIDA)
• Proportional Probing is better than Random Probing
• Priority Probing is better than Round-Robin Probing
• The algorithms perform better when they probe more frequently

[Figure: AS graph ($L_1$ errors)]

[Figure: AS graph ($L_\infty$ errors)]

[Figure: CAIDA graph ($L_1$ errors)]

[Figure: CAIDA graph ($L_\infty$ errors)]

[Figure: Effect of probing rate α]

[Figure: Algorithm's image vs truth (1)]

[Figure: Algorithm's image vs truth (2)]

[Figure: Hybrid Algorithm ($L_1$ & $L_\infty$)]

[Figure: Hybrid Algorithm (β = 0.1 or 0.9)]

Analysis (1)

Analysis (2)
30
Conclusion
30
 Obtain simple effective algorithm
 Obtain simple effective algorithm
 Evaluate algorithms empirically on real and
randomly generated datasets.
randomly generated datasets.
 Proved theoretical results in a simplified model
 Analyze the theoretical error bounds of the
 Analyze the theoretical error bounds of the
algorithm
 Challenge: extend our theoretical analysis to other
 Challenge: extend our theoretical analysis to other
models of graph evolution.

Reference
1. S. Brin, L. Page. Computer Networks and ISDN Systems 30, 107 (1998).
2. Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. WWW '03. http://doi.acm.org/10.1145/775152.775191
3. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web.
4. http://en.wikipedia.org/wiki/Webgraph
5. http://en.wikipedia.org/wiki/PageRank#cite_note-1
6. K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM J. Numer. Anal., 45(2):890-904, 2007.

Thank you
