Highlighted notes while research with Prof. Dip Sankar Banerjee, Prof. Kishore Kothapalli:
PageRank on an evolving graph - Yanzhao Yang.
https://theory.utdallas.edu/seminar/G2S13/YY/Pagerank%20on%20evolving%20graph-Yanzhao%20Yang.pdf
Pagerank may be always imprecise, due to lack of knowledge of up-to-date/complete graph. Millions of hyperlinks/social-links modified each day. Which portions of the web should a crawler focus most (probing strategy)? Probing techniques discussed are Random probing, Round-robin probing, Proportional probing (random, proportional to node's pagerank), Priority probing (deterministic, pick node with highest cumulative pagerank sum), Hybrid probing (proportional + round-robin).
Professional Resume Template for Software Developers
PageRank on an evolving graph - Yanzhao Yang : NOTES
1. 2013/2/12
1
1
PAGERANK ON AN
PAGERANK ON AN
EVOLVING GRAPH
Bahman Bahmani(Stanford)
Ravi Kumar(Google)
Mohammad Mahdian(Google)
Eli Upfal(Brown)
Present by
Present by
Yanzhao Yang
2. 2013/2/12
2
Evolving Graph(Web Graph)
g p ( p )
2
The directed links between web pages
The directed links between web pages
Used for computing the PageRank of the WWW
pages [4]
pages [4]
3. 2013/2/12
3
Page Rank
g
3
Classic link analysis algorithm based on the web
Classic link analysis algorithm based on the web
graph
A page that is linked to by many pages receives a
A page that is linked to by many pages receives a
high rank itself. Otherwise, it receives a low rank.
The rank value indicates an importance of a
The rank value indicates an importance of a
particular page. [5]
Very effective measure of reputation for both web
Very effective measure of reputation for both web
graphs and social networks.
6. 2013/2/12
6
Traditional Paradigm
g
6
Stationary dataset input- inadequate for
Stationary dataset input inadequate for
current social networks
It is necessary for algorithm to probe the
Data
It is necessary for algorithm to probe the
input continually and produce solutions at
any point in time that are close to the
Al i h
y p
correct solution for the then-current input.
Algorithm
Output
7. 2013/2/12
7
Motivating examples
g p
7
Web pages
Millions of hyperlinks modified each day
f f
Which portions of the web should a crawler focus
most?
Social networks
Social networks
Millions of social links modified each day
Which users should a third party site track in
Which users should a third-party site track in
order to recompute, eg, global reputation?
8. 2013/2/12
8
Motivating examples
g p
8
In fact, Pagerank may be always imprecise.
In fact, Pagerank may be always imprecise.
e.g. Learn about changes->
crawling webs >
crawling webs->
limit of crawling capacity->
l i f h >
stale image of graph ->
graph structure->
Pagerank
9. 2013/2/12
9
Objective Algorithm
j g
9
Design an algorithm that decides which pages to
Design an algorithm that decides which pages to
crawl and computes the PageRank using the
obtained information
Maintains a good approximation of the true
PageRank values of the underlying evolving graph
g y g g g p
Which pages to crawl
The error is bounded at any point in time
The error is bounded at any point in time
10. 2013/2/12
10
Page Rank algorithm categories
g g g
10
Linear algebraic methods[3]
Linear algebraic methods[3]
-Power iteration speed up.
E.g, web graph.
E.g, web graph.
Monte carlo methods[6]
-efficient and highly scalable
efficient and highly scalable
E.g, data streaming anfd map reduce.
11. 2013/2/12
11
Evolving graph model
g g p
11
A sequence of directed graphs over time
Gt = (V, Et) = graph at time t
Nodes do not change (for simplicity)
A |E E | i ll
Assume |Et+1 – Et| is small
Choose t fine enough
No change model assumed
No change model assumed
At time t, algorithm can probe a node u to get N(u),
i.e, all edges in Et of the form (u, v)
No constraints on running time or storage space
Probing strategy focus on which node to probe
each time
12. 2013/2/12
12
PageRank on evolving graphs
g g g p
12
Teleport probability-ε
p p y
Probability of jumping to a random node
Stationary distribution of random walk:
-walk with ε: move to a node chosen uniformly at random
-walk with 1-ε:choose one of the outgoing edges of the current
node uniformly at random and move to the head of that
node uniformly at random and move to the head of that
edge
is PageRank of node u in G
t
u
is in-degree of node u
is out-degree of node u
t
u
in
t
u
out
13. 2013/2/12
13
Baseline probing methods
p g
13
Random probing(randomized)
p g( )
Probe a node v chosen uniformly at random at
each time step
p
Round-robin probing(deterministic)
Cycle through all nodes and probe each in a
Cy g p
round-robin manner
We can vary the ratio of change rate and probing
y g p g
rate
14. 2013/2/12
14
Propotional Probing
p g
14
At each step t, pick a node v to probe with
At each step t, pick a node v to probe with
probability proportional to the PageRank of v in the
algorithm's current image of the graph.
g g g p
The output is the PageRank vector of the current
image.
g
16. 2013/2/12
16
Experiment
p
16
Dataset
Dataset
AS(Autonom ous Systems, graph of routers)
CAIDA(communication patterns of the routers)
CAIDA(communication patterns of the routers)
RAND (generated randomly)
17. 2013/2/12
17
Experiment
p
17
Random Probing serves as a baseline for
Proportional Probing
Round-Robin serves as a baseline for Priority
Probing
Hybird algorithm between Proportional Probing and
Round-Robin Probing is parametrized by
Metric
1
,
0
,
u
u t
t
V
u
t
-
max
,
L
metric
L t
t
t
t
L
t i
L t
u
u t
t
V
u
t
-
,
L
metric
L t
1
1
18. 2013/2/12
18
Results( AS & CAIDA )
( )
18
Propotional Probing is better than Random Probing
p g g
Priority Probing is better than Round-Robin Probing
The algorithm perform better when they probe more
The algorithm perform better when they probe more
frequently
30. 2013/2/12
30
Conclusion
30
Obtain simple effective algorithm
Obtain simple effective algorithm
Evaluate algorithms empirically on real and
randomly generated datasets.
randomly generated datasets.
Proved theoretical results in a simplified model
Analyze the theoretical error bounds of the
Analyze the theoretical error bounds of the
algorithm
Challenge: extend our theoretical analysis to other
Challenge: extend our theoretical analysis to other
models of graph evolution.
31. 2013/2/12
31
Reference
31
1. S. Brin, L. Page, Computer Networks and ISDN Systems 30, 107
g p y
(1998)
2. Glen Jeh and Jennifer Widom. 2003. Scaling personalized web
search WWW '03 http://doi acm org/10 1145/775152 775191
search. WWW 03 http://doi.acm.org/10.1145/775152.775191
3. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd.
The PageRank Citation Ranking: Bringing Order to the Web.
4. http://en.wikipedia.org/wiki/Webgraph
5. http://en.wikipedia.org/wiki/PageRank#cite_note-1
6 K A h k N Lit k D N i k d N O i M t
6. K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte
Carlo methods in Pagerank computation: When one iteration is sucient.
SIAM J.Numer. Anal., 45(2):890-904, 2007.