# A dynamical system for PageRank with time-dependent teleportation

A talk based on


1. A dynamical system for PageRank with time-dependent teleportation. David F. Gleich, Computer Science, Purdue University; Ryan A. Rossi, Computer Science, Purdue University. Paper: http://arxiv.org/abs/1211.4266. Code: https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im. David Gleich · Purdue ANL Seminar.
2. (1) Perspectives on PageRank; (2) PageRank as a dynamical system and time-dependent teleportation; (3) Predicting using PageRank; (4) Applications to the power grid?
3. Given a graph, what are the most important nodes?
4. The random surfer model. At a node, (1) follow edges with probability $\alpha$; (2) do something else with probability $1-\alpha$. Google's PageRank is one possible answer. [Figure: PageRank by Google, a six-node example graph.] The model: (1) follow edges uniformly with probability $\alpha$, and (2) randomly jump with probability $1-\alpha$; we'll assume everywhere is equally likely. The places we find the surfer most often are important pages: the important pages are the places we are most likely to find the random surfer.
5. The most important page on the web.
6. PageRank details. For the six-node example graph,
   $$P = \begin{bmatrix} 1/6 & 1/2 & 0 & 0 & 0 & 0 \\ 1/6 & 0 & 0 & 1/3 & 0 & 0 \\ 1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\ 1/6 & 0 & 1/2 & 0 & 0 & 0 \\ 1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\ 1/6 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad P_{ij} \ge 0, \quad e^T P = e^T.$$
   The "jump" vector is $v = [\,1/n \;\cdots\; 1/n\,]^T$, with $v \ge 0$ and $e^T v = 1$. The Markov chain $[\alpha P + (1-\alpha) v e^T]\,x = x$ has a unique solution with $x_j \ge 0$ and $e^T x = 1$; equivalently, $x$ solves the linear system $(I - \alpha P)x = (1-\alpha)v$. Dangling nodes are ignored here and patched back to $v$ in the algorithms later. The model: follow edges uniformly with probability $\alpha$, and randomly jump with probability $1-\alpha$ (we'll assume everywhere is equally likely). The places we find the surfer most often are important pages. In general, $v$ is the jump vector: $v_i \ge 0$, $e^T v = 1$.
7. My definition of PageRank. A PageRank vector $x$ is the solution of the linear system $(I - \alpha P)x = (1-\alpha)v$, where $P$ is a column-stochastic matrix, $0 \le \alpha < 1$, and $v$ is a probability vector. Just three ingredients: $P$ (for example, the matrix on slide 6) with $P_{ij} \ge 0$ and $e^T P = e^T$; $v$ with $v_i \ge 0$ and $e^T v = 1$; and $\alpha$, usually 0.5 to 0.99.
8. This definition applies to a remarkable variety of problems: GeneRank, ProteinRank, FoodRank, SportsRank, HostRank, TrustRank, BadRank, IsoRank, SimRank, ObjectRank, ItemRank, ArticleRank, BookRank, FutureRank, TimedPageRank, SocialPageRank, DiffusionRank, ImpressionRank, TweetRank, TwitterRank, ReversePageRank, PageTrust, PopRank, CiteRank, FactRank, InvestorRank, ImageRank, VisualRank, QueryRank, BookmarkRank, StoryRank, PerturbationRank, ChemicalRank, RoadRank, PaperRank, etc.
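9. Richardson is a robust, simple algorithm to compute PageRank. Given $\alpha$, $P$, and $v$, solve $(I - \alpha P)x = (1-\alpha)v$ by iterating
   $$x^{(k+1)} = \alpha P x^{(k)} + (1-\alpha)v;$$
   starting from any probability vector $x^{(0)}$, the error satisfies $\|x^{(k)} - x\|_1 \le 2\alpha^k$.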
10. The teleportation distribution $v$ models where surfers "restart." What if this changes with time?
11. First idea: re-solve PageRank when $v$ changes. (+) PageRank is fast to solve! (+) Easy to understand. (–) Need another model to incorporate the past. (–) PageRank isn't that fast to solve. Is there anything better?
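12. Let's look at how PageRank evolves with iterations:
    $$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = \alpha P x^{(k)} + (1-\alpha)v - x^{(k)} = (1-\alpha)v - (I - \alpha P)x^{(k)}.$$
    Treating the iteration count as continuous time turns this difference into the ODE
    $$x'(t) = (1-\alpha)v - (I - \alpha P)x(t).$$
    PageRank is the steady-state solution of the ODE: setting $x'(t) = 0$ gives $(I - \alpha P)x = (1-\alpha)v$.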
13. A dynamical system for time-dependent teleportation:
    $$x'(t) = (1-\alpha)v(t) - (I - \alpha P)x(t).$$
    (+) Easy to integrate. (+) Easy to understand. (+) Possible to treat analytically! (–) Need to "model time" (it is not dimensionless). (–) Still useful to have a data assimilation model.
14. Need a self-stabilized ODE. We use a standard Runge–Kutta integrator (ode45 in Matlab). We used the formulation
    $$x'(t) = (1-\alpha)v(t) - (I - \alpha P)x(t), \qquad \ldots = (1-\alpha)e^T v(t) + \alpha e^T x(t)$$
    to maintain $x(t)$ as a probability distribution.
15. Where is this model realistic? On Wikipedia, we have hourly visit data that provides a coarse measure of outside interest.
16. Now PageRank values are time series, not static scores. [Figure: importance versus time for Wikipedia pages such as Main Page and Earthquake, with the moment an earthquake occurs marked.]
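17. Some quick theory. For general $v(t)$, the solution of $x'(t) = (1-\alpha)v(t) - (I - \alpha P)x(t)$ is
    $$x(t) = \exp[-(I-\alpha P)t]\,x(0) + (1-\alpha)\int_0^t \exp[-(I-\alpha P)(t-\tau)]\,v(\tau)\,d\tau.$$
    For static $v(t) = v$,
    $$\int_0^t \exp[-(I-\alpha P)(t-\tau)]\,v\,d\tau = (I-\alpha P)^{-1}v - \exp[-(I-\alpha P)t]\,(I-\alpha P)^{-1}v,$$
    so
    $$x(t) = \exp[-(I-\alpha P)t]\,(x(0) - x) + x,$$
    where $x = (1-\alpha)(I-\alpha P)^{-1}v$ is the original PageRank vector.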
18. Thus we recover the original PageRank vector if interest stops changing: $\exp[-(I-\alpha P)t] \to 0$ as $t \to \infty$, so $x(t) \to x$.
19. Cyclical behavior in the time-dependent PageRank scores. [Figure: a four-node graph; dynamic PageRank of Pages 1–4 versus time (left) and their time-dependent teleportation versus time (right).]
20. Modeling cyclical behavior. Cyclically switch between teleportation vectors $v_j$:
    $$v(t) = \frac{1}{k}\sum_{j=1}^{k} v_j\left(\cos\!\left(t + (j-1)\tfrac{2\pi}{k}\right) + 1\right).$$
    [Figure: time-dependent teleportation of Pages 1–4 versus time, alternating between $v_1$ and $v_2$.]
21. Modeling cyclical behavior. With the cyclical teleportation $v(t)$ of slide 20, the eventual solution is $x(t) = \bar x + \mathrm{Re}\{s \exp(\imath t)\}$, where
    $$(I - \alpha P)\,\bar x = (1-\alpha)\tfrac{1}{k} V e \qquad \text{(PageRank vector with average teleportation)},$$
    $$\left(I - \tfrac{\alpha}{1+\imath} P\right) s = \frac{1-\alpha}{k(1+\imath)}\, V \exp(\imath f) \qquad \text{(PageRank with complex teleportation)},$$
    with $V = [v_1, \ldots, v_k]$ and $f_j = (j-1)2\pi/k$.
22. Thus we can determine the size of the oscillation for the case of cyclical teleportation: node $i$ oscillates with amplitude $|s_i|$.
23. Is it useful? Let's try to predict retweets on Twitter. We crawled Twitter and gathered a graph of who follows whom and how active each user is in a month. This yields a graph and 6 vectors $v$. Our goal is to predict how many tweets you'll send next month based on the current month.
24. First, how do we model time? Collect the monthly vectors $v_1, \ldots, v_k$ into $V = [v_1, \ldots, v_k]$ and set
    $$v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1},$$
    so $t = 1$ is one month. Rescaling time,
    $$v_s(t) = V e_{\lfloor t/s \rfloor + 1} = v_{\lfloor t/s \rfloor + 1},$$
    so $t = s$ is one month, and the samples $x(sj)$, $j = 0, 1, \ldots$ are the same time points. $s = \infty$ yields a recomputed PageRank at each step!
25. The effect of $s$ on the PageRank of one node is considerable. [Figure (a), timescale $s$: PageRank $x_1(t)$ versus time for $s = 1$, $s = 2$, $s = 6$; the gray curve just recomputes PageRank at each change. Data from Wikipedia.]
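26. Second, can we make it smooth? With $V = [v_1, \ldots, v_k]$ and $v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1}$ ($t = 1$ is one month), the smoothed teleportation mixes new data and old data,
    $$\bar v(t;\theta) = \underbrace{h\theta\, v(t)}_{\text{new data}} + \underbrace{(1 - h\theta)\,\bar v(t-h;\theta)}_{\text{old data}},$$
    which is the forward Euler interpretation of the full ODE
    $$\bar v'(t;\theta) = \theta v(t) - \theta \bar v(t;\theta).$$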
27. The effect of $\theta$ on the PageRank of one node is moderate; it only matters if there is a big jump. [Figure (b), smoothing $\theta$: PageRank $x_1(t)$ versus time for $\theta = 0.1$, $\theta = 1$, $\theta = 10$. Data from Wikipedia.]
28. Parameters of the prediction: $\alpha$, the PageRank modeling parameter; $s$, the time scale; $\theta$, the smoothing.
29. The prediction model. Linear, one-step-ahead prediction
    $$\left[\bar f(t-1)\ \ \bar f(t-2)\ \cdots\ \bar f(t-w)\right] b \approx p(t)$$
    is evaluated using
    $$\text{sMAPE} = \frac{1}{|T|}\sum_{t=1}^{|T|} \frac{|p_t - \hat p_t|}{(p_t + \hat p_t)/2},$$
    averaged over nodes.
30. The results: error ratios on the TWITTER dataset, by timescale $s$.

    | Type | $\theta$ | $s = 1$ | $s = 2$ | $s = 6$ | $s = \infty$ |
    |---|---|---|---|---|---|
    | stationary | 0.01 | 0.635 | 0.929 | 0.913 | 0.996 |
    | stationary | 0.50 | 0.636 | 0.735 | 0.854 | 0.939 |
    | stationary | 1.00 | 0.522 | 0.562 | 0.710 | 0.963 |
    | non-stationary | 0.01 | 0.461 | 0.841 | 1.001 | 0.992 |
    | non-stationary | 0.50 | 0.261 | 0.608 | 0.585 | 0.929 |
    | non-stationary | 1.00 | 0.137 | 0.605 | 0.617 | 0.918 |

    Err Ratio = sMAPE of (tweets + time-dependent PR) / sMAPE of tweets only. If this ratio is < 1, then using time-dependent PR helps. Stationary nodes are those with a small maximum change in scores; non-stationary nodes are those with a large maximum change in scores.
31. We tried the same experiment with Wikipedia, but there was no meaningful change in the prediction error.
32. Using Granger causality to study link relationships on Wikipedia. [Figure: page-view time series for numbered Wikipedia pages such as Science, Germany, and Earthquake; highlighted pair: Earthquake and Richter Mag.] Causes? Of course! We build this into the model.
33. But the question is: which of these are preserved after incorporating the effects of page view data?
34. Using Granger causality to find the important links on Wikipedia.

    | Earthquake Granger-causes | p-value |
    |---|---|
    | Seismic hazard | 0.003535 |
    | Extensional tectonics | 0.003033 |
    | Landslide dam | 0.002406 |
    | Earthquake preparedness | 0.001157 |
    | Richter magnitude scale | 0.000584 |
    | Fault (geology) | 0.000437 |
    | Aseismic creep | 0.000419 |
    | Seismometer | 0.000284 |
    | Epicenter | 0.000020 |
    | Seismology | 0.000001 |
35. Thus, these links "fit" our model, whereas the other links on the page do not.
36. Application to the power grid. Prior work (Kim, Obah, 2007; Jin et al., 2010; Adolf et al., 2011; Halappanavar et al., 2012) has found that graph properties have important correlations with power-grid vulnerabilities and contingency analysis.
37. Each edge has a power flow that satisfies some nonlinear power flow equation. We use average daily flows to study time-dependent PageRank on the line graph of the underlying network. Lines with high variance may be problematic?
38. My questions: Sample data to test this idea? Too simplistic? Time-dependent betweenness centrality with cyclical teleportation? Other power-grid problems where similar ideas may be able to help?
39. A dynamical system for PageRank with time-dependent teleportation. David F. Gleich, Computer Science, Purdue University; Ryan A. Rossi, Computer Science, Purdue University. Paper: http://arxiv.org/abs/1211.4266. Code: https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im.
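
A minimal pure-Python sketch of the Richardson iteration from slide 9, run on the 6 × 6 matrix $P$ from slide 6. The value $\alpha = 0.85$, the dense Gaussian-elimination solver used for a reference solution, and the helper names (`matvec`, `solve`, `richardson`) are illustrative choices, not the authors' code. The sketch records the 1-norm error of every iterate so it can be compared against the bound $\|x^{(k)} - x\|_1 \le 2\alpha^k$.

```python
# 6-node column-stochastic matrix from slide 6 (each column sums to 1).
P = [
    [1/6, 1/2, 0.0, 0.0, 0.0, 0.0],
    [1/6, 0.0, 0.0, 1/3, 0.0, 0.0],
    [1/6, 1/2, 0.0, 1/3, 0.0, 0.0],
    [1/6, 0.0, 1/2, 0.0, 0.0, 0.0],
    [1/6, 0.0, 1/2, 1/3, 0.0, 1.0],
    [1/6, 0.0, 0.0, 0.0, 1.0, 0.0],
]


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (reference solver only)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def richardson(P, alpha, v, steps):
    """Iterates x^(k+1) = alpha*P*x^(k) + (1-alpha)*v from x^(0) = v; returns every iterate."""
    x = list(v)
    iterates = [x]
    for _ in range(steps):
        x = [alpha * px + (1 - alpha) * vi for px, vi in zip(matvec(P, x), v)]
        iterates.append(x)
    return iterates


alpha = 0.85                     # assumed value in the 0.5-0.99 range from slide 7
v = [1 / 6] * 6                  # uniform jump vector, as on slide 6
n = len(v)
A = [[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(n)] for i in range(n)]
x_exact = solve(A, [(1 - alpha) * vi for vi in v])     # (I - alpha P) x = (1 - alpha) v

iterates = richardson(P, alpha, v, 60)
errors = [sum(abs(a - b) for a, b in zip(xk, x_exact)) for xk in iterates]
print("PageRank:", [round(xi, 4) for xi in x_exact])
print("1-norm error after 60 steps:", errors[-1], "bound:", 2 * alpha ** 60)
```

Starting from $x^{(0)} = v$ keeps every iterate a probability vector, so the initial error is at most 2, and each step shrinks the error by at least a factor $\alpha$ because $\|\alpha P\|_1 = \alpha$.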
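
The ODE of slides 12–13 can be integrated directly. The talk uses Matlab's `ode45`; this sketch swaps in a fixed-step classical Runge–Kutta (RK4) integrator in plain Python, with an assumed $\alpha = 0.85$, a uniform $v$, and all initial mass on node 1 of the slide 6 graph. It tracks two consequences of slide 17: $e^T x(t)$ stays 1, and $\|x(t) - x\|_1 \le 2e^{-(1-\alpha)t}$, since $\|\exp[-(I-\alpha P)t]\|_1 = e^{-t}\|\exp(\alpha P t)\|_1 \le e^{-(1-\alpha)t}$ for a column-stochastic $P$.

```python
import math

# 6-node column-stochastic matrix from slide 6.
P = [
    [1/6, 1/2, 0.0, 0.0, 0.0, 0.0],
    [1/6, 0.0, 0.0, 1/3, 0.0, 0.0],
    [1/6, 1/2, 0.0, 1/3, 0.0, 0.0],
    [1/6, 0.0, 1/2, 0.0, 0.0, 0.0],
    [1/6, 0.0, 1/2, 1/3, 0.0, 1.0],
    [1/6, 0.0, 0.0, 0.0, 1.0, 0.0],
]


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rhs(x, v, alpha):
    """Right-hand side of x'(t) = (1 - alpha) v - (I - alpha P) x (slide 12)."""
    return [(1 - alpha) * vi - xi + alpha * px for vi, xi, px in zip(v, x, matvec(P, x))]


def rk4_step(x, h, f):
    """One classical fourth-order Runge-Kutta step for an autonomous system x' = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def pagerank(alpha, v, iters=500):
    """Reference steady state via Richardson iteration (slide 9)."""
    x = list(v)
    for _ in range(iters):
        x = [alpha * px + (1 - alpha) * vi for px, vi in zip(matvec(P, x), v)]
    return x


alpha, h = 0.85, 0.05
v = [1 / 6] * 6
x_pr = pagerank(alpha, v)

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]        # assumed start: every surfer on node 1
gaps, masses = [], []
for step in range(1, 1201):               # integrate to t = 60
    x = rk4_step(x, h, lambda y: rhs(y, v, alpha))
    t = step * h
    gaps.append((t, sum(abs(a - b) for a, b in zip(x, x_pr))))
    masses.append(sum(x))
print(f"t = 60: ||x(t) - x||_1 = {gaps[-1][1]:.2e}, bound = {2 * math.exp(-(1 - alpha) * 60):.2e}")
```

RK4 preserves linear invariants such as $e^T x$ up to rounding, so no renormalization is needed in this linear sketch.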
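
A sketch that checks the cyclical-teleportation solution of slides 20–21 numerically: it builds $\bar x$ and $s$ from the two linear systems, integrates the ODE with RK4, and compares $x(T)$ with $\bar x + \mathrm{Re}\{s e^{\imath T}\}$. The graph is the slide 6 matrix; $\alpha = 0.5$, $k = 2$, the two teleportation vectors, and the zero-based phase list `f` (so `f[j]` is $f_{j+1} = 2\pi j/k$) are assumptions made for the example. Python's built-in complex numbers handle the complex system.

```python
import cmath
import math

# 6-node column-stochastic matrix from slide 6.
P = [
    [1/6, 1/2, 0.0, 0.0, 0.0, 0.0],
    [1/6, 0.0, 0.0, 1/3, 0.0, 0.0],
    [1/6, 1/2, 0.0, 1/3, 0.0, 0.0],
    [1/6, 0.0, 1/2, 0.0, 0.0, 0.0],
    [1/6, 0.0, 1/2, 1/3, 0.0, 1.0],
    [1/6, 0.0, 0.0, 0.0, 1.0, 0.0],
]
n = len(P)


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve(A, b):
    """Gaussian elimination with partial pivoting; works for real or complex entries."""
    m = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            factor = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][j] * x[j] for j in range(r + 1, m))) / M[r][r]
    return x


def shifted(c):
    """The matrix I - c P, where c may be complex."""
    return [[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)]


alpha, k = 0.5, 2
# Hypothetical teleportation vectors: v_1 restarts at node 1, v_2 at node 6.
V = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
f = [2 * math.pi * j / k for j in range(k)]   # phases (j - 1) 2 pi / k, zero-based index here


def v_of_t(t):
    """v(t) = (1/k) sum_j v_j (cos(t + f_j) + 1), slide 20."""
    return [sum(vj[i] * (math.cos(t + fj) + 1) for vj, fj in zip(V, f)) / k for i in range(n)]


# PageRank with average teleportation: (I - alpha P) xbar = (1 - alpha) (1/k) V e.
xbar = solve(shifted(alpha), [(1 - alpha) * sum(vj[i] for vj in V) / k for i in range(n)])
# PageRank with complex teleportation: (I - alpha/(1+i) P) s = (1-alpha)/(k(1+i)) V exp(i f).
rhs_s = [(1 - alpha) / (k * (1 + 1j)) * sum(vj[i] * cmath.exp(1j * fj) for vj, fj in zip(V, f))
         for i in range(n)]
s = solve(shifted(alpha / (1 + 1j)), rhs_s)


def ode(t, x):
    """x'(t) = (1 - alpha) v(t) - (I - alpha P) x(t)."""
    return [(1 - alpha) * vi - xi + alpha * px for vi, xi, px in zip(v_of_t(t), x, matvec(P, x))]


def rk4_step(t, x, h):
    """One classical Runge-Kutta step for the non-autonomous system x' = ode(t, x)."""
    k1 = ode(t, x)
    k2 = ode(t + h / 2, [xi + h / 2 * a for xi, a in zip(x, k1)])
    k3 = ode(t + h / 2, [xi + h / 2 * a for xi, a in zip(x, k2)])
    k4 = ode(t + h, [xi + h * a for xi, a in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


h, steps = 0.02, 3000                          # integrate to T = 60
x = list(xbar)
for step in range(steps):
    x = rk4_step(step * h, x, h)
T = steps * h
predicted = [xb + (si * cmath.exp(1j * T)).real for xb, si in zip(xbar, s)]
deviation = max(abs(a - b) for a, b in zip(x, predicted))
print("max |x(T) - (xbar + Re{s e^{iT}})|:", deviation)
print("oscillation amplitude |s_i|:", [round(abs(si), 4) for si in s])
```

Because the transient decays like $e^{-(1-\alpha)t}$, integrating to $T = 60$ leaves only the periodic part, and $|s_i|$ is the oscillation amplitude of node $i$.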
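
To see the role of the time scale $s$ from slides 24–25, this sketch integrates the ODE with piecewise-constant teleportation $v_s(t) = v_{\lfloor t/s \rfloor + 1}$ and compares each monthly sample $x(sj)$ with PageRank recomputed from $v_j$ (the $s = \infty$ limit). The three monthly vectors, the uniform starting state, $\alpha = 0.5$, the RK4 integrator, and the step size are hypothetical choices for illustration.

```python
# 6-node column-stochastic matrix from slide 6.
P = [
    [1/6, 1/2, 0.0, 0.0, 0.0, 0.0],
    [1/6, 0.0, 0.0, 1/3, 0.0, 0.0],
    [1/6, 1/2, 0.0, 1/3, 0.0, 0.0],
    [1/6, 0.0, 1/2, 0.0, 0.0, 0.0],
    [1/6, 0.0, 1/2, 1/3, 0.0, 1.0],
    [1/6, 0.0, 0.0, 0.0, 1.0, 0.0],
]


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rk4_step(x, h, f):
    """One classical fourth-order Runge-Kutta step for x' = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def pagerank(alpha, v, iters=500):
    """Recomputed PageRank for a fixed v via Richardson iteration."""
    x = list(v)
    for _ in range(iters):
        x = [alpha * px + (1 - alpha) * vi for px, vi in zip(matvec(P, x), v)]
    return x


def integrate_monthly(alpha, V, s, h=0.05):
    """Integrates x' = (1-alpha) v_s(t) - (I - alpha P) x with v_s(t) = v_{floor(t/s)+1}.
    Returns the samples x(s*j) for j = 0, 1, ..., len(V)."""
    n = len(P)
    x = [1.0 / n] * n                      # assumed initial state: uniform
    samples = [list(x)]
    steps = max(1, round(s / h))
    dt = s / steps                         # step size that lands exactly on t = s*j
    for v in V:                            # v_j is active for t in [s(j-1), s*j)
        f = lambda y, v=v: [(1 - alpha) * vi - yi + alpha * py
                            for vi, yi, py in zip(v, y, matvec(P, y))]
        for _ in range(steps):
            x = rk4_step(x, dt, f)
        samples.append(list(x))
    return samples


alpha = 0.5
# Hypothetical monthly teleportation vectors, one per "month".
V = [[1.0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1.0], [0, 0, 1.0, 0, 0, 0]]
recomputed = [pagerank(alpha, v) for v in V]   # the s = infinity limit

gaps = {}
for s in (1, 2, 6, 30):
    samples = integrate_monthly(alpha, V, s)
    gaps[s] = max(sum(abs(a - b) for a, b in zip(samples[j + 1], recomputed[j]))
                  for j in range(len(V)))
    print(f"s = {s:>2}: max ||x(s j) - PageRank(v_j)||_1 = {gaps[s]:.2e}")
```

Small $s$ keeps memory of earlier months, while large $s$ lets each month's state relax to its recomputed PageRank vector.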
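
The smoothing ODE of slide 26, $\bar v'(t;\theta) = \theta v(t) - \theta \bar v(t;\theta)$, can be stepped with plain forward Euler. In this sketch the two-page teleportation data, the jump at $t = 1$, the step size $h = 0.01$, and the helper names are made up for illustration; requiring $h\theta \le 1$ (an assumption of the sketch) keeps each update a convex combination, so $\bar v$ stays a probability vector.

```python
def smooth_teleportation(v_of_t, theta, h, t_end, vbar0):
    """Forward Euler for vbar'(t) = theta * v(t) - theta * vbar(t):
    vbar(t + h) = (1 - h*theta) * vbar(t) + h*theta * v(t).
    Assumes h * theta <= 1, so every step is a convex combination."""
    if h * theta > 1:
        raise ValueError("need h * theta <= 1 for a stable, mass-preserving step")
    vbar = list(vbar0)
    trace = []
    for i in range(round(t_end / h)):
        v = v_of_t(i * h)
        vbar = [(1 - h * theta) * b + h * theta * a for a, b in zip(v, vbar)]
        trace.append(((i + 1) * h, vbar))
    return trace


def v_step(t):
    """Hypothetical data: all interest on page 0 for the first month, then on page 1."""
    return [1.0, 0.0] if t < 1 else [0.0, 1.0]


final = {}
for theta in (0.1, 1.0, 10.0):
    trace = smooth_teleportation(v_step, theta, h=0.01, t_end=2.0, vbar0=[1.0, 0.0])
    final[theta] = trace[-1][1]
    print(f"theta = {theta:>4}: vbar(2) = {[round(c, 4) for c in final[theta]]}")
```

Larger $\theta$ tracks the jump faster, consistent with slide 27's observation that the smoothing only matters when there is a big jump.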
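
The evaluation on slides 29–30 comes down to a lagged linear least-squares fit and the sMAPE metric. This sketch fits $b$ from $w$ lags by solving the normal equations on a made-up series that obeys an exact two-lag recurrence, and defines the error ratio of slide 30. The zero-denominator convention in `smape`, the toy data, and the helper names are assumptions; the per-node averaging from slide 29 is left out.

```python
def smape(actual, predicted):
    """sMAPE = (1/|T|) sum_t |p_t - phat_t| / ((p_t + phat_t) / 2), slide 29.
    Assumption: a term with p_t + phat_t = 0 contributes 0."""
    total = 0.0
    for p_t, q_t in zip(actual, predicted):
        denom = (p_t + q_t) / 2
        total += 0.0 if denom == 0 else abs(p_t - q_t) / denom
    return total / len(actual)


def error_ratio(actual, pred_with_pr, pred_without_pr):
    """Slide 30: sMAPE(tweets + time-dependent PR) / sMAPE(tweets only); < 1 means PR helps."""
    return smape(actual, pred_with_pr) / smape(actual, pred_without_pr)


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def lagged_design(features, target, w):
    """Rows [f(t-1), ..., f(t-w)] for every feature series f, paired with the target p(t)."""
    X, y = [], []
    for t in range(w, len(target)):
        X.append([f[t - lag] for f in features for lag in range(1, w + 1)])
        y.append(target[t])
    return X, y


def least_squares(X, y):
    """Solves the normal equations (X^T X) b = X^T y; adequate for tiny, well-conditioned fits."""
    m = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return solve(XtX, Xty)


# Hypothetical toy series obeying p(t) = 0.5 p(t-1) + 0.3 p(t-2) exactly.
p = [1.0, 2.0]
for _ in range(10):
    p.append(0.5 * p[-1] + 0.3 * p[-2])

X, y = lagged_design([p], p, w=2)
b = least_squares(X, y)
fitted = [sum(xi * bi for xi, bi in zip(row, b)) for row in X]
print("b =", [round(bi, 6) for bi in b], "in-sample sMAPE =", smape(y, fitted))
```

For real data one would fit on past months and score held-out months; here the in-sample fit simply confirms that the design matrix and solver recover the known coefficients.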
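
Slide 37 proposes running time-dependent PageRank on the line graph of a power network. A hypothetical sketch of building that operator: each transmission line becomes a node, two lines are linked when they share a bus, and columns are normalized so $P$ is column stochastic. The four-line network, the flow values, and the choice to use normalized average flows as the teleportation vector are all assumptions for illustration; the talk does not say how flows enter the model.

```python
from collections import defaultdict


def line_graph(edges):
    """One node per edge; edges i and j are adjacent when they share an endpoint (bus)."""
    incident = defaultdict(list)
    for idx, (a, b) in enumerate(edges):
        incident[a].append(idx)
        incident[b].append(idx)
    adj = [set() for _ in edges]
    for idxs in incident.values():
        for i in idxs:
            adj[i].update(j for j in idxs if j != i)
    return adj


def column_stochastic(adj, v):
    """P[i][j] = 1/deg(j) for each neighbor i of j; an isolated (dangling) column is patched with v."""
    m = len(adj)
    P = [[0.0] * m for _ in range(m)]
    for j, nbrs in enumerate(adj):
        for i in range(m):
            if nbrs:
                P[i][j] = 1.0 / len(nbrs) if i in nbrs else 0.0
            else:
                P[i][j] = v[i]
    return P


def pagerank(P, alpha, v, iters=500):
    """Richardson iteration x <- alpha P x + (1 - alpha) v."""
    x = list(v)
    for _ in range(iters):
        x = [alpha * sum(P[i][j] * x[j] for j in range(len(x))) + (1 - alpha) * v[i]
             for i in range(len(x))]
    return x


# Hypothetical 4-bus network and average daily flows (arbitrary units).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
flows = [10.0, 4.0, 3.0, 7.0]
v = [f / sum(flows) for f in flows]       # assumption: teleport in proportion to flow
adj = line_graph(edges)
P = column_stochastic(adj, v)
x = pagerank(P, 0.85, v)
print("line-graph neighbors:", [sorted(nbrs) for nbrs in adj])
print("PageRank of lines:", [round(xi, 4) for xi in x])
```

Feeding a different flow-based $v$ for each day into the ODE of slide 13, rather than a single static solve, would give the time-dependent version suggested on slide 37.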