
A dynamical system for PageRank with time-dependent teleportation


  1. A dynamical system for PageRank with time-dependent teleportation
     David F. Gleich, Computer Science, Purdue University
     Ryan A. Rossi, Computer Science, Purdue University
     Paper: http://arxiv.org/abs/1211.4266
     Code: https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im
     David Gleich · Purdue ANL Seminar
  2. (1) Perspectives on PageRank; (2) PageRank as a dynamical system and time-dependent teleportation; (3) Predicting using PageRank; (4) Applications to the power grid?
  3. Given a graph, what are the most important nodes?
  4. The random surfer model. At a node: (1) follow edges with prob. $\alpha$; (2) do something else with prob. $(1-\alpha)$. Google's PageRank is one possible answer. The model: (1) follow edges uniformly with probability $\alpha$, and (2) randomly jump with probability $1-\alpha$; we'll assume everywhere is equally likely. The important pages are the places we are most likely to find the random surfer. [Figure: example graph with nodes 1 to 6.]
  5. The most important page on the web!
  6. PageRank details. For the six-node example graph,
     $P = \begin{bmatrix} 1/6 & 1/2 & 0 & 0 & 0 & 0 \\ 1/6 & 0 & 0 & 1/3 & 0 & 0 \\ 1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\ 1/6 & 0 & 1/2 & 0 & 0 & 0 \\ 1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\ 1/6 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$, with $P_{ij} \ge 0$ and $e^T P = e^T$.
     "Jump" vector: $v = [\tfrac{1}{n}, \ldots, \tfrac{1}{n}]^T \ge 0$, $e^T v = 1$.
     Markov chain: $[\alpha P + (1-\alpha) v e^T]\, x = x$; the solution $x$ is unique, with $x_j \ge 0$ and $e^T x = 1$.
     Linear system: $(I - \alpha P)\, x = (1-\alpha)\, v$.
     Ignored: dangling nodes, patched back to $v$; algorithms, later.
     $v$ is the jump vector, with $v_i \ge 0$ and $e^T v = 1$.
  7. My definition of PageRank. A PageRank vector $x$ is the solution of the linear system $(I - \alpha P)\, x = (1-\alpha)\, v$, where $P$ is a column-stochastic matrix, $0 \le \alpha < 1$, and $v$ is a probability vector. Just three ingredients: $P$ (with $P_{ij} \ge 0$, $e^T P = e^T$), $v$ (with $v_i \ge 0$, $e^T v = 1$), and $\alpha$ (usually 0.5 to 0.99).
  8. This definition applies to a remarkable variety of problems: (1) GeneRank, (2) ProteinRank, (3) FoodRank, (4) SportsRank, (5) HostRank, (6) TrustRank, (7) BadRank, (8) IsoRank, (9) SimRank, (10) ObjectRank, (11) ItemRank, (12) ArticleRank, (13) BookRank, (14) FutureRank, (15) TimedPageRank, (16) SocialPageRank, (17) DiffusionRank, (18) ImpressionRank, (19) TweetRank, (20) TwitterRank, (21) ReversePageRank, (22) PageTrust, (23) PopRank, (24) CiteRank, (25) FactRank, (26) InvestorRank, (27) ImageRank, (28) VisualRank, (29) QueryRank, (30) BookmarkRank, (31) StoryRank, (32) PerturbationRank, (33) ChemicalRank, (34) RoadRank, (35) PaperRank, (36) etc.
  9. Richardson is a robust, simple algorithm to compute PageRank. Given $\alpha$, $P$, $v$, solve $(I - \alpha P)\, x = (1-\alpha)\, v$ with the Richardson iteration $x^{(k+1)} = \alpha P x^{(k)} + (1-\alpha)\, v$, with error $\|x^{(k)} - x\|_1 \le 2\alpha^k$.
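The Richardson iteration on this slide can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' code: the matrix is the six-node example from the earlier slides, while the uniform `v`, the choice `alpha = 0.85`, the starting vector, and the stopping tolerance are all assumptions made for the example.

```python
# Column-stochastic matrix P for the six-node example graph from the slides.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]


def matvec(A, x):
    """Dense matrix-vector product using plain lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def pagerank_richardson(P, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate x <- alpha*P*x + (1-alpha)*v until the 1-norm update is below tol."""
    x = list(v)  # start from the teleportation vector (an assumption)
    for k in range(max_iter):
        Px = matvec(P, x)
        x_new = [alpha * p + (1 - alpha) * vi for p, vi in zip(Px, v)]
        delta = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if delta < tol:
            break
    return x, k + 1


n = len(P)
v = [1.0 / n] * n              # uniform teleportation (assumed for the example)
x, iters = pagerank_richardson(P, v)
print(iters, [round(xi, 4) for xi in x])
```

Because $P$ is column stochastic and $v$ sums to one, every iterate keeps summing to one, which is why no renormalization step appears in the loop.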
  10. The teleportation distribution $v$ models where surfers "restart". What if this changes with time?
  11. First idea: re-solve PageRank when $v$ changes.
      + PageRank is fast to solve!
      + Easy to understand
      - Need another model to incorporate the past
      - PageRank isn't that fast to solve.
      Is there anything better?
  12. Let's look at how PageRank evolves with iterations: $\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = \alpha P x^{(k)} + (1-\alpha)\, v - x^{(k)} = (1-\alpha)\, v - (I - \alpha P)\, x^{(k)}$. The continuous analogue is $x'(t) = (1-\alpha)\, v - (I - \alpha P)\, x(t)$. PageRank is the steady-state solution of the ODE.
  13. A dynamical system for time-dependent teleportation: $x'(t) = (1-\alpha)\, v(t) - (I - \alpha P)\, x(t)$.
      + Easy to integrate
      + Easy to understand
      + Possible to treat analytically!
      - Need to "model time" (not dimensionless)
      - Still useful to have a data assimilation model
  14. Need a self-stabilized ODE. We use a standard RK integrator (ode45 in Matlab). To maintain $x(t)$ as a probability distribution, we used the formulation $x'(t) = (1-\alpha)\, v(t) - (\gamma I - \alpha P)\, x(t)$ with $\gamma = (1-\alpha)\, e^T v(t) + \alpha\, e^T x(t)$. [The scalar's symbol did not survive extraction; $\gamma$ is used here.]
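A rough sketch of integrating the self-stabilized ODE follows. It swaps in a hand-written fixed-step classical RK4 loop for Matlab's adaptive `ode45`, so it is not the integrator the talk used. The name `gamma` for the scalar in front of $I$, the value `alpha = 0.85`, the step count, and the static uniform teleportation are assumptions for illustration; the matrix is the six-node example from the earlier slides.

```python
# Six-node column-stochastic example matrix from the slides.
P = [
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
]
ALPHA = 0.85  # assumed value


def matvec(A, x):
    """Dense matrix-vector product using plain lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rhs(t, x, v_of_t, alpha=ALPHA):
    """x'(t) = (1-alpha) v(t) - (gamma I - alpha P) x(t),
    with gamma = (1-alpha) e^T v(t) + alpha e^T x(t)."""
    v = v_of_t(t)
    gamma = (1 - alpha) * sum(v) + alpha * sum(x)
    Px = matvec(P, x)
    return [(1 - alpha) * vi - gamma * xi + alpha * pxi
            for vi, xi, pxi in zip(v, x, Px)]


def rk4(f, x0, t0, t1, steps):
    """Classical fixed-step fourth-order Runge-Kutta from t0 to t1."""
    h = (t1 - t0) / steps
    t, x = t0, list(x0)
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k1)])
        k3 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k2)])
        k4 = f(t + h, [xi + h * k for xi, k in zip(x, k3)])
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x


n = len(P)
v_static = [1.0 / n] * n


def v_static_fn(t):
    """Static teleportation: the same vector at every time."""
    return v_static


x0 = [1.0] + [0.0] * (n - 1)   # all mass on node 1 initially
x_end = rk4(lambda t, x: rhs(t, x, v_static_fn), x0, 0.0, 200.0, 4000)
print([round(xi, 4) for xi in x_end])
```

With a static `v`, the trajectory should settle on the solution of $(I - \alpha P)x = (1-\alpha)v$; replacing `v_static_fn` with any function of `t` that returns a probability vector gives the time-dependent case.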
  15. Where is this model realistic? On Wikipedia, we have hourly visit data that provides a coarse measure of outside interest.
  16. Now PageRank values are time series, not static scores. [Figure: importance over time for Wikipedia pages, including the Main Page and Earthquake; annotation: "Australian earthquake occurs!"]
  17. Some quick theory for $x'(t) = (1-\alpha)\, v(t) - (I - \alpha P)\, x(t)$.
      For general $v(t)$: $x(t) = \exp[-(I - \alpha P)t]\, x(0) + (1-\alpha) \int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v(\tau)\, d\tau$.
      For static $v(t) = v$: $\int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v\, d\tau = (I - \alpha P)^{-1} v - \exp[-(I - \alpha P)t]\,(I - \alpha P)^{-1} v$, so $x(t) = \exp[-(I - \alpha P)t]\,(x(0) - x) + x$, where $x$ is the original PageRank vector.
  18. Thus we recover the original PageRank vector if interest stops changing.
  19. Cyclical behavior in the time-dependent PageRank scores. [Figures: dynamic PageRank vs. time and time-dependent teleportation vs. time, each for Pages 1 to 4 of a four-node graph.]
  20. Modeling cyclical behavior. Cyclically switch between teleportation vectors $v_j$: $v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left(t + (j-1)\tfrac{2\pi}{k}\right) + 1 \right)$. [Figure: time-dependent teleportation vs. time for Pages 1 to 4, alternating between $v_1$ and $v_2$.]
  21. Modeling cyclical behavior, continued. With $v(t)$ as above, the eventual solution is $x(t) = x + \mathrm{Re}\{ s \exp(\imath t) \}$, where $(I - \alpha P)\, x = (1-\alpha) \tfrac{1}{k} V e$ is a PageRank vector with average teleportation and $\left(I - \tfrac{\alpha}{1+\imath} P\right) s = (1-\alpha) \tfrac{1}{k(1+\imath)} V \exp(\imath f)$ is PageRank with complex teleportation.
  22. Thus we can determine the size of the oscillation for the case of cyclical teleportation.
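The two linear systems for cyclical teleportation follow from substituting the ansatz into the ODE. The worked steps below are a reconstruction under two stated assumptions: $f$ is the vector of phase shifts with entries $f_j = (j-1)2\pi/k$, and $\exp(\imath f)$ is applied elementwise.

```latex
\begin{align*}
v(t) &= \tfrac{1}{k} V e + \tfrac{1}{k}\,\mathrm{Re}\{ V \exp(\imath f)\, e^{\imath t} \},
  \qquad f_j = (j-1)\tfrac{2\pi}{k}, \\
x(t) &= x + \mathrm{Re}\{ s\, e^{\imath t} \}
  \;\Rightarrow\; x'(t) = \mathrm{Re}\{ \imath s\, e^{\imath t} \}. \\
\intertext{Substituting into $x'(t) = (1-\alpha)\,v(t) - (I-\alpha P)\,x(t)$ and matching the
constant terms and the $e^{\imath t}$ terms separately:}
(I - \alpha P)\, x &= (1-\alpha)\tfrac{1}{k} V e, \\
\imath s &= (1-\alpha)\tfrac{1}{k} V \exp(\imath f) - (I - \alpha P)\, s
  \;\Rightarrow\; \bigl((1+\imath) I - \alpha P\bigr) s = (1-\alpha)\tfrac{1}{k} V \exp(\imath f), \\
\Bigl(I - \tfrac{\alpha}{1+\imath} P\Bigr) s &= (1-\alpha)\tfrac{1}{k(1+\imath)} V \exp(\imath f).
\end{align*}
```

Since $\mathrm{Re}\{ s_j e^{\imath t} \} = |s_j| \cos(t + \arg s_j)$, the magnitude $|s_j|$ is the size of the oscillation in the score of node $j$.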
  23. Is it useful? Let's try and predict retweets on Twitter. We crawled Twitter and gathered a graph of who follows whom and how active each user is in a month. This yields a graph and 6 vectors $v$. Our goal is to predict how many tweets you'll send next month based on the current month!
  24. First, how do we model time? Stack the vectors $v_1, \ldots, v_k$ as $V = [v_1, \ldots, v_k]$. Then $v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1}$, where $t = 1$ is one month. Rescaling time: $v_s(t) = V e_{\lfloor t/s \rfloor + 1} = v_{\lfloor t/s \rfloor + 1}$, where $t = s$ is one month. The values $x(sj)$, $j = 0, 1, \ldots$, are at the same time points. $s = \infty$ yields a recomputed PageRank at each step!
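The piecewise-constant, rescaled teleportation function can be sketched as a small closure. The helper name `make_teleport` and the toy vectors are hypothetical, and holding the last vector once `t` runs past the data is an assumption the slides do not address.

```python
import math


def make_teleport(V_columns, s=1.0):
    """Return v_s(t) = v_{floor(t/s)+1} for a list of teleportation vectors.

    V_columns[j] holds v_{j+1}, so the zero-based index is floor(t/s).
    """
    k = len(V_columns)

    def v_s(t):
        j = int(math.floor(t / s))   # zero-based index of the active period
        j = min(max(j, 0), k - 1)    # clamp outside the data (an assumption)
        return V_columns[j]

    return v_s


# Toy data: three "months" of teleportation over two nodes.
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
v1 = make_teleport(V, s=1.0)   # t = 1 is one month
v2 = make_teleport(V, s=2.0)   # t = 2 is one month

print(v1(0.5), v1(1.0), v2(1.0), v2(2.0))
```

Larger `s` gives the ODE more integration time per month of data, which is why the limit of very large `s` approaches recomputing PageRank at each change.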
  25. The effect of $s$ on PageRank of one node is considerable. [Figure (a), timescale $s$: PageRank $x_1(t)$ vs. time for $s = 1$, $s = 2$, $s = 6$; the gray curve involves just recomputing PageRank at each change. Data from Wikipedia.]
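  26. Second, can we make it smooth? With $V = [v_1, \ldots, v_k]$ and $v(t) = V e_{\lfloor t \rfloor + 1} = v_{\lfloor t \rfloor + 1}$ ($t = 1$ is one month), smooth the teleportation as $\bar{v}(t; \theta) = \gamma\, v(t) + (1-\gamma)\, \bar{v}(t-h; \theta)$, that is, new data plus old data. This is the forward Euler interpretation of the full ODE $\bar{v}'(t; \theta) = \theta\, v(t) - \theta\, \bar{v}(t; \theta)$. [The weight's symbol did not survive extraction; $\gamma$ is used here.]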
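The smoothing step can be illustrated as a discrete exponential filter. In this sketch the mixing weight is taken to be `theta * h`, which is what one Euler-type step of size `h` on the smoothing ODE suggests; that identification, the helper name `smooth_teleport`, and the toy step input are assumptions rather than details from the slides.

```python
def smooth_teleport(v_of_t, theta, h, t_end, vbar0):
    """Exponentially smooth v(t): vbar(t) = g*v(t) + (1-g)*vbar(t-h), g = theta*h.

    Returns the list of smoothed vectors at t = 0, h, 2h, ..., t_end.
    """
    g = theta * h   # assumed weight, from an Euler step of size h
    if not 0.0 < g <= 1.0:
        raise ValueError("need 0 < theta*h <= 1 for a convex combination")
    steps = int(round(t_end / h))
    vbar = list(vbar0)
    history = [list(vbar)]
    for i in range(1, steps + 1):
        v = v_of_t(i * h)
        vbar = [g * vi + (1 - g) * bi for vi, bi in zip(v, vbar)]
        history.append(list(vbar))
    return history


def step_input(t):
    """Toy input: teleportation jumps from node 1 to node 2 at t = 1."""
    return [1.0, 0.0] if t < 1.0 else [0.0, 1.0]


hist = smooth_teleport(step_input, theta=1.0, h=0.01, t_end=10.0, vbar0=[1.0, 0.0])
print([round(c, 4) for c in hist[-1]])
```

Each update is a convex combination of two probability vectors, so the smoothed vector remains a probability vector; a small `theta` reacts slowly to the jump and a large `theta` tracks it almost immediately.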
  27. The effect of $\theta$ on PageRank of one node is moderate; it only matters if there is a big jump. [Figure (b), smoothing $\theta$: PageRank $x_1(t)$ vs. time for $\theta = 0.1$, $\theta = 1$, $\theta = 10$. Data from Wikipedia.]
  28. Parameters of the prediction: $\alpha$ (PageRank modeling parameter), $s$ (time-scale), $\theta$ (smoothing).
  29. The prediction model. Linear, one-step-ahead prediction: $[\, \bar{f}(t-1) \;\; \bar{f}(t-2) \;\; \cdots \;\; \bar{f}(t-w) \,]\, b \approx p(t)$. It is evaluated using $\mathrm{sMAPE} = \frac{1}{|T|} \sum_{t=1}^{|T|} \frac{|p_t - \hat{p}_t|}{(p_t + \hat{p}_t)/2}$, averaged over nodes.
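The evaluation metric on this slide is easy to state in code. The sketch covers only sMAPE for a single node's series, not the linear fit for `b`; the function name and the handling of a zero denominator are assumptions, since the slides do not say what happens when both the actual and predicted values are zero.

```python
def smape(actual, predicted):
    """Symmetric MAPE: mean over t of |p_t - phat_t| / ((p_t + phat_t) / 2)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for p, p_hat in zip(actual, predicted):
        denom = (p + p_hat) / 2.0
        # Count a 0/0 term as zero error (an assumption for this sketch).
        total += 0.0 if denom == 0 else abs(p - p_hat) / denom
    return total / len(actual)


print(smape([10, 20], [10, 20]))   # perfect prediction
print(smape([1, 0], [3, 0]))       # one term of 2/2, one 0/0 term
```

Averaging this value over nodes, once with and once without the time-dependent PageRank features, gives the two quantities whose quotient is the error ratio reported in the results.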
  30. The results. Error ratios by timescale $s$ on the TWITTER dataset:

      Type             θ      s = 1   s = 2   s = 6   s = ∞
      stationary       0.01   0.635   0.929   0.913   0.996
      stationary       0.50   0.636   0.735   0.854   0.939
      stationary       1.00   0.522   0.562   0.710   0.963
      non-stationary   0.01   0.461   0.841   1.001   0.992
      non-stationary   0.50   0.261   0.608   0.585   0.929
      non-stationary   1.00   0.137   0.605   0.617   0.918

      Err Ratio = (sMAPE of tweets + time-dependent PR) / (sMAPE of tweets only). If this ratio is < 1, then using time-dependent PR helps. Stationary nodes are those with a small maximum change in scores; non-stationary nodes are those with a large maximum change in scores.
  31. We tried the same experiment with Wikipedia, but there was no meaningful change in the prediction error.
  32. Using Granger causality to study link relationships on Wikipedia. Does Earthquake cause Richter Mag.? Of course! We build this into the model. [Figure: grid of time series for Wikipedia pages; the page labels are truncated in the extraction.]
  33. But the question is: which of these are preserved after incorporating the effects of page view data?
  34. Using Granger causality to find the important links on Wikipedia.

      Earthquake Granger causes   p-value
      Seismic hazard              0.003535
      Extensional tectonics       0.003033
      Landslide dam               0.002406
      Earthquake preparedness     0.001157
      Richter magnitude scale     0.000584
      Fault (geology)             0.000437
      Aseismic creep              0.000419
      Seismometer                 0.000284
      Epicenter                   0.000020
      Seismology                  0.000001

  35. Thus, these links "fit" our model, whereas the other links on the page do not.
  36. Application to the power grid. Prior work (Kim, Obah, 2007; Jin et al., 2010; Adolf et al., 2011; Halappanavar et al., 2012) has found that graph properties have important correlations with power-grid vulnerabilities and contingency analysis.
  37. Each edge has a power flow that satisfies some non-linear power flow equation. We use average daily flows to study time-dependent PageRank on the line graph of the underlying network. Lines with high variance may be problematic?
  38. My questions. Sample data to test this idea? Too simplistic? Time-dependent betweenness centrality with cyclical teleportation? Other power-grid problems where similar ideas may be able to help?
  39. A dynamical system for PageRank with time-dependent teleportation
      David F. Gleich, Computer Science, Purdue University
      Ryan A. Rossi, Computer Science, Purdue University
      Paper: http://arxiv.org/abs/1211.4266
      Code: https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im