# PageRank Centrality of dynamic graph structures

A talk I gave at the SIAM Annual Meeting Mini-symposium on the mathematics of the power grid organized by Mahantesh Halappanavar. I discuss a few ideas on how our dynamic centrality could help analyze such situations.

Published in: Technology, Education

### PageRank Centrality of dynamic graph structures

1. (PageRank) Centrality of dynamic graph structures. David F. Gleich, Computer Science, Purdue University. David Gleich · Purdue · AN14 · MS59
2. Models and algorithms for high performance matrix and network computations
    - Tensor eigenvalues and a power method: maximize $\sum_{ijk} T_{ijk} x_i x_j x_k$ subject to $\|x\|_2 = 1$, using the SSHOPM method due to Kolda and Mayo, $[x^{(\text{next})}]_i = \rho \cdot \big( \sum_{jk} T_{ijk} x_j x_k + \gamma x_i \big)$, where $\rho$ ensures the 2-norm constraint.
    - Big data methods: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12
    - Network alignment: ICDM '09, SC '11, TKDE '13. Previous work tackled network alignment with matrix methods for edge overlap; tensor methods are proposed for matching triangles.
    - Fast and scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, …
    - Data clustering: WSDM '12, KDD '12, CIKM '13, …
    - Massive matrix computations ($Ax = b$, $\min \|Ax - b\|$, $Ax = \lambda x$) on multi-threaded and distributed architectures
    - [Background figures: excerpts and plots from earlier papers]
3. I hope to add power-grid networks soon!
4. Centrality measures
    - "Relative importance in a network" (Wikipedia)
    - "It's a guess about what might be important" (me)
    - They tell us something about a network considering its topology.
    - They need to be deployed with extreme care!
    - [Figure from Wikipedia]
5. Centrality measures of dynamic graphs. Something about my network is changing; what should I do?
    1. Recompute at each change
    2. Batch up changes, and periodically recompute
    3. Efficiently update (i.e. recompute smartly!)
    4. Approximately update/compute
    5. Do something else.
6. What else to do? "If the optimization is hard, you should be solving a different optimization problem" (Cris Moore)
    1. Des Higham et al.: adapt the fundamentals to discrete time
    2. Use dynamical system generalizations: Gleich and Rossi 2012/2014; and Des Higham et al. 2014
    3. Likely more too…
7. Smart centrality for the smart grid? You need to adapt your centrality measure for your application! (Or try to get lucky!)
8. Application to the power grid. Prior work (Kim, Obah, 2007; Jin et al., 2010; Adolf et al., 2011; Halappanavar et al., 2012) has found that graph properties have important correlations with power-grid vulnerabilities and contingency analysis.
9. Outline
    1. Perspectives on PageRank
    2. PageRank as a dynamical system and time-dependent teleportation
    3. Predicting using PageRank
    4. Applications to the power grid?
10. The random surfer model. At a node:
    1. follow edges with probability $\alpha$
    2. do something else with probability $1 - \alpha$

    Google's PageRank is one possible answer. In its model the surfer (1) follows edges uniformly with probability $\alpha$ and (2) randomly jumps with probability $1 - \alpha$, where we'll assume everywhere is equally likely. The important pages are the places we are most likely to find the random surfer. [Figure: example graph with nodes 1 to 6]
11. My preferred version of PageRank. A PageRank vector $x$ is the solution of the linear system $(I - \alpha P) x = (1 - \alpha) v$, where $P$ is a column stochastic matrix, $0 \le \alpha < 1$, and $v$ is a probability vector. Just three ingredients!
    - $P_{ij} \ge 0$ and $e^T P = e^T$; for the example graph,

    $$P = \begin{bmatrix} 1/6 & 1/2 & 0 & 0 & 0 & 0 \\ 1/6 & 0 & 0 & 1/3 & 0 & 0 \\ 1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\ 1/6 & 0 & 1/2 & 0 & 0 & 0 \\ 1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\ 1/6 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$$

    - $v_i \ge 0$ and $e^T v = 1$
    - $\alpha$ usually 0.5 to 0.99
12. This definition applies to a remarkable variety of problems: GeneRank, ProteinRank, FoodRank, SportsRank, HostRank, TrustRank, BadRank, ObjectRank, ItemRank, ArticleRank, BookRank, FutureRank, TimedPageRank, SocialPageRank, DiffusionRank, ImpressionRank, TweetRank, TwitterRank, ReversePageRank, PageTrust, PopRank, CiteRank, FactRank, InvestorRank, ImageRank, VisualRank, QueryRank, BookmarkRank, StoryRank, PerturbationRank, ChemicalRank, RoadRank, PaperRank, etc.
13. The teleportation distribution $v$ models where surfers "restart". What if this changes with time?
14. Let's look at how PageRank evolves with iterations:

    $$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = \alpha P x^{(k)} + (1 - \alpha) v - x^{(k)} = (1 - \alpha) v - (I - \alpha P) x^{(k)}$$

    $$x'(t) = (1 - \alpha) v - (I - \alpha P) x(t)$$

    PageRank is the steady-state solution of the ODE.
15. A dynamical system for time-dependent teleportation:

    $$x'(t) = (1 - \alpha) v(t) - (I - \alpha P) x(t)$$

    - (+) Easy to integrate
    - (+) Easy to understand
    - (+) Possible to treat analytically!
    - (−) Need to "model time" (not dimensionless)
    - (−) Still useful to have a data assimilation model
16. Need a symplectic integrator (or self-correcting…). We use a standard RK integrator (ode45 in Matlab). We used the formulation

    $$x'(t) = (1 - \alpha) v(t) - (\gamma I - \alpha P) x(t), \qquad \gamma = (1 - \alpha) e^T v(t) + \alpha e^T x(t)$$

    to maintain $x(t)$ as a probability distribution.
17. Where is this model realistic? On Wikipedia, we have hourly visit data that provides a coarse measure of outside interest.
18. Now PageRank values are time-series, not static scores. [Figure: importance over time for Wikipedia pages such as Main Page and Earthquake, annotated "Australian Earthquake occurs!"]
19. Some quick theory. The model is

    $$x'(t) = (1 - \alpha) v(t) - (I - \alpha P) x(t).$$

    For general $v(t)$:

    $$x(t) = \exp[-(I - \alpha P) t]\, x(0) + (1 - \alpha) \int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v(\tau)\, d\tau.$$

    For static $v(t) = v$:

    $$\int_0^t \exp[-(I - \alpha P)(t - \tau)]\, v\, d\tau = (I - \alpha P)^{-1} v - \exp[-(I - \alpha P) t] (I - \alpha P)^{-1} v$$

    $$x(t) = \exp[-(I - \alpha P) t]\, (x(0) - x) + x$$

    where $x$ is the original PageRank vector.
20. Thus we recover the original PageRank vector if interest stops changing.
21. Modeling cyclical behavior. Cyclically switch between teleportation vectors $v_j$:

    $$v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left(t + (j - 1) \frac{2\pi}{k}\right) + 1 \right)$$

    [Figure: time-dependent teleportation vs. time for Pages 1 to 4, switching between $v_1$ and $v_2$]
22. Cyclical behavior in the time-dependent PageRank scores. [Figure: dynamic PageRank vs. time and time-dependent teleportation vs. time for Pages 1 to 4, with the four-node example graph]
23. Modeling cyclical behavior. Cyclically switch between teleportation vectors $v_j$:

    $$v(t) = \frac{1}{k} \sum_{j=1}^{k} v_j \left( \cos\!\left(t + (j - 1) \frac{2\pi}{k}\right) + 1 \right)$$

    Then the eventual solution is

    $$x(t) = x + \operatorname{Re}\{ s \exp(\imath t) \}$$

    where
    - $(I - \alpha P) x = (1 - \alpha) \frac{1}{k} V e$ (PageRank vector with average teleportation)
    - $\left(I - \frac{\alpha}{1 + \imath} P\right) s = (1 - \alpha) \frac{1}{k (1 + \imath)} V \exp(\imath f)$ (PageRank with complex teleportation)
24. Summary: If you have cyclical interest on a node, we have a NEW centrality measure that provides the magnitude of the oscillation based on PageRank with complex-valued "teleportation."
25. Thus we can determine the size of the oscillation for the case of cyclical teleportation.
26. Is it useful? Let's try and predict retweets on Twitter. We crawled Twitter and gathered a graph of who follows who and how active each user is in a month. This yields a graph and 6 vectors $v$. Our goal is to predict how many tweets you'll send next month based on the current month!
27. … and then there are details I can go into …
28. The results. Error ratio by timescale $s$:

    | Dataset | Type | $\theta$ | $s = 1$ | $s = 2$ | $s = 6$ | $s = \infty$ |
    |---|---|---|---|---|---|---|
    | TWITTER | stationary | 0.01 | 0.635 | 0.929 | 0.913 | 0.996 |
    | | | 0.50 | 0.636 | 0.735 | 0.854 | 0.939 |
    | | | 1.00 | 0.522 | 0.562 | 0.710 | 0.963 |
    | | non-stationary | 0.01 | 0.461 | 0.841 | 1.001 | 0.992 |
    | | | 0.50 | 0.261 | 0.608 | 0.585 | 0.929 |
    | | | 1.00 | 0.137 | 0.605 | 0.617 | 0.918 |

    Error ratio = (SMAPE using tweets + time-dependent PR) / (SMAPE using tweets only). If this ratio < 1, then using time-dependent PR helps. Stationary nodes are those with small maximum change in scores. Non-stationary nodes are those with large maximum change in scores.
29. Using Granger causality to study link relationships on Wikipedia. Earthquake → Richter Mag.: causes? Of course! We build this into the model. But the question is, which of these are preserved after incorporating the effects of page view data? [Figure: table of Wikipedia page names]
30. To the power grid … Line failures in the grid can be anticipated via linearized DC dynamics (Hines et al.?):

    $$c = \operatorname{diag}(B L^{+} B^{T})$$
31. The PageRank problem and the Laplacian. With $L$ the combinatorial Laplacian:
    1. $(I - \alpha A D^{-1}) x = (1 - \alpha) v$;
    2. $(I - \alpha \mathcal{A}) y = (1 - \alpha) D^{-1/2} v$, where $\mathcal{A} = D^{-1/2} A D^{-1/2}$ and $x = D^{1/2} y$; and
    3. $[\gamma D + L] z = \gamma v$, where $\alpha = 1/(1 + \gamma)$ and $x = D z$.

    Let $x(\alpha)$ solve PageRank and let $v^T e = 0$. Then $\lim_{\alpha \to 1} x(\alpha) \to S L^{+} v$, where $S$ is a scaling matrix.
32. Some potential applications
    1. PageRank can be thought of as a type of regularization; often helps improve on simple centrality baselines
    2. Limits of PageRank interpolate between centrality and spectral clustering [Mahoney, Orecchia, and Vishnoi]
    3. Time-dependent teleportation models; adaptations to node dropouts possible.
    4. Use PageRank on the line graph?
33. Results on the power grid … pending …
34. Questions, Conclusions, and References
    - Questions: How to validate some of these ideas? Too simplistic? Other power-grid problems where similar ideas may be able to help? Collaborators?
    - (Dear David, please remember to repeat the question!)
    - Paper: Gleich & Rossi, Internet Mathematics, 2014
    - Code: https://www.cs.purdue.edu/homes/dgleich/codes/dynsyspr-im
    - Conclusions: Centrality is more complicated than just one method. It's possible to tune centrality measures to different structures, and this makes it a flexible setup.
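
The linear system on slide 11 and the iteration-as-ODE view on slide 14 can be sketched together in a few lines of NumPy. This is a minimal sketch rather than the author's code: the matrix `P` is the six-node example from slide 11, while `alpha = 0.85`, the uniform `v`, and the helper name `euler_pagerank` are assumptions chosen for illustration. With step size `h = 1`, one forward-Euler step of $x'(t) = (1 - \alpha) v - (I - \alpha P) x(t)$ is the same update as $x^{(k+1)} = \alpha P x^{(k)} + (1 - \alpha) v$, so the iterates should approach the direct solve.

```python
import numpy as np

# Column-stochastic matrix of the six-node example graph (slide 11).
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
alpha = 0.85             # assumed; the slides only say "usually 0.5 to 0.99"
v = np.full(n, 1.0 / n)  # assumed uniform teleportation vector

# Direct solve of (I - alpha P) x = (1 - alpha) v.
x_direct = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)


def euler_pagerank(P, v, alpha, h=1.0, steps=200):
    """Forward-Euler integration of x'(t) = (1 - alpha) v - (I - alpha P) x(t).

    Hypothetical helper for illustration; with h = 1 each step is the
    classic PageRank iteration x <- alpha P x + (1 - alpha) v.
    """
    x = v.copy()
    for _ in range(steps):
        x = x + h * ((1 - alpha) * v - (x - alpha * (P @ x)))
    return x


x_iter = euler_pagerank(P, v, alpha)
print(np.round(x_direct, 4))
print(np.max(np.abs(x_iter - x_direct)))
```

The talk itself reports using a standard Runge-Kutta integrator (ode45 in Matlab) for the time-dependent case; forward Euler is swapped in here only because it makes the link to the PageRank iteration explicit.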
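
The "PageRank with complex teleportation" formulas on slide 23 can be checked numerically. The sketch below is illustrative only: the four-page path graph `P`, the two teleportation vectors in `V`, `alpha = 0.85`, and all function names are hypothetical choices, not data from the talk. It solves the two linear systems for the average vector and the complex vector $s$, then confirms that $x(t) = x + \operatorname{Re}\{s \exp(\imath t)\}$ satisfies $x'(t) = (1 - \alpha) v(t) - (I - \alpha P) x(t)$ for the cyclical $v(t)$. The entries of `amplitude` are the oscillation magnitudes that slide 24 proposes as a centrality measure.

```python
import numpy as np

alpha = 0.85  # assumed damping value

# Hypothetical 4-page example: random walk on the path 1-2-3-4 (column stochastic).
P = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.0, 0.0, 0.5, 0.0],
])
# Columns of V are the teleportation vectors v_1 (page 1) and v_2 (page 4).
V = np.array([
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
])
n, k = V.shape
f = 2 * np.pi * np.arange(k) / k  # phases (j - 1) * 2 * pi / k
I = np.eye(n)

# PageRank with the average teleportation (1/k) V e.
x_bar = np.linalg.solve(I - alpha * P, (1 - alpha) / k * (V @ np.ones(k)))
# PageRank-like system with complex damping and complex teleportation.
s = np.linalg.solve(I - alpha / (1 + 1j) * P,
                    (1 - alpha) / (k * (1 + 1j)) * (V @ np.exp(1j * f)))
amplitude = np.abs(s)  # size of each page's oscillation


def v_of_t(t):
    """Cyclical teleportation v(t) = (1/k) sum_j v_j (cos(t + f_j) + 1)."""
    return (V @ (np.cos(t + f) + 1.0)) / k


def x_eventual(t):
    """Eventual solution x(t) = x_bar + Re{s exp(i t)}."""
    return x_bar + np.real(s * np.exp(1j * t))


def ode_residual(t):
    """Largest entry of |x'(t) - [(1 - alpha) v(t) - (I - alpha P) x(t)]|."""
    lhs = np.real(1j * s * np.exp(1j * t))
    x = x_eventual(t)
    rhs = (1 - alpha) * v_of_t(t) - (x - alpha * (P @ x))
    return np.max(np.abs(lhs - rhs))


print(np.round(amplitude, 4))
print(max(ode_residual(t) for t in (0.0, 1.3, 10.0)))
```

Checking the ODE residual directly avoids any dependence on an integrator's step size; a long numerical integration from an arbitrary start should settle onto the same curve once the transient has decayed.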
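
The three formulations relating PageRank to the Laplacian on slide 31 can be compared on a small undirected graph. This is a sketch under stated assumptions: the four-node adjacency matrix `A`, `alpha = 0.85`, and the uniform `v` are made up for illustration, and the third system is taken in the scaled form $[\gamma D + L] z = \gamma v$ with $\gamma = (1 - \alpha)/\alpha$, which is the scaling under which all three right-hand sides agree.

```python
import numpy as np

# Hypothetical undirected graph on 4 nodes (symmetric adjacency matrix).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
n = A.shape[0]
d = A.sum(axis=0)        # degrees
D = np.diag(d)
L = D - A                # combinatorial Laplacian
I = np.eye(n)

alpha = 0.85             # assumed damping value
gamma = (1 - alpha) / alpha  # equivalent to alpha = 1 / (1 + gamma)
v = np.full(n, 1.0 / n)  # assumed uniform teleportation vector

# 1. Standard PageRank with the random-walk matrix A D^{-1}.
x1 = np.linalg.solve(I - alpha * A @ np.diag(1.0 / d), (1 - alpha) * v)

# 2. Symmetrically normalized system, then x = D^{1/2} y.
A_sym = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
y = np.linalg.solve(I - alpha * A_sym, (1 - alpha) * d ** -0.5 * v)
x2 = d ** 0.5 * y

# 3. Shifted Laplacian system, then x = D z.
z = np.linalg.solve(gamma * D + L, gamma * v)
x3 = d * z

print(np.round(x1, 4))
print(np.max(np.abs(x1 - x2)), np.max(np.abs(x1 - x3)))
```

The third form is a symmetric positive definite system for $\gamma > 0$, which is one reason the Laplacian view is convenient when reusing solvers built for graph Laplacians.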