# Dynamic PageRank using Evolving Teleportation

WAW12

Published in: Technology, Education

### Dynamic PageRank using Evolving Teleportation

1. Dynamic PageRank using Evolving Teleportation. Ryan A. Rossi, David F. Gleich. (Title figure: the nodes Tunisia, Egypt, and Libya shown at successive points in time.)
2. Problem: Importance of nodes is NOT static (static PageRank). Evolving in reality! Ryan Rossi (Purdue) Dynamic PageRank
3. Problem: Importance of nodes is NOT static. Evolving in reality! Formulate PageRank as a dynamical system: a dynamic generalization of PageRank. Helps in prediction! (Figure: importance of 100 nodes changing over time.)
4. Detecting dynamic anomalies: dynamic ranks show an Australian earthquake spike ("Earthquake occurs!"). Prediction: observed values vs. future values. Modeling human dynamics: Causes? Earthquake → Richter Mag. Clustering nodes with similar time-series patterns, e.g., TV shows / "American Idol". (Figure labels, truncated in the original: TimAllen, TheOffice(, DrivingMis, JoannaPaci, AmericanId, BloodDrive, KrisAllen, KatharineM, DavidFoste, ListofTheO, TheLastHou, JasonKay, CamillaBel, AsherRoth, DwightSchr, B.J.Novak, PromNight(, JennaFisch, RashidaJon.)
5. Static PageRank Model. At a node, a random surfer can: (1) follow edges uniformly with probability α, and (2) randomly jump with probability 1 − α (for now, assume v_i = 1/n). The nodes that are visited most often are important! This induces a Markov chain model (random walk) or, equivalently, a linear system (equations omitted in transcript). (Figure: example graph with nodes 1 to 5.)
6. Static PageRank Model, as on slide 5: follow edges uniformly with probability α, randomly jump with probability 1 − α (v_i = 1/n); the nodes that are visited most often are important. Too simplistic! Graph & attributes evolve! Importance continuously changes!
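The linear system mentioned on slides 5 and 6 is not preserved in the transcript. In its standard form it reads (I − αP)x = (1 − α)v, where P is the column-stochastic random-walk matrix of the graph. The sketch below solves it directly for a tiny dense graph; the function name `static_pagerank`, the toy graph, and the default α = 0.85 are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def static_pagerank(A, alpha=0.85, v=None):
    """Solve (I - alpha*P) x = (1 - alpha) v for a small dense graph.

    A[i, j] = 1 means there is an edge i -> j.
    Assumption of this sketch: every node has at least one out-edge.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    if np.any(out_degree == 0):
        raise ValueError("this sketch assumes every node has an out-edge")
    # Column-stochastic transition matrix: P[j, i] = Prob(move i -> j).
    P = (A / out_degree[:, None]).T
    if v is None:
        v = np.full(n, 1.0 / n)  # uniform teleportation, v_i = 1/n
    rhs = (1.0 - alpha) * np.asarray(v, dtype=float)
    return np.linalg.solve(np.eye(n) - alpha * P, rhs)


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
A_toy = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [1, 0, 0]])
x_toy = static_pagerank(A_toy)
```

A dense solve only makes sense for toy inputs; for graphs the size of the Wikipedia dataset on slide 21 an iterative method on a sparse matrix would be the usual choice.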
7. Majority of work focuses on static networks! Combining PageRank with the crawling process: S. Abiteboul, M. Preda, & G. Cobena, "Adaptive on-line page importance computation". Walks on dynamic graphs: P. Grindrod, D. Higham, M. Parsons, & E. Estrada, "Communicability Across Evolving Networks". Other work: J. O'Madadhain & P. Smyth, "EventRank: A framework for ranking time-varying networks".
8. None of these techniques is placed in the context of a dynamical system. We want to gain additional flexibility by recasting these problems as continuous dynamical systems.
9. Evolving teleportation (e.g., pageviews). Example pageviews for six nodes at the first time step: 96, 105, 281, 42, 11, 27. Importance continuously changes as the external influence evolves! Dynamic PageRank.
10. Evolving teleportation (e.g., pageviews). Pageviews per node at time steps 1 and 2: 96, 113; 105, 139; 281, 397; 42, 64; 11, 16; 27, 21. Importance continuously changes as the external influence evolves! Dynamic PageRank.
11. Pageviews per node at time steps 1, 2, and 3: 96, 113, 103; 105, 139, 125; 281, 397, 331; 42, 64, 53; 11, 16, 12; 27, 21, 39. Importance continuously changes as the external influence evolves! Dynamic PageRank.
12. Dynamical system: changes in PageRank values evolve with dynamic teleportation. (Equation omitted in transcript.)
13. Dynamic Teleportation Model. A generalization of static PageRank: if v(t) = v stops changing, then we recover the original PageRank vector x as the steady-state solution. (Equation omitted in transcript.)
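For reference, the model and its steady state can be written out as follows. This is a reconstruction rather than the slide's own typesetting: it assumes the dynamical system is the linear ODE below, with P the column-stochastic transition matrix and α, v as in the static model on slide 5.

```latex
% Assumed form of the dynamic teleportation model
\[
  x'(t) = (1-\alpha)\, v(t) - (I - \alpha P)\, x(t)
\]
% If v(t) = v is constant, the steady state x'(t) = 0 gives
\[
  0 = (1-\alpha)\, v - (I - \alpha P)\, x
  \quad\Longrightarrow\quad
  (I - \alpha P)\, x = (1-\alpha)\, v ,
\]
% which is the static PageRank linear system.
```

Under this form, the static PageRank vector is exactly the fixed point that x(t) settles into once the teleportation vector stops changing.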
14. • A principled dynamical system framework for studying these problems • Flexibility to choose our algorithm to solve it • Determines the effective length scale • Seamlessly generalizes PageRank for dynamics • We can easily and naturally incorporate the complete set of dynamic components
15. Evolve the dynamical system. Select any standard method! Classical methods: forward Euler, the family of Runge-Kutta methods (RK2, …, RK4, …). Adaptive methods. Many others!
16. Evolve the dynamical system: forward Euler. (Update equation omitted in transcript.)
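A minimal forward Euler sketch for evolving the system, assuming the model has the form x'(t) = (1 − α)v(t) − (I − αP)x(t) with a column-stochastic P. The function name `dynamic_pagerank_euler`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def dynamic_pagerank_euler(P, v_of_t, x0, alpha=0.85, h=0.1, t_max=1.0):
    """Integrate x'(t) = (1 - alpha) v(t) - (I - alpha P) x(t) with forward Euler.

    P       column-stochastic transition matrix (n x n)
    v_of_t  callable returning the teleportation vector at time t
    x0      initial PageRank vector
    Returns an array of shape (n_steps + 1, n) holding x at each step.
    """
    x = np.asarray(x0, dtype=float).copy()
    n_steps = int(round(t_max / h))
    trajectory = [x.copy()]
    for k in range(n_steps):
        t = k * h
        # Right-hand side of the ODE evaluated at the current state.
        dx = (1.0 - alpha) * v_of_t(t) - (x - alpha * (P @ x))
        x = x + h * dx
        trajectory.append(x.copy())
    return np.array(trajectory)


# Hypothetical 3-cycle 0 -> 1 -> 2 -> 0 (column-stochastic P).
P_toy = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
v_const = np.array([0.5, 0.3, 0.2])  # teleportation that has stopped changing
traj = dynamic_pagerank_euler(P_toy, lambda t: v_const,
                              x0=np.full(3, 1.0 / 3.0), h=0.5, t_max=200.0)
# With constant v, the trajectory should settle at the static solution.
x_static = np.linalg.solve(np.eye(3) - 0.85 * P_toy, 0.15 * v_const)
```

Passing a time-dependent `v_of_t`, for example one that returns normalized hourly pageviews, turns the same loop into the evolving-teleportation setting; a Runge-Kutta scheme could replace the single update line without changing the interface.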
17. How we map updates to v into the dynamical system time determines the effective length-scale that we are looking at. What is the relationship between the time-scale of the dynamical system and the time-scale of the application? Does x(1) correspond to 1 sec, 1 min, ...?
18. How we map updates to v into the dynamical system time determines the effective length-scale that we are looking at. Examples with an application time-scale of 1 hour: with h = 1 and t = 1 corresponding to 1 min, there are 60 iterations between each hour, which is equivalent to running the power method until convergence each hour! With h = 1 and t = 1 corresponding to 20 min, there are 3 iterations after each hourly change.
19. v(t) changes at fixed intervals. A better idea might be to smooth out these "jumps"! Feature of the new model: utilize this information from the evolution. Example: h = 1, t = 1 (12 min), application time-scale = 1 hour, giving 5 iterations after each hourly change. (Plot: convergence measure, 0 to 0.2, vs. iteration, 0 to 50.)
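The link between step size and the power method on slide 18 can be made explicit with a short derivation. It assumes the model x'(t) = (1 − α)v(t) − (I − αP)x(t) and holds v fixed between updates.

```latex
% Forward Euler with step size h
\[
  x_{k+1} = x_k + h\,\bigl[(1-\alpha)\,v - (I - \alpha P)\,x_k\bigr]
\]
% Setting h = 1 cancels the x_k terms:
\[
  x_{k+1} = \alpha P\, x_k + (1-\alpha)\, v
\]
```

With h = 1, each Euler step is one iteration of the familiar PageRank update, so the number of steps taken between two changes of v (60, 5, or 3 in the slides' examples) is the number of such iterations applied to each hourly teleportation vector.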
20. • Transient: instantaneous values of x(t) • Summary & Cumulative: any summary function s(⋅) of the time-series, e.g., integral, min, max, variance • Difference Rank. Among many others...
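The ranking functions on slide 20 can be sketched on a stored trajectory. The slide does not define Difference Rank; the code below assumes it means the maximum minus the minimum of each node's time-series, and it uses a plain sum as a discrete stand-in for the integral. The name `summarize_time_series` and the sample data are hypothetical.

```python
import numpy as np


def summarize_time_series(X):
    """Per-node summaries of a Dynamic PageRank trajectory.

    X has shape (T, n): row k is the PageRank vector at time step k.
    """
    X = np.asarray(X, dtype=float)
    return {
        "transient": X[-1],               # instantaneous value at the last step
        "cumulative": X.sum(axis=0),      # discrete stand-in for the integral
        "min": X.min(axis=0),
        "max": X.max(axis=0),
        "variance": X.var(axis=0),
        # Assumed definition of difference rank: max_t x(t) - min_t x(t).
        "difference": X.max(axis=0) - X.min(axis=0),
    }


# Hypothetical trajectory: 3 time steps, 3 nodes.
X_toy = np.array([[0.2, 0.5, 0.3],
                  [0.6, 0.2, 0.2],
                  [0.3, 0.3, 0.4]])
summary = summarize_time_series(X_toy)
# Nodes ordered from most to least fluctuating.
most_fluctuating = np.argsort(-summary["difference"]).tolist()
```

Sorting by the difference score is one way to produce a list like the "top 100 pages that fluctuate the most" shown on later slides.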
21. • Wikipedia: hyperlink graph, hourly pageviews • Twitter: who-follows-whom graph, tweet rates (monthly)

    | Dataset | Nodes | Edges | t_max | Period | Average p_i | Max p_i |
    |---|---|---|---|---|---|---|
    | Wikipedia | 4,143,840 | 72,718,664 | 20 | hours | 1.3225 | 334,650 |
    | Twitter | 465,022 | 835,424 | 6 | months | 0.5569 | 1056 |

22. Nope, pageviews and degree are uncorrelated (correlation = 0.02)! Some pages have high degree and low pageviews; others have high pageviews and low degree. (Plot: in-degree (log) vs. total pageviews (log).)
23. Main Finding: Combining the external influence with the graph produces something new that is not captured by the other methods.
24. Learn the model as an exponential moving average, predict p(t+1), and evaluate models by total error. (Formulas omitted in transcript.)
25. Base Model: only pageviews (or tweet-rates). Dynamic PageRank: pageviews and Dynamic PageRank time-series.

    | Dataset | Forecasting | Dynamic PageRank | Base Model |
    |---|---|---|---|
    | Wikipedia | Non-stationary | 0.4349 | 0.5028 |
    | Wikipedia | Stationary | 0.3672 | 0.4373 |
    | Twitter | Non-stationary | 0.4852 | 1.2333 |
    | Twitter | Stationary | 0.6690 | 0.9180 |

    Main Finding. Dynamic PageRank time-series provides valuable information for forecasting future pageviews (or tweet-rates).
26. Many applications, such as: • Actively adapting caches in large DB systems • Dynamically recommending pages. (Same model definitions and error table as slide 25.)
27. Top 100 pages that fluctuate the most! Dynamic PageRank identifies interesting pages that pertain to recent external interest.
28. Top 100 pages that fluctuate the most! Pages related to a recent Australian earthquake!
29. Top 100 pages that fluctuate the most! Just-released movie "Watchmen".
30. Top 100 pages that fluctuate the most! Famous co-host/musician that died.
31. Top 100 pages that fluctuate the most! Recent "American Idol" gossip.
32. Top 100 pages that fluctuate the most! A remembrance of Eve Carson from a contestant on "American Idol". Recent "American Idol" gossip.
33. Top 100 pages that fluctuate the most! Main Finding. These examples reveal the ability of our Dynamic PageRank to mesh the network structure with changes in external interest!
34. • Clustering PageRank trends • Granger Causality • Better algorithms (RK4, …) • Put more theoretical teeth behind these results
35. Cluster centroids (Temporal Patterns 1 to 5) are well-separated and unique! Most nodes stationary! (Plot: normalized Dynamic PageRank, 0 to 0.25, vs. time, 0 to 20.)
36. Non-stationary nodes (and clusters) are potential anomalies: large-scale disasters, breaking news. Most nodes stationary! (Plot: centroids of Temporal Patterns 1 to 5; normalized Dynamic PageRank, 0 to 0.25, vs. time, 0 to 20.)
37. Allows us to identify nodes that become important around similar times (nodes w/ similar trends of importance may be related). (Figure: centroid plot as on slide 36, plus two lists of page names, truncated in the original. One list includes TimAllen, TheOffice(, DrivingMis, KrisAllen, KatharineM, AmericanId, AsherRoth, DwightSchr, B.J.Novak, SeanHannit, Drake(ente, SaraPaxton, BobbyBrown, Sting; the other includes Chile, WorldWarII, Iraq, Jew, Brazil, Frenchlang, Caribbean, Judaism, RomanCatho, Rome, NaziGerman, God, Wiktionary, Mammal.)
38. Question: Does an earthquake at time t cause people to visit the Richter magnitude page at t+1? Causes? Earthquake → Richter Mag. Statement on Granger Causality (stronger version): (1) the cause must occur before the effect; (2) the cause contains information about the effect; (3) cause and effect must be linked in the graph.
39. Multivariate regression with a lag; the equation (omitted in transcript) is labeled with a vector of response variables, regression coefficients to estimate, and a vector of errors. Granger causality exists if the error obtained by using the time-series x in the forecast model is smaller than the error without considering x. Significance of the difference in error is measured using the F-test.
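The comparison described on slide 39 can be sketched as two nested least-squares fits. This is a generic Granger-style F statistic under assumed details (an intercept, ordinary least squares, a single pair of series), not the authors' exact regression; all function names and the synthetic data are hypothetical.

```python
import numpy as np


def lagged(series, lag):
    """Rows are [s[t-1], ..., s[t-lag]] for t = lag, ..., len(s) - 1."""
    s = np.asarray(series, dtype=float)
    return np.column_stack([s[lag - k: len(s) - k] for k in range(1, lag + 1)])


def residual_sum_of_squares(design, target):
    """Least-squares fit of target on design; returns the squared error."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f_statistic(x, y, lag=1):
    """F statistic for the hypothesis 'x Granger-causes y' at the given lag."""
    y = np.asarray(y, dtype=float)
    target = y[lag:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, lag)])          # past of y only
    unrestricted = np.hstack([restricted, lagged(x, lag)])  # plus past of x
    rss_r = residual_sum_of_squares(restricted, target)
    rss_u = residual_sum_of_squares(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic check: y is driven by the previous value of x plus small noise.
rng = np.random.default_rng(0)
x_series = rng.normal(size=200)
y_series = np.zeros(200)
y_series[1:] = 0.8 * x_series[:-1] + 0.1 * rng.normal(size=199)
f_xy = granger_f_statistic(x_series, y_series, lag=1)
f_yx = granger_f_statistic(y_series, x_series, lag=1)
```

A large `f_xy` relative to `f_yx` indicates that the past of x reduces the forecast error for y but not the reverse. Turning the statistic into a p-value means comparing it against an F distribution with (lag, dof) degrees of freedom.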
40. Earthquake → Richter Mag.: p-value 0.000406***. Significant! Caused by Earthquake in Australia (p-value): Earthquake preparedness 0.000607***; Aftershock 0.009619**; Asperity 0.001601**; Stick-slip phenomenon 0.002312**; Landslide dam 0.004820**. Significance levels: p < 0.05 (*), 0.01 (**), 0.001 (***).
41. Main Finding. Allows us to identify the pages that influence the others with regard to how users find information. (Same p-value table as slide 40.)
42. • Introduced a dynamical system framework for PageRank • Stated a dynamic generalization of PageRank • Dynamic PageRank can help in prediction • Useful for many other applications
43. Thanks! Questions? rrossi@purdue.edu http://www.cs.purdue.edu/homes/rrossi
44. (No text on this slide.)
45. Hourly pageviews over time. Earthquake: 132, 172. Richter Mag.: 35, 31. (Other pages in the figure: Earthquake Preparedness, Charles Richter.)
46. Earthquake: 132, 172, 764. Spike in the number of pageviews for that given hour! Richter Mag.: 35, 31, 56.
47. ΔPR: importance substantially increases! Earthquake: 132, 172, 764 (spike in the number of pageviews for that given hour). Richter Mag.: 35, 31, 56.
48. ΔPR: importance substantially increases! After a few iterations, importance diffuses from Earthquake to Richter Mag.! This is a direct result of meshing the graph with pageviews. Earthquake: 132, 172, 764. Richter Mag.: 35, 31, 56.
49. Richter Mag. becomes important at this time. Hence, Richter magnitude receives a high dynamic PageRank score, becoming increasingly important at this time, while its pageviews are not significantly increasing. Earthquake: 132, 172, 764. Richter Mag.: 35, 31, 56.
50. Earthquake: 132, 172, 764, 3406. Richter Mag.: 35, 31, 56, 1447. In the next hour, we find that the pageviews of Richter spike! Reinforcing the importance!
51. Dynamic PageRank is predictive (by definition)! The importance of Richter magnitude is captured by dynamic PageRank an hour earlier than when it actually became important (spike in pageviews). Earthquake: 132, 172, 764, 3406. Richter Mag.: 35, 31, 56, 1447.
52. • Real-world networks are naturally dynamic: Information Networks (e.g., Wikipedia: article-links-article), Social Networks (e.g., Twitter: who-follows-whom), Biological Networks, … ⇒ Importance changes! Static methods fail to capture the temporal flow of information and lead to misleading or simply incorrect conclusions.
53. Dynamic networks: Graph. (Figure: graph snapshots over time.)
54. Dynamic networks: Graph, Attributes, ✓ External Influence (e.g., pageviews). (Figure: pageview counts per node over time, as on slide 11.)