Prediction in Dynamic Graphs
Emile Richard, 1000mercis ENS Cachan

Transcript

  • 1. Prediction in Dynamic Graph Sequences. Emile Richard, CMLA-ENS Cachan & 1000mercis. Supervisors: Th. Evgeniou (INSEAD) and N. Vayatis (CMLA-ENS Cachan). January 20, 2012
  • 2. Prediction in Dynamic Graph Sequences. Table of contents: Context (Motivation, Data Description); Problem Formulation (Random Graph Models, Link Prediction Heuristics, Framework); Algorithms (Two-stage optimization, Joint Optimization in W and S, Variants); Discussion; References
  • 3. Prediction in Dynamic Graph Sequences Context
  • 4. Prediction in Dynamic Graph Sequences Context MotivationFrom Big Data to Business Decisions 1000mercis: interactive marketing and advertisement (emailing, mobile, viral games) 1. Send less ads: email is free → overwhelm consumers 2. Make consumers happy: serendipity 3. Act sustainably: avoid long-term fatigue 4. Earn more: up to 5 times!
  • 5. Prediction in Dynamic Graph Sequences Context Motivation. Prediction in Relational Databases? Recommender systems. Links: to select recommendations, offline fine-tuning. Sales volumes: prepare or push trends. Resource allocation: consumers and contributors in UGC [Zhang11], stock management. Understanding of data through relevant feature extraction. [Figure: weekly log-counts of sellers, products, buyers and commission over time, for returning and new users]
  • 6. Prediction in Dynamic Graph Sequences Context MotivationSimilar Problems The Netflix prize: 1M$ for a 10% improvement in accuracy Amazon: 35% sales generated by recommendation[Linden03] CRM optimization: acquisition, cross-selling, churn management, prediction of top-selling items etc.
  • 7. Prediction in Dynamic Graph Sequences Context MotivationOther Web Applications
  • 8. Prediction in Dynamic Graph Sequences Context MotivationSimilar Problems in Computational Biology1 Understanding the underlying mechanisms of biological systems Inference procedures for analysis of effects of biological pathways in cancer progression Study the effect of potential drugs/treatments on gene regulatory networks in cancer cells 1 After a discussion with Ali Shohaie
  • 9. Prediction in Dynamic Graph Sequences Context Data Description. Case Study Data: C-to-C website. Recommendation newsletters and banners. Management of promotional assets and pressure on users.

    Domain       | users | products | daily sales
    Music        | 0.4M  | 60K      | 2K
    Books        | 1.2M  | 1.7M     | 18K
    Electronic   | 0.5M  | 60K      | 2K
    Video Games  | 0.9M  | 0.2M     | 9K
  • 10. Prediction in Dynamic Graph Sequences Context Data Description. Heterogeneous Domains. [Figures: densities of log(Clustering Coefficient), log(degree), log(d^(2)/degree) and log(d^(3)/d^(2)) on the user and product sides for the four domains (Video Games, Music, Electronic Devices, Books), and joint User x Product degree distributions for Books and Music]
  • 11. Prediction in Dynamic Graph Sequences Problem Formulation Problem Formulation
  • 12. Prediction in Dynamic Graph Sequences Problem FormulationDynamic Graphs Nodes linked by Edges that appear over time Web applications, Economics, Biology, Drug discovery (Social networks users, Friendship) (Users and products, Purchases or clicks) (Websites, Hyperlinks) (Proteins, Interaction)
  • 13. Prediction in Dynamic Graph Sequences Problem Formulation. Prediction at Descriptor (macro) and Edge (micro) Levels. Network Effect: cause and symptom of the evolution of node features, e.g. popularity, homophily, centrality, diffusion level. Simultaneously predict node features and future links.
  • 14. Prediction in Dynamic Graph Sequences Problem Formulation. Complex Networks? Degrees of freedom ∼ n², n: # nodes. Latent factors: r ≪ n, r: # latent factors. Intrinsic dimensionality reduced to ∼ rn ≪ n². Kepler's Laws of networks.
  • 15. Prediction in Dynamic Graph Sequences Problem Formulation Random Graph Models. Random Graph Models. Erdos-Renyi [Bollobas01]: nodes connected with uniform probability. No prediction chance. Preferential Attachment [Albert02]: reproduces power-law degree distributions. Rich-get-richer. Block-Models [Nowicki01]: k blocks or clusters form the structure of the graph. Community structure. Latent Factor Model [Hoff02, Krivitsky10]: node latent factors z_i, z_j, pair-wise covariate descriptors x_{i,j}, with P(Y | X, Z, θ) = ∏_{i≠j} P(Y_{i,j} | X_{i,j}, Z_i, Z_j, θ) and log odds(y_{i,j} = 1 | x_{i,j}, z_i, z_j, α, β) ∝ α − β x_{i,j} − ‖z_i − z_j‖₂. Parameter estimation.
  • 16. Prediction in Dynamic Graph Sequences Problem Formulation Random Graph Models. Exponential Random Graph Families [Wasserman96]. Graph z: realization of a random variable Z, with P_θ(Z = z) = exp(θᵀω(z) − Ψ(θ)), where θ ∈ R^Q is a vector of parameters, ω gives sufficient statistics on the graph z, ω(z) ∈ R^Q, and Ψ is a normalization factor. Parameter estimation by maximizing the log-likelihood.
  • 17. Prediction in Dynamic Graph Sequences Problem Formulation Link Prediction Heuristics. Nearest Neighbors and Walks. Hypothesis: a graph G is partially observed, we aim to find the hidden edges [Kleinberg07]. Friends of my friends are likely to be my friends. With A ∈ {0,1}^{n×n} the social adjacency matrix, (A²)_{i,j} = Σ_{k=1}^n A_{i,k} A_{k,j} = # paths of length 2 from i to j = # common friends of i and j. Random Walks: take W = D⁻¹A where D is the diagonal matrix of degrees, and Katz = Σ_{k=1}^∞ β^k W^k = (I_n − βW)⁻¹ − I_n.
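The two heuristics above can be sketched in a few lines of numpy; the toy 4-node graph is a hypothetical example, not data from the talk:

```python
import numpy as np

# Toy undirected friendship graph on 4 nodes (hypothetical example).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 0.],
              [0., 1., 0., 0.]])

# (A^2)_{i,j} = number of length-2 paths from i to j = number of common friends.
common = A @ A

# Random-walk normalization: W = D^{-1} A with D the diagonal degree matrix.
D_inv = np.diag(1.0 / A.sum(axis=1))
W = D_inv @ A

# Katz index: sum_{k>=1} beta^k W^k = (I - beta W)^{-1} - I (beta < 1 for convergence).
beta = 0.5
katz = np.linalg.inv(np.eye(4) - beta * W) - np.eye(4)
```

The closed form avoids summing the series; for a row-stochastic W any beta < 1 makes the geometric series converge.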
  • 18. Prediction in Dynamic Graph Sequences Problem Formulation Link Prediction Heuristics. Bipartite Graphs of Marketplaces. [Figure: bipartite graph linking users u1..u4 to products p1..p5] Who bought this also bought that. M ∈ {0,1}^{#users×#products}: transactions. (M Mᵀ M)_{i,j}: number of times product j was purchased by users having purchased the same products as a given user i. Random Walks: apply the unipartite formula to the block matrix [[0, M], [Mᵀ, 0]].
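A minimal sketch of the M Mᵀ M score, on a made-up 3-user × 4-product transaction matrix (the masking of already-purchased items is an assumption about how the scores would be used, not stated on the slide):

```python
import numpy as np

# Hypothetical transaction matrix: M[i, j] = 1 if user i bought product j.
M = np.array([[1., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.]])

# S[i, j]: purchases of product j by users who share purchases with user i.
S = M @ M.T @ M

# Recommend, per user, the highest-scored product not yet purchased.
S_masked = np.where(M > 0, -np.inf, S)
rec = S_masked.argmax(axis=1)
```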
  • 19. Prediction in Dynamic Graph Sequences Problem Formulation Link Prediction Heuristics. Low-Rank. SVD: A = U diag(σ_i) Vᵀ. Define ‖X‖_* = Σ_i σ_i(X) and D_τ(A) = U diag(max(σ_i − τ, 0)) Vᵀ: the Shrinkage operator. The rank-r matrix closest to A is U diag(σ_1, ..., σ_r, 0, ..., 0) Vᵀ. Fact: argmin_X ½‖X − A‖²_F + τ‖X‖_* = D_τ(A). [Figure: block-wise adjacency matrix, nz = 1400] Matrix Completion [Srebro05, Candes08, Koltchinskii11] estimates A by minimizing ½‖ω(A) − ω(X)‖² + τ‖X‖_* for a linear mapping ω: R^{n×n} → R^Q.
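The shrinkage operator D_τ from the slide is a one-liner around the SVD; this is a direct transcription of the formula, not code from the talk:

```python
import numpy as np

def shrink(A, tau):
    """Singular value shrinkage D_tau(A): soft-threshold the singular values.

    By the Fact above, this is the minimizer of 0.5*||X - A||_F^2 + tau*||X||_*.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Note that shrinking not only damps the large singular values but zeroes the small ones, so the output is typically low rank even when A is full rank.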
  • 20. Prediction in Dynamic Graph Sequences Problem Formulation Link Prediction Heuristics. Link Prediction: Statistical and Spectral Properties. Statistics on the number of triangles and the length of paths in the graph are stable. Spectral functions [Kunegis09] of the adjacency and stochastic matrices kill low eigenvalues. If A = U diag(σ_i) Vᵀ is the SVD, U diag(f(σ_i)) Vᵀ is called a spectral function. [Figure: spectral functions f(σ) ∝ (1 − βσ)⁻¹ − 1 and f(σ) = max(σ − τ, 0) plotted against σ]
  • 21. Prediction in Dynamic Graph Sequences Problem Formulation Link Prediction Heuristics. Leading Insight. Link prediction heuristics implicitly suggest that 1. the graph sequence follows some slowly varying feature map, and 2. the spectrum of the graphs is regular. Define a regularization formulation of the problem in order to leverage the trade-offs and select the best features. Obstacle to matrix completion: ω(A) is itself to be predicted.
  • 22. Prediction in Dynamic Graph Sequences Problem Formulation Framework. Notations. Time steps t ∈ {1, 2, ..., T}. Adjacency matrices A_t ∈ {0,1}^{n×n}: graph sequence. Feature map ω: R^{n×n} → R^Q, ω linear (degree, clusters), Q ≪ n². Prediction of A_{T+1}: score matrix S ∈ R^{n×n}.
  • 23. Prediction in Dynamic Graph Sequences Problem Formulation Framework. Assumptions. 1. Stationarity of successive feature vectors: ∃f: R^Q → R^Q, ∀t, ω(A_{t+1}) = f(ω(A_t)) + ε_t. 2. Simplicity of S: S low rank [Srebro05], penalize the trace norm ‖S‖_*.
  • 24. Prediction in Dynamic Graph Sequences Problem Formulation Framework. Quantities to control. 1. Features predictor: J₁(f) = Σ_{t=1}^{T−1} ℓ(ω(A_{t+1}), f(ω(A_t))) + κ‖f‖_H. 2. Predicted features matching the predicted graph features (coupling term): J₂(f, S) = ℓ(ω(S), f(ω(A_T))). 3. Penalty on S: J₃(S) = τ‖S‖_*.
  • 25. Prediction in Dynamic Graph Sequences Problem Formulation Framework. Convex Optimization Problem. Let X = [ω(A₁); ...; ω(A_{T−1})] and Y = [ω(A₂); ...; ω(A_T)], both in R^{(T−1)×Q}, be the stacked feature vectors. We take linear predictors, f(ω) = ω W, and define the convex objective L := J₁ + J₂ + J₃ = ½‖XW − Y‖²_F + (κ/2)‖W‖²_F + ½‖ω(A_T) W − ω(S)‖²₂ + τ‖S‖_*.
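The objective L can be written out directly; here is a sketch with a made-up random linear ω and a random sparse graph sequence standing in for the real feature map and data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q, T = 20, 5, 10

# Hypothetical linear feature map omega: R^{n x n} -> R^Q (a random projection here;
# the slides use graph features such as degrees or cluster statistics).
P = rng.standard_normal((n * n, Q)) / n
omega = lambda A: A.reshape(-1) @ P

# Synthetic sequence of sparse adjacency matrices A_1, ..., A_T.
A_seq = [(rng.random((n, n)) < 0.1).astype(float) for _ in range(T)]
X = np.stack([omega(A) for A in A_seq[:-1]])  # rows omega(A_1) .. omega(A_{T-1})
Y = np.stack([omega(A) for A in A_seq[1:]])   # rows omega(A_2) .. omega(A_T)

def objective(S, W, kappa=0.1, tau=0.1):
    """L = J1 + J2 + J3 for the linear predictor f(omega) = omega W."""
    fit = 0.5 * np.linalg.norm(X @ W - Y) ** 2                            # J1 fit
    ridge = 0.5 * kappa * np.linalg.norm(W) ** 2                          # J1 penalty
    couple = 0.5 * np.linalg.norm(omega(A_seq[-1]) @ W - omega(S)) ** 2   # J2
    nuclear = tau * np.linalg.norm(S, 'nuc')                              # J3
    return fit + ridge + couple + nuclear
```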
  • 26. Prediction in Dynamic Graph Sequences Algorithms Algorithms
  • 27. Prediction in Dynamic Graph Sequences AlgorithmsOptimization Strategies Goal : minimize L(S, W ) 1. Two-stage optimization 2. Joint optimization in W and S 3. Variant 1: graph regularization 4. Variant 2: sparsity constraint
  • 28. Prediction in Dynamic Graph Sequences Algorithms Two-stage optimization. Two-stage Optimization [Richard10]. Solve W = argmin_{W ∈ R^{Q×Q}} J₁(W) (regression), then minimize J₂(W, S) + J₃(S). Optimal algorithms due to Nesterov: ε-optimal solution after O(1/√ε) iterations instead of O(1/ε²) [Goldfarb09].

    (r, noise)    | Proposed      | Static        | P. A.         | Katz
    (5, 0.000)    | 0.671 ± 0.008 | 0.648 ± 0.008 | 0.627 ± 0.015 | 0.616 ± 0.015
    (5, 0.250)    | 0.675 ± 0.009 | 0.642 ± 0.007 | 0.602 ± 0.016 | 0.592 ± 0.016
    (5, 0.750)    | 0.519 ± 0.007 | 0.525 ± 0.005 | 0.497 ± 0.007 | 0.491 ± 0.007
    (500, 0.000)  | 0.592 ± 0.008 | 0.587 ± 0.007 | 0.671 ± 0.010 | 0.667 ± 0.009
    (500, 0.250)  | 0.607 ± 0.011 | 0.588 ± 0.009 | 0.649 ± 0.009 | 0.643 ± 0.009
    (500, 0.750)  | 0.601 ± 0.010 | 0.583 ± 0.007 | 0.645 ± 0.017 | 0.641 ± 0.017
  • 29. Prediction in Dynamic Graph Sequences Algorithms Two-stage optimization. Split and Alternately Minimize. Splitting: L_η(S, S̄) := τ‖S‖_* + h(S̄, ν), subject to S = S̄. Alternately minimize in S and S̄:
    m_G(S̄) = argmin_S τ‖S‖_* + ⟨∇h(S̄), S − S̄⟩ + (1/2μ)‖S − S̄‖²_F
    m_H(S) = argmin_{S̄} h(S̄, ν) + ⟨∇(τ‖·‖_*)(S), S̄ − S⟩ + (1/2μ)‖S̄ − S‖²_F

    Algorithm 1: Link Discovery Algorithm
    Parameters: τ, ν, η
    Initialization: W₀ = Z₁ = A_T, α₁ = 0
    for k = 1, 2, ... do
      S_k ← m_G(Z_k) and S̄_k ← m_H(S_k)
      W_k ← ½(S_k + S̄_k)
      α_{k+1} ← ½(1 + √(1 + 4α_k²))
      Z_{k+1} ← W_k + (α_k/α_{k+1})((S̄_k − W_{k−1}) − (W_k − W_{k−1}))
    end for
  • 30. Prediction in Dynamic Graph Sequences Algorithms Joint Optimization in W and S. Minimization of L by proximal gradient descent. Write L(S, W) = g(S, W) + Γ(S, W), with g(S, W) := ½‖XW − Y‖²_F + ½‖ω(A_T) W − ω(S)‖²₂ a smoothly differentiable fit term, and Γ(S, W) := (κ/2)‖W‖²_F + τ‖S‖_* a convex penalty. Explicit proximal operator: prox_{θΓ}(S, W) := argmin_{(Z,V)} θΓ(Z, V) + ½‖S − Z‖²_F + ½‖W − V‖²_F = (D_{θτ}(S), W/(1 + θκ)). Iterate (S_{k+1}, W_{k+1}) = prox_{θ_kΓ}((S_k, W_k) − θ_k ∇g(S_k, W_k)). FISTA [Beck09] for the optimal convergence rate.
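A plain (non-accelerated) version of this proximal gradient iteration can be sketched end to end; the random feature map, the random X, Y, ω(A_T), and the step size θ are all hypothetical stand-ins, and the ∇g formulas follow from g by the chain rule through the linear ω:

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q = 10, 4

# Hypothetical stand-ins for the quantities of slide 25.
P = rng.standard_normal((n * n, Q)) / n       # linear feature map omega(A) = vec(A)^T P
omega = lambda A: A.reshape(-1) @ P
X = rng.standard_normal((6, Q))               # stacked omega(A_t)
Y = rng.standard_normal((6, Q))               # stacked omega(A_{t+1})
a_T = rng.standard_normal(Q)                  # omega(A_T)
kappa, tau, theta = 0.1, 0.05, 0.01

def shrink(A, t):
    # Singular value shrinkage D_t(A), the prox of t * nuclear norm.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def grad_g(S, W):
    # Gradient of the smooth part g(S, W).
    r = a_T @ W - omega(S)                    # coupling residual in R^Q
    gW = X.T @ (X @ W - Y) + np.outer(a_T, r)
    gS = -(P @ r).reshape(n, n)               # chain rule through the linear omega
    return gS, gW

def loss(S, W):
    return (0.5 * np.linalg.norm(X @ W - Y) ** 2
            + 0.5 * kappa * np.linalg.norm(W) ** 2
            + 0.5 * np.linalg.norm(a_T @ W - omega(S)) ** 2
            + tau * np.linalg.norm(S, 'nuc'))

# Proximal gradient: gradient step on g, then the explicit prox of Gamma,
# prox_{theta Gamma}(S, W) = (D_{theta tau}(S), W / (1 + theta kappa)).
S, W = np.zeros((n, n)), np.zeros((Q, Q))
for _ in range(200):
    gS, gW = grad_g(S, W)
    S = shrink(S - theta * gS, theta * tau)
    W = (W - theta * gW) / (1 + theta * kappa)
```

FISTA would add a momentum sequence on top of exactly this step to reach the optimal O(1/k²) rate.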
  • 31. Prediction in Dynamic Graph Sequences Algorithms Variants. Variant 1: Graph Regularization Constraint. Want i ∼_S j ⇒ f(i) ∼_H f(j). Control the Laplacian-like [Chen10] inner product J₄(f, S) = Σ_{i,j} S_{i,j} ‖f(i) − f(j)‖²_H = ⟨S, (‖f(i) − f(j)‖²_H)_{i,j}⟩. Other possibility: J₄(f, S) = ⟨S, Gram(f)⟩. L_graph regularization = L + λJ₄. Issue: non-convex regularizers. Algorithms: 1. Gradient descent with hyper-parameters that keep the objective inside the convexity domain. 2. Projected gradient descent inside the convexity domain.
  • 32. Prediction in Dynamic Graph Sequences Algorithms VariantsGradient Descent Convergence Area
  • 33. Prediction in Dynamic Graph Sequences Algorithms Variants. Empirical Results.

    Data                 | Marketing          | Synthetic
    Method               | ∆Sales  | ∆Graph   | ∆Sales       | ∆Graph
    Our solution         | 0.62    | 0.28     | 0.13 ± .002  | 0.21 ± .003
    Rank-free prediction | 0.64    | 0.31     | 0.19 ± .008  | 0.24 ± .01
    AR                   | 0.80    | -        | 0.66 ± .007  | -
    ARIMA                | 0.78    | -        | 0.17 ± .02   | -
    VAR                  | 1.02    | -        | 0.42 ± .09   | -
    MC with shrinkage    | -       | 0.38     | -            | 0.22 ± .003

    Sales Prediction metric: ∆Sales = ‖ω(A_{T+1}) − f(ω(A_T))‖₂ / ‖ω(A_{T+1})‖₂, to be minimized. Graph Completion metric: ∆Graph = ‖A_{T+1} − S‖_F / ‖A_{T+1}‖_F, to be minimized.
  • 34. Prediction in Dynamic Graph Sequences Algorithms Variants. Convexity Domain. [Figure: surface plots of sw², s² + w², and their sum, illustrating how λJ₄ + κ‖f‖² + ν‖S − A_T‖² becomes convex on a restricted domain] J₄ is not jointly convex in (S, f); λJ₄ + κ‖W‖²_F + ν‖S − A_T‖²_F is convex inside E = {S ∈ R₊^{n×n}, W ∈ R^{n×d} : ‖W‖²_F ≤ √(νκ)/(2λ)}.
  • 35. Prediction in Dynamic Graph Sequences Algorithms Variants. Empirical Results. [Figure: relative errors as a function of log(ν) for HYBRID (Regression), HYBRID (Graph Completion), Rank Free Regression, Rank Free Graph Completion, Regression Only, and Graph Only]
  • 36. Prediction in Dynamic Graph Sequences Algorithms Variants. Variant 2: Sparsity Constraint. L_sparse(S, W) := L(S, W) + γ‖S‖_{1,1} (lasso). Split S onto S and S̄ and add an equality constraint. Synthetic data: n = 100, Q = 15, T = 200; 10 runs for cross-validation, 10 runs for test; AUC on S reported.

    Nearest Neighbors | Static Low Rank | L_sparse        | L
    0.9767 ± 0.0076   | 0.9751 ± 0.0362 | 0.9812 ± 0.0008 | 0.9778 ± 0.0071
  • 37. Prediction in Dynamic Graph Sequences Discussion Discussion
  • 38. Prediction in Dynamic Graph Sequences Discussion. Synthetic Data Generation. Let, for all k ∈ {1, ..., r}, U_t^{(i,k)} = (1/(√(2π) σ_{i,k})) exp(−(t − μ_{i,k})²/(2σ²_{i,k})) + ε_{i,k} quantify the taste of user i for feature k at time t, let V_t^{(j,k)} be the weight of feature k for item j, and take A_t^{(i,j)} = 1{U^{(i)}(t) > θ} 1{V^{(j)}(t) > θ}. A_t is: 1. Sparse. 2. Rank at most r. 3. Its latent factors evolve slowly provided the σ's are not too small.
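This generator is easy to reproduce; the sizes, the threshold θ, the (μ, σ) ranges are hypothetical choices, and the noise term ε is omitted for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, r, T = 30, 30, 3, 50
theta = 0.03   # activation threshold (hypothetical value)

# Random Gaussian-bump parameters for user tastes and item weights.
mu_u = rng.uniform(0, T, (n_users, r)); sig_u = rng.uniform(5, 15, (n_users, r))
mu_v = rng.uniform(0, T, (n_items, r)); sig_v = rng.uniform(5, 15, (n_items, r))

def bump(t, mu, sig):
    # U_t^{(i,k)}: Gaussian bump centered at mu_{i,k} with width sigma_{i,k}.
    return np.exp(-(t - mu) ** 2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)

def adjacency(t):
    U = bump(t, mu_u, sig_u)   # taste of user i for feature k at time t
    V = bump(t, mu_v, sig_v)   # weight of feature k for item j at time t
    # Edge (i, j) present when both factors exceed theta on some latent feature k.
    return (((U > theta).astype(int) @ (V > theta).astype(int).T) > 0).astype(int)

A_seq = [adjacency(t) for t in range(T)]
```

Because each A_t is an OR over r rank-one indicator products, its rank before binarization is at most r, and the bumps make the active user/item sets drift slowly with t.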
  • 39. Prediction in Dynamic Graph Sequences Discussion. Scalability. D_τ(A) is dense, even for sparse A. Fact [Srebro05]: ‖S‖_* = ½ min_{UVᵀ=S} (‖U‖²_F + ‖V‖²_F). Instead of fixing τ, fix r and take U, V ∈ R^{n×r}. Define J(U, V, W) := ½‖XW − Y‖²_F + ½‖ω(A_T) W − ω(UVᵀ)‖²₂ + (κ/2)‖W‖²_F + (λ/2)(‖U‖²_F + ‖V‖²_F). Parallel Stochastic Gradient Algorithms [Recht11].
  • 40. Prediction in Dynamic Graph Sequences Discussion. Store Recommendation Lists. Each feature leads to a specific list of recommendations. Store the top-k lists. Learn optimal combinations / aggregations ... work in progress.
  • 41. Prediction in Dynamic Graph Sequences Discussion. Conclusion. Introduction of a regularization formulation for link prediction in graph sequences. Several variants detailed and empirically tested. Perspectives for scalable algorithms. Perspectives for theoretical analysis and understanding of the problem.
  • 42. Prediction in Dynamic Graph Sequences DiscussionThanks Mercis !
  • 43. Prediction in Dynamic Graph Sequences References. Reka Albert and Albert-Laszlo Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74:47–97, 2002. A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009. B. Bollobas. Random Graphs, vol. 73 of Cambridge Studies in Advanced Mathematics, 2nd ed. Cambridge University Press, Cambridge, 2001. Emmanuel J. Candès and Terence Tao. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2008.
  • 44. Prediction in Dynamic Graph Sequences References. Xi Chen, Seyoung Kim, Qihang Lin, Jaime G. Carbonell, and Eric P. Xing. Graph-structured multi-task regression and an efficient optimization method for general fused lasso. arXiv, 2010. Donald Goldfarb and Shiqian Ma. Fast alternating linearization methods for minimizing the sum of two convex functions. Technical Report, Department of IEOR, Columbia University, 2009. P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the Royal Statistical Society, 97, 2002. David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks.
  • 45. Prediction in Dynamic Graph Sequences References. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007. Vladimir Koltchinskii, Karim Lounici, and Alexandre Tsybakov. Nuclear norm penalization and optimal rates for noisy matrix completion. Annals of Statistics, 2011. P. N. Krivitsky and M. S. Handcock. A separable model for dynamic networks. ArXiv e-prints, November 2010. Jérôme Kunegis and Andreas Lommatzsch. Learning spectral graph transformations for link prediction. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 561–568, New York, NY, USA, 2009. ACM.
  • 46. Prediction in Dynamic Graph Sequences References. G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 2003. K. Nowicki and T. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96:1077–1087, 2001. Benjamin Recht and Christopher Ré. Parallel stochastic gradient algorithms for large-scale matrix completion. Submitted for publication, 2011. Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, and Nicolas Vayatis. Link discovery using graph feature tracking.
  • 47. Prediction in Dynamic Graph Sequences References. Proceedings of Neural Information Processing Systems (NIPS), 2010. Nathan Srebro, Jason D. M. Rennie, and Tommi S. Jaakkola. Maximum-margin matrix factorization. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Proceedings of Neural Information Processing Systems 17, pages 1329–1336. MIT Press, Cambridge, MA, 2005. Stanley Wasserman and Philippa Pattison. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61(3):401–425, September 1996. K. Zhang, Th. Evgeniou, V. Padmanabhan, and E. Richard. Content contributor management and network effects in a UGC environment. Marketing Science, 2011.