Johan Suykens: "Models from Data: a Unifying Picture"


  1. Models from Data: a Unifying Picture. Johan Suykens, KU Leuven, ESAT-SCD/SISTA, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium. Email: johan.suykens@esat.kuleuven.be, http://www.esat.kuleuven.be/scd/. Grand Challenges of Computational Intelligence, Nicosia, Cyprus, Sept. 14, 2012.
  2. Overview
     • Models from data
     • Examples
     • Challenges for computational intelligence
  5. High-quality predictive models are crucial: biomedical, process industry, energy, brain-computer interfaces, traffic networks, bio-informatics.
  6. Classical Neural Networks
     [Figure: multilayer perceptron with inputs x_1, ..., x_n, weights w_i, hidden units h(·), bias b, and output y]
     Multilayer Perceptron (MLP) properties:
     - Universal approximation
     - Learning from input-output patterns: off-line & on-line
     - Parallel network architecture, multiple inputs and outputs
     + Flexible and widely applicable: feedforward & recurrent networks, supervised & unsupervised learning
     - Many local minima, trial and error for the number of neurons
  7. Support Vector Machines
     [Figure: cost function of an MLP (many local minima) versus an SVM (convex) as a function of the weights]
     • Nonlinear classification and function estimation by convex optimization
     • Learning and generalization in high-dimensional input spaces
     • Use of kernels:
       - linear, polynomial, RBF, MLP, splines, kernels from graphical models, ...
       - application-specific kernels: e.g. bioinformatics, text mining
     [Vapnik, 1995; Schölkopf & Smola, 2002; Shawe-Taylor & Cristianini, 2004]
  8. Kernel-based models: different views
     [Figure: SVM, LS-SVM, Kriging, RKHS, and Gaussian processes as overlapping perspectives]
     Some early history on RKHS: 1910-1920: Moore; 1940: Aronszajn; 1951: Krige; 1970: Parzen; 1971: Kimeldorf & Wahba
     Complementary insights from different perspectives: kernels are used in different methodologies
     - Support vector machines (SVM): optimization approach (primal/dual)
     - Reproducing kernel Hilbert spaces (RKHS): variational problem, functional analysis
     - Gaussian processes (GP): probabilistic/Bayesian approach
  9. SVMs: living in two worlds ...
     Primal space (parametric): ŷ = sign[w^T φ(x) + b], with feature map φ(x) = [φ_1(x); ...; φ_{n_h}(x)]
     Kernel trick: K(x_i, x_j) = φ(x_i)^T φ(x_j)
     Dual space (nonparametric): ŷ = sign[Σ_{i=1}^{#sv} α_i y_i K(x, x_i) + b]
     [Figure: data mapped from the input space to the feature space]
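The "kernel trick" above can be checked numerically for a kernel whose feature map is known in closed form. A minimal sketch (the degree-2 homogeneous polynomial kernel and its explicit feature map are a standard textbook example, not taken from the slides):

```python
import numpy as np

def poly2_kernel(x, y):
    # Homogeneous polynomial kernel of degree 2: K(x, y) = (x^T y)^2
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    # Explicit feature map for this kernel in R^2:
    # phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2], so that phi(x)^T phi(y) = (x^T y)^2
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_implicit = poly2_kernel(x, y)                     # (1*3 + 2*(-1))^2 = 1
k_explicit = poly2_features(x) @ poly2_features(y)  # same value via phi
print(k_implicit, k_explicit)
```

The implicit evaluation never forms the 3-dimensional feature vectors; for RBF kernels the feature space is infinite-dimensional, so the implicit route is the only practical one.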
  12. Fitting models to data: alternative views
     - Consider the model ŷ = f(x; w), given input/output data {(x_i, y_i)}_{i=1}^N:
       min_w  w^T w + γ Σ_{i=1}^N (y_i − f(x_i; w))^2
     - Rewrite the problem as
       min_{w,e}  w^T w + γ Σ_{i=1}^N e_i^2   subject to  e_i = y_i − f(x_i; w), i = 1, ..., N
     - Construct the Lagrangian and take the conditions for optimality
     - Express the solution and the model in terms of the Lagrange multipliers
  14. Linear model: solving in primal or dual?
     inputs x ∈ R^d, output y ∈ R, training set {(x_i, y_i)}_{i=1}^N
     Model:
     (P): ŷ = w^T x + b,  w ∈ R^d
     (D): ŷ = Σ_i α_i x_i^T x + b,  α ∈ R^N
  15. Linear model: solving in primal or dual?
     few inputs, many data points (e.g. 20 × 1,000,000):
     primal: w ∈ R^20
     dual: α ∈ R^1,000,000 (kernel matrix: 1,000,000 × 1,000,000)
  16. Linear model: solving in primal or dual?
     many inputs, few data points (e.g. 10,000 × 50):
     primal: w ∈ R^10,000
     dual: α ∈ R^50 (kernel matrix: 50 × 50)
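The primal/dual choice can be illustrated with the linear least-squares model from the preceding slides. A sketch (bias term omitted for brevity; `gamma` plays the role of the regularization constant γ): the primal route solves a d × d system for w, the dual route an N × N system for α with w = X^T α, and both recover exactly the same model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, gamma = 50, 5, 10.0          # many data points, few inputs
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

# Primal: solve a d x d system for w (cheap when d << N)
w_primal = np.linalg.solve(X.T @ X + np.eye(d) / gamma, X.T @ y)

# Dual: solve an N x N system for alpha, then recover w = X^T alpha
# (cheap when N << d; here it is the more expensive route)
alpha = np.linalg.solve(X @ X.T + np.eye(N) / gamma, y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))   # True: same model, different system sizes
```

The equality follows from the push-through identity X^T (X X^T + I/γ)^{-1} = (X^T X + I/γ)^{-1} X^T, so the only practical difference is the size of the linear system to be solved.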
  17. Least Squares Support Vector Machines: "core models"
     • Regression:
       min_{w,b,e}  w^T w + γ Σ_i e_i^2   s.t.  y_i = w^T φ(x_i) + b + e_i, ∀i
     • Classification:
       min_{w,b,e}  w^T w + γ Σ_i e_i^2   s.t.  y_i (w^T φ(x_i) + b) = 1 − e_i, ∀i
     • Kernel PCA (V = I), kernel spectral clustering (V = D^{−1}):
       min_{w,b,e}  −w^T w + γ Σ_i v_i e_i^2   s.t.  e_i = w^T φ(x_i) + b, ∀i
     • Kernel canonical correlation analysis / partial least squares:
       min_{w,v,b,d,e,r}  w^T w + v^T v + ν Σ_i (e_i − r_i)^2   s.t.  e_i = w^T φ_1(x_i) + b,  r_i = v^T φ_2(y_i) + d
     [Suykens & Vandewalle, 1999; Suykens et al., 2002; Alzate & Suykens, 2010]
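For the regression core model, eliminating w and e from the Lagrangian optimality conditions leaves one linear system in the bias b and the dual variables α, which is the standard LS-SVM dual system [Suykens & Vandewalle, 1999]. A sketch assuming an RBF kernel and illustrative values for γ and σ (constant factors from the objective scaling are absorbed into γ):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gram matrix of the RBF kernel K(x, z) = exp(-||x - z||^2 / sigma^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # Dual solution of LS-SVM regression: a single linear system
    # [ 0   1^T            ] [b]     [0]
    # [ 1   Omega + I/gamma] [alpha] [y]
    N = len(y)
    Omega = rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                 # bias b, dual variables alpha

def lssvm_predict(Xtest, X, b, alpha, sigma=1.0):
    # Dual model representation: y(x) = sum_i alpha_i K(x, x_i) + b
    return rbf_kernel(Xtest, X, sigma) @ alpha + b

X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = np.sinc(X[:, 0])
b, alpha = lssvm_fit(X, y)
print(np.max(np.abs(lssvm_predict(X, X, b, alpha) - y)))   # small training residual
```

Note that every α_i is in general nonzero (no sparsity), which is the price paid for replacing the QP of a standard SVM by a linear system; the first block row of the system enforces Σ_i α_i = 0.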
  20. Core models
     [Diagram: a core model, together with regularization terms and additional constraints, yields the optimal model representation and the model estimate; instances include the parametric model, support vector machine, least-squares support vector machine, and Parzen kernel model]
  21. Overview
     • Models from data
     • Examples
     • Challenges for computational intelligence
  22. Kernel spectral clustering
     • Underlying model: ê_* = w^T φ(x_*), with q̂_* = sign[ê_*] the estimated cluster indicator at any x_* ∈ R^d.
     • Primal problem: training on given data {x_i}_{i=1}^N
       min_{w,e}  −(1/2) w^T w + (γ/2) Σ_{i=1}^N v_i e_i^2   subject to  e_i = w^T φ(x_i), i = 1, ..., N
       with weights v_i (related to the inverse degree matrix: V = D^{−1}).
     • Dual problem: Ω α = λ D α, with Ω_{ij} = K(x_i, x_j) = φ(x_i)^T φ(x_j).
     • Kernel spectral clustering [Alzate & Suykens, IEEE-PAMI, 2010], related to spectral clustering [Fiedler, 1973; Shi & Malik, 2000; Ng et al., 2002; Chung, 1997]
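The dual eigenvalue problem Ω α = λ D α can be handed directly to a generalized eigensolver. A simplified sketch on two synthetic clusters (the bias/centering and coding steps of the full method are omitted; since Ω 1 = D 1, the top eigenvector is the constant vector and the second one carries the cluster split):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
# two well-separated 2-D clusters
X = np.vstack([rng.normal(-3, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])

# kernel (similarity) matrix and degree matrix D = diag(Omega 1)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Omega = np.exp(-d2 / 2.0)
D = np.diag(Omega.sum(axis=1))

# generalized eigenproblem Omega a = lambda D a (eigenvalues ascending)
evals, evecs = eigh(Omega, D)
# top eigenvector is constant (Omega 1 = D 1); use the second-largest one
alpha = evecs[:, -2]
labels = (alpha > 0).astype(int)
print(labels[:30].sum(), labels[30:].sum())  # one cluster per sign pattern
```

The sign pattern of α partitions the data, mirroring the dual cluster-indicator model sign[Σ_j α_j K(x_*, x_j) + b] of the next slide.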
  23. Primal and dual model representations
     bias term b_l for centering; k clusters, k − 1 sets of constraints (index l = 1, ..., k − 1)
     (P): sign[ê_*^(l)] = sign[w^(l)T φ(x_*) + b_l]
     (D): sign[ê_*^(l)] = sign[Σ_j α_j^(l) K(x_*, x_j) + b_l]
     Advantages:
     - out-of-sample extensions
     - model selection
     - solving large-scale problems
  24. Out-of-sample extension and coding
     [Figure: two scatter plots in the (x^(1), x^(2)) plane, showing the clustered training data and the out-of-sample cluster assignments]
  26. Example: image segmentation
     [Figure: validation points in the (e^(1)_{i,val}, e^(2)_{i,val}, e^(3)_{i,val}) projection space]
  27. Example: image segmentation
     [Figure: Fisher criterion versus the number of clusters k for model selection, and validation points in the (α^(1)_{i,val}, α^(2)_{i,val}) plane]
  28. Kernel spectral clustering: sparse kernel models
     [Figure: original image, binary clustering result, and sparse kernel model result]
     Incomplete Cholesky decomposition: ‖Ω − G G^T‖_2 ≤ η, with G ∈ R^{N×R} and R ≪ N
     Image (Berkeley image dataset): 321 × 481 (154,401 pixels), 175 SV
     ê_*^(l) = Σ_{i∈S_SV} α_i^(l) K(x_i, x_*) + b_l
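The low-rank factor G with ‖Ω − G G^T‖ ≤ η can be computed with a greedy pivoted (incomplete) Cholesky decomposition, where the selected pivot columns play the role of the support vectors. A generic sketch, not the exact implementation behind the slides (here η bounds the trace of the residual, which also bounds every residual entry):

```python
import numpy as np

def pivoted_chol(K, eta=1e-6, max_rank=None):
    # Greedy pivoted Cholesky of a PSD matrix K: returns G (N x R) with
    # K ~ G G^T, R << N, plus the chosen pivot indices ("support vectors").
    N = K.shape[0]
    max_rank = max_rank or N
    diag = K.diagonal().astype(float).copy()   # residual diagonal
    G = np.zeros((N, 0))
    pivots = []
    while diag.sum() > eta and len(pivots) < max_rank:
        i = int(np.argmax(diag))               # largest residual variance
        pivots.append(i)
        g = (K[:, i] - G @ G[i]) / np.sqrt(diag[i])
        G = np.hstack([G, g[:, None]])
        diag = np.clip(diag - g ** 2, 0.0, None)
    return G, pivots

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-d2)                                # RBF kernel matrix
G, pivots = pivoted_chol(K, eta=1e-4)
print(G.shape[1], np.abs(K - G @ G.T).max())   # rank << 200, tiny residual
```

Because the residual after each step is again positive semidefinite, its off-diagonal entries are bounded by its diagonal, so stopping on the trace gives an entrywise guarantee as well.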
  30. Highly sparse kernel models: image segmentation
     [Figure: image pixels in the (e_i^(1), e_i^(2), e_i^(3)) projection space; only 3k = 12 support vectors]
     [Alzate & Suykens, Neurocomputing, 2011]
  32. Kernel spectral clustering: adding prior knowledge
     • Pair of points x†, x‡: c = 1 must-link, c = −1 cannot-link
     • Primal problem [Alzate & Suykens, IJCNN 2009]:
       min_{w^(l), e^(l), b_l}  −(1/2) Σ_{l=1}^{k−1} w^(l)T w^(l) + (1/2) Σ_{l=1}^{k−1} γ_l e^(l)T D^{−1} e^(l)
       subject to  e^(l) = Φ_{N×n_h} w^(l) + b_l 1_N,  l = 1, ..., k − 1
                   w^(l)T φ(x†) = c w^(l)T φ(x‡),  l = 1, ..., k − 1
     • Dual problem: yields a rank-one downdate of the kernel matrix
  33. Kernel spectral clustering: example
     [Figure: original image and segmentation without constraints]
  34. Kernel spectral clustering: example
     [Figure: original image and segmentation with constraints]
  35. Hierarchical kernel spectral clustering
     - looking at different scales
     - use of model selection and validation data
     [Alzate & Suykens, Neural Networks, 2012]
  36. Power grid: kernel spectral clustering of time series
     [Figure: three normalized load profiles over 24 hours, one per detected cluster]
     Electricity load: 245 substations in the Belgian grid (1/2 train, 1/2 validation)
     x_i ∈ R^43,824: spectral clustering on high-dimensional data (5 years)
     3 of 7 detected clusters:
     - 1: Residential profile: morning and evening peaks
     - 2: Business profile: peaked around noon
     - 3: Industrial profile: increasing morning, oscillating afternoon and evening
     [Alzate, Espinoza, De Moor, Suykens, 2009]
  37. Dimensionality reduction and data visualization
     • Traditional techniques: principal component analysis (PCA), multi-dimensional scaling (MDS), self-organizing maps (SOM)
     • More recent techniques: Isomap, locally linear embedding (LLE), Hessian LLE, diffusion maps, Laplacian eigenmaps ("kernel eigenmap methods and manifold learning") [Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006]
     • Kernel maps with reference point [Suykens, IEEE-TNN 2008]: data visualization and dimensionality reduction by solving a linear system
  38. Kernel maps with reference point: formulation
     • Kernel maps with reference point [Suykens, IEEE-TNN 2008]:
       - LS-SVM core part: realize the dimensionality reduction x → z
       - Regularization term: (z − P_D z)^T (z − P_D z) = Σ_{i=1}^N ‖z_i − Σ_{j=1}^N s_{ij} D z_j‖_2^2, with D a diagonal matrix and s_{ij} = exp(−‖x_i − x_j‖_2^2 / σ^2)
       - reference point q (e.g. the first point; sacrificed in the visualization)
     • Example: d = 2
       min_{z, w_1, w_2, b_1, b_2, e}  (1/2)(z − P_D z)^T (z − P_D z) + (ν/2)(w_1^T w_1 + w_2^T w_2) + (η/2) Σ_{i=1}^N (e_{i,1}^2 + e_{i,2}^2)
       such that  c_{1,1}^T z = q_1 + e_{1,1}
                  c_{1,2}^T z = q_2 + e_{1,2}
                  c_{i,1}^T z = w_1^T φ_1(x_i) + b_1 + e_{i,1}, ∀i = 2, ..., N
                  c_{i,2}^T z = w_2^T φ_2(x_i) + b_2 + e_{i,2}, ∀i = 2, ..., N
     Coordinates in the low-dimensional space: z = [z_1; z_2; ...; z_N] ∈ R^{dN}
  41. Kernel maps: spiral example
     [Figure: 3D spiral data (x_1, x_2, x_3) and the 2D embeddings (ẑ_1, ẑ_2) obtained for the reference points q = [+1; −1] and q = [−1; −1]; training data (blue *), validation data (magenta o), test data (red +)]
     Model selection: min Σ_{i,j} ( ẑ_i^T ẑ_j / (‖ẑ_i‖_2 ‖ẑ_j‖_2) − x_i^T x_j / (‖x_i‖_2 ‖x_j‖_2) )^2
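The model selection criterion above compares cosine similarities between points in the embedding with those in the input space. A sketch of its evaluation for two candidate embeddings (`Z_good` and `Z_rand` are illustrative stand-ins, not the embeddings from the slides):

```python
import numpy as np

def cosine_preservation_cost(Z, X):
    # Sum over all pairs (i, j) of squared differences between the
    # cosine similarities in the embedding Z (N x d) and in the input X (N x D)
    def cos_sim(A):
        An = A / np.linalg.norm(A, axis=1, keepdims=True)
        return An @ An.T
    return float(((cos_sim(Z) - cos_sim(X)) ** 2).sum())

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 10))
Z_good = X[:, :2]                        # embedding correlated with the input
Z_rand = rng.standard_normal((100, 2))   # unrelated embedding
print(cosine_preservation_cost(Z_good, X) < cosine_preservation_cost(Z_rand, X))
```

The criterion is zero only when all pairwise cosine similarities are preserved exactly, so among candidate hyperparameter settings the one with the smallest cost is selected.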
  42. Kernel maps: visualizing gene distribution
     [Figure: 3D projection (z_1, z_2, z_3) of the Alon colon cancer microarray data set]
     Dimension of the input space: 62
     Number of genes: 1500 (training: 500, validation: 500, test: 500)
  43. Kernels & Tensors
     neuroscience: EEG data (time samples × frequency × electrodes)
     computer vision: image/video compression, completion, ... (pixel × illumination × expression × ...)
     web mining: analyzing user behavior (users × queries × webpages)
     vector x → matrix X → tensor X
     - Naive kernel: K(X, Y) = exp(−‖vec(X) − vec(Y)‖_2^2 / (2σ^2))
     - Tensorial kernel exploiting structure: learning from few examples
     [Signoretto et al., Neural Networks, 2011 & IEEE-TSP, 2012]
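The naive kernel above simply vectorizes both tensors and applies an RBF kernel, discarding the multi-way structure that the structured tensorial kernels are designed to exploit. A minimal sketch (the EEG-like array sizes are chosen for illustration only):

```python
import numpy as np

def naive_tensor_rbf(X, Y, sigma=1.0):
    # "Naive" kernel on tensors: K(X, Y) = exp(-||vec(X) - vec(Y)||^2 / (2 sigma^2));
    # ravel() plays the role of vec(), flattening away the multi-way structure
    diff = X.ravel() - Y.ravel()
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

# e.g. two small EEG-like tensors: time samples x frequency x electrodes
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 4, 3))
B = A + 0.01 * rng.standard_normal((8, 4, 3))
print(naive_tensor_rbf(A, A), naive_tensor_rbf(A, B))  # 1.0, slightly below 1.0
```

Because vectorization erases which axis is time, frequency, or electrode, two tensors that differ only by a structured perturbation along one mode look no different to this kernel than an unstructured perturbation of the same norm.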
  44. Tensor completion
     Mass spectral imaging: sagittal section of a mouse brain [data: E. Waelkens, R. Van de Plas]
     Tensor completion using nuclear norm regularization [Signoretto et al., IEEE-SPL, 2011]
  45. Overview
     • Models from data
     • Examples
     • Challenges for computational intelligence
  46. Challenges for Computational Intelligence
     - Bridging gaps between advanced methods and end-users
     - New mathematical and methodological frameworks
     - Scalable algorithms towards large and high-dimensional data
  47. Acknowledgements
     • Colleagues at ESAT-SCD (especially the research units: systems, models, control; biomedical data processing; bioinformatics): C. Alzate, A. Argyriou, J. De Brabanter, K. De Brabanter, B. De Moor, M. Espinoza, T. Falck, D. Geebelen, X. Huang, V. Jumutc, P. Karsmakers, R. Langone, J. Lopez, J. Luts, R. Mall, S. Mehrkanoon, Y. Moreau, K. Pelckmans, J. Puertas, L. Shi, M. Signoretto, V. Van Belle, R. Van de Plas, S. Van Huffel, J. Vandewalle, C. Varon, S. Yu, and others
     • Many people for joint work, discussions, invitations, and organizations
     • Support from ERC AdG A-DATADRIVE-B, KU Leuven, GOA-MaNet, COE Optimization in Engineering OPTEC, IUAP DYSCO, FWO projects, IWT, IBBT eHealth, COST
