
# Methods of Manifold Learning for Dimension Reduction of Large Data Sets

Doctoral candidacy talk to faculty of the Applied Mathematics, Applied Statistics and Scientific Computing program at the University of Maryland

Published in: Data & Analytics

### Methods of Manifold Learning for Dimension Reduction of Large Data Sets

1. 1. METHODS OF MANIFOLD LEARNING FOR DIMENSION REDUCTION OF LARGE DATA SETS
    • Doctoral Candidacy Preliminary Oral Exam
    • Ryan Bensussan Harvey
    • May 17, 2010
    • Committee: Wojciech Czaja (Chair), Kasso Okoudjou, John Benedetto, Rama Chellappa
2. 2. PREVIEW
    • Motivation
    • Problem
    • Methods
    • Research Ideas
    Image by Stefan Baudy, used under Creative Commons license
    Ryan B Harvey - Dimension Reduction - Prelim Oral Exam
3. 3. MOTIVATION
    • Science and business are producing massive quantities of data
    • These data are computationally difficult to store, process, analyze, and visualize
    • Academic work addresses this problem through compression and dimension-reduced processing
    • Compression methods are widely available, but require a decompression step before the data can be used
    • Dimension-reduced processing is generally not available
4. 4. THE PROBLEM
    • What are we trying to do?
    • What is the intuition for this problem?
    • How can we formalize the problem mathematically?
    • On what kinds of data can we solve this problem?
    Image by qisur, used under Creative Commons license
5. 5. PROBLEM INTUITION
    • Think of flattening a 3D surface to a 2D image
    • Simple projection
    • Preserving some particular quantity of interest locally
    • Preserving some global property of the surface
6. 6. THE PROBLEM (FORMALIZED)
    • Inputs: $X = [x_1, \dots, x_n]$, $x_k \in \mathbb{R}^D$
    • Outputs: $Y = [y_1, \dots, y_n]$, $y_k \in \mathbb{R}^d$, $d \ll D$
    • Assumption: data live on some manifold $M \subset \mathbb{R}^d$ embedded in $\mathbb{R}^D$, and inputs $X$ are samples taken in $\mathbb{R}^D$ of the underlying manifold $M$.
    • Problem statement: Find a reduced representation $Y$ of $X$ which best preserves the manifold structure of the data, as defined by some metric of interest.
7. 7. EXAMPLES OF DATA SETS
    • Text documents: 1000 Science News articles, from 8 different categories. About 10,000 coordinates are computed; the i-th coordinate of document d represents the frequency in document d of the i-th word in a fixed dictionary.
    • Handwritten digits: database of about 60,000 28 × 28 gray-scale pictures of handwritten digits, collected by USPS. Point cloud in $\mathbb{R}^{28^2}$. Goal: automatic recognition. [Figure: set of 10,000 pictures (28 by 28 pixels) of 10 handwritten digits; color represents the label (digit) of each point.]
    • Molecular dynamics: the dynamics of a small protein in a bath of water molecules is approximated by a Langevin system of stochastic equations, $\dot{x} = -\nabla U(x) + \dot{w}$. The set of states of the protein is a noisy set of points in $\mathbb{R}^{36}$.
    • Hyperspectral images
    • Image collections
    • Video [Figure: block diagram of a learning framework for 3D pose estimation from silhouettes, and shape synthesis for three different people; the surrounding paper text captured with the image is omitted.]
    Text, digit, and molecular dynamics examples from slides by Mauro Maggioni ("Geometry of data sets in high dimensions and learning"; "Analysis of High-dimensional Data Sets and Graphs").
8. 8. METHODS: TAXONOMY
    • Methods considered involve convex optimizations solved via eigenvalue problems
    • Full-rank: PCA, Kernel PCA
    • Sparse: LLE, Laplacian Eigenmaps
    [Diagram: Dimension Reduction splits into Convex and Non-convex. Convex splits into Full-Rank (Linear: PCA; Non-linear: k-PCA) and Sparse (Reconstruction Weights: LLE; Neighborhood Graph Laplacian: LE).]
9. 9. METHODS: FRAMEWORK
    • 3 step algorithm framework:
        • Build the kernel matrix
        • Solve the appropriate eigenvalue problem associated with that kernel
        • Use eigenvectors to compute the embedding in the lower dimension
    • Some methods:
        • Principal Components Analysis
        • Kernel-based Principal Components Analysis
        • Laplacian Eigenmaps
        • Locally Linear Embedding
10. 10. PRINCIPAL COMPONENTS ANALYSIS
    • Linear method: rotation, translation, simple scaling
    • Think SNR: maximize signal while minimizing noise
    • Rotate and translate axes so that signal variances lie on as few axes as possible
    • The kernel: $C = \frac{1}{n} \sum_{j=1}^{n} x_j x_j^T$
    • The eigenvalue problem: $\lambda p = C p$, or in matrix form $P^T \Lambda = C P^T$
    • The embedding: $\{\lambda_k\}_{k=2}^{d+1}$, $1 \ge \lambda_1 \ge \cdots \ge \lambda_D \ge 0$, $Y = P_{\{k\}} X$
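
The three PCA steps (kernel, eigenvalue problem, embedding) can be sketched in NumPy as follows. This is a minimal illustration, not the author's implementation: the function name `pca_embed` and the toy data are assumptions, the data are centered explicitly, and the sketch keeps the `d` largest eigenvalues (the usual PCA convention) rather than reproducing the index range printed on the slide.

```python
import numpy as np

def pca_embed(X, d):
    """X has shape (D, n): one column per sample x_j (hypothetical helper)."""
    Xc = X - X.mean(axis=1, keepdims=True)   # translate: center the data
    C = (Xc @ Xc.T) / X.shape[1]             # kernel: C = (1/n) sum_j x_j x_j^T
    evals, evecs = np.linalg.eigh(C)         # symmetric solver, ascending eigenvalues
    order = np.argsort(evals)[::-1][:d]      # assumption: keep the d largest
    P = evecs[:, order].T                    # rows of P are principal directions
    return P @ Xc, evals[order]              # embedding Y = P X and its eigenvalues

# Toy data (assumed for illustration): nearly one-dimensional points in R^3.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.vstack([t, 2.0 * t, 0.01 * rng.normal(size=200)])
Y, top = pca_embed(X, 1)
```

Each retained eigenvalue equals the variance of the data along the corresponding row of `Y`, which is the "signal variance on as few axes as possible" reading of the method.
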
11. 11. PRINCIPAL COMPONENTS ANALYSIS
    [Figure: a sample data set and its PCA embedding obtained from the eigenvalue problem (EVP); axis tick values omitted.]
12. 12. PCA USING DOT PRODUCTS
    • To move from (linear) PCA to (nonlinear) Kernel-based PCA (Schölkopf, Smola & Müller, 1998), we consider a formulation of PCA exclusively using dot products:
        $\lambda p = C p$
        $C p = \frac{1}{n} \sum_{j=1}^{n} (x_j \cdot p)\, x_j$
        $\lambda (x_k \cdot p) = (x_k \cdot C p)$
    • All solutions $p$ with $\lambda \neq 0$ lie in $\mathrm{span}(x_1, x_2, \dots, x_n)$.
13. 13. INTRODUCING NONLINEARITY IN PCA
    • We then introduce nonlinearity by mapping from the input space to the feature space $F$: $\Phi : \mathbb{R}^D \to F$, $x \mapsto \tilde{x} = \Phi(x)$
    • For now, assume the data $\Phi(x_k)$ in $F$ is centered: $\sum_{k=1}^{n} \Phi(x_k) = 0$
    • Then, the covariance matrix in $F$ is: $\tilde{C} = \frac{1}{n} \sum_{k=1}^{n} \Phi(x_k) \Phi(x_k)^T = \frac{1}{n} \sum_{k=1}^{n} \tilde{x}_k \tilde{x}_k^T$
14. 14. THE EIGENPROBLEM IN $F$
    • We now rewrite the eigenvalue problem of PCA in $F$:
        $\lambda \tilde{p} = \tilde{C} \tilde{p}$
        $\lambda (\Phi(x_k) \cdot \tilde{p}) = (\Phi(x_k) \cdot \tilde{C} \tilde{p}), \ \forall k = 1, \dots, n$
    • Again, all $\tilde{p}$ with $\lambda \neq 0$ lie in $\mathrm{span}(\tilde{x}_1, \tilde{x}_2, \dots, \tilde{x}_n)$.
    • In addition, we can write the linear expansion: $\tilde{p} = \sum_{j=1}^{n} \alpha_j \Phi(x_j)$
15. 15. THE EIGENPROBLEM IN $F$ (CONTINUED)
    • Using this expansion, we rewrite the dot product formulation of the PCA eigenproblem in $F$:
        $\lambda \sum_{j=1}^{n} \alpha_j (\tilde{x}_k \cdot \tilde{x}_j) = \frac{1}{n} \sum_{i=1}^{n} \alpha_i \sum_{j=1}^{n} (\tilde{x}_k \cdot \tilde{x}_j)(\tilde{x}_j \cdot \tilde{x}_i), \ \forall k = 1, \dots, n$
    • We define an $n \times n$ kernel matrix $K$ by $K_{ij} = (\Phi(x_i) \cdot \Phi(x_j)) = (\tilde{x}_i \cdot \tilde{x}_j)$
    • And rewrite the eigenproblem in matrix form: $n \lambda K \alpha = K^2 \alpha$
16. 16. COMPUTING IN FEATURE SPACE
    • The feature space $F$ is of arbitrarily large and possibly infinite dimension. Computing dot products directly is often not possible, and computationally impractical when it is.
    • Solution: the "kernel trick" (Aizerman et al, 1964). Construct a kernel function $k(u, v) = (\Phi(u) \cdot \Phi(v))$
    • Then replace each $(\Phi(u) \cdot \Phi(v))$ with $k(u, v)$. This construction implicitly defines $\Phi$ and $F$ via $k(u, v)$.
17. 17. SOME POSSIBLE KERNELS
    • Kernels must be continuous, symmetric, positive semi-definite.
    • Some possible kernels proposed by Schölkopf et al include:
        • Dot product in the space of all $d$th order monomials: $k(u, v) = (u \cdot v)^d$
        • Radial basis functions: $k(u, v) = \exp\left(-\frac{\|u - v\|^2}{2\sigma^2}\right)$
        • Sigmoid functions: $k(u, v) = \tanh(\kappa (u \cdot v) + \Theta)$
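
As a rough illustration of the three kernels listed above, the sketch below writes each one as a plain NumPy function and builds a kernel matrix from a handful of points. The function names, default parameter values, and the sample points are assumptions made for this example only.

```python
import numpy as np

def poly_kernel(u, v, d=2):
    """Dot product in the space of d-th order monomials: (u . v)^d."""
    return np.dot(u, v) ** d

def rbf_kernel(u, v, sigma=1.0):
    """Radial basis function: exp(-||u - v||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(u, v, kappa=1.0, theta=0.0):
    """Sigmoid: tanh(kappa (u . v) + Theta)."""
    return np.tanh(kappa * np.dot(u, v) + theta)

def gram(X, k):
    """Kernel matrix K_ij = k(x_i, x_j); rows of X are the points."""
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

# Sample points assumed for illustration.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.5], [-1.0, 0.0]])
K = gram(X, rbf_kernel)
```

For the radial basis function the resulting matrix is symmetric with ones on the diagonal, and its eigenvalues are non-negative, matching the positive semi-definiteness requirement stated on the slide.
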
18. 18. SOLVING THE EIGENPROBLEM
    • To solve the eigenproblem $n \lambda K \alpha = K^2 \alpha$, where $K_{ij} = k(x_i, x_j)$, we solve the following: $n \lambda \alpha = K \alpha$
    • Solutions are identical to all relevant solutions to the prior, as can be seen by expanding $\alpha$ in the eigenvector basis of $K$.
    • Let $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ be the complete set of eigenvalues (the solutions $n\lambda$) and $\alpha^1, \alpha^2, \dots, \alpha^n$ the corresponding eigenvectors, with $\lambda_q$ the first nonzero eigenvalue.
19. 19. COMPUTING THE EMBEDDING
    • We normalize $\alpha^q, \dots, \alpha^n$ (the eigenvectors for $\lambda_q, \dots, \lambda_n$) by requiring that the corresponding vectors in $F$ be normalized: $(\tilde{p}^k \cdot \tilde{p}^k) = 1, \ \forall k = q, \dots, n$
    • This translates to a normalization condition for $\alpha^k$: $1 = \sum_{i,j=1}^{n} \alpha_i^k \alpha_j^k (\tilde{x}_i \cdot \tilde{x}_j) = \sum_{i,j=1}^{n} \alpha_i^k \alpha_j^k K_{ij} = (\alpha^k \cdot K \alpha^k) = \lambda_k (\alpha^k \cdot \alpha^k)$
    • Compute projections of a test point $x$ onto eigenvectors $\tilde{p}^k$: $(\tilde{p}^k \cdot \Phi(x)) = \sum_{j=1}^{n} \alpha_j^k (\Phi(x_j) \cdot \Phi(x)) = \sum_{j=1}^{n} \alpha_j^k k(x_j, x)$
20. 20. CENTERING THE DATA
    • We assumed centered data in $F$, which is unrealistic. To center our data, we must have: $\Phi^c(x_j) = \tilde{x}_j^c = \tilde{x}_j - \frac{1}{n} \sum_{k=1}^{n} \tilde{x}_k$
    • Then we rewrite everything in terms of $\Phi^c(x_j)$, and thus have a new kernel $K^c$, which we will express in terms of $K$:
        $K^c_{ij} = (\tilde{x}_i^c \cdot \tilde{x}_j^c) = \left( \left( \tilde{x}_i - \frac{1}{n} \sum_{k=1}^{n} \tilde{x}_k \right) \cdot \left( \tilde{x}_j - \frac{1}{n} \sum_{\ell=1}^{n} \tilde{x}_\ell \right) \right) = (K - 1_n K - K 1_n + 1_n K 1_n)_{ij}$
      where $(1_n)_{ij} = \frac{1}{n}, \ \forall i, j$.
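
The centering formula can be checked numerically in the one case where the feature map is known explicitly. The sketch below assumes a linear kernel, so that $\Phi$ is the identity and the centered kernel can also be computed directly from centered data; `center_kernel` and the random data are illustrative assumptions.

```python
import numpy as np

def center_kernel(K):
    """K^c = K - 1_n K - K 1_n + 1_n K 1_n, with (1_n)_ij = 1/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Assumed test data: rows are samples, deliberately shifted away from the origin.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3)) + 5.0
K = X @ X.T                     # linear kernel, so Phi(x) = x
Kc = center_kernel(K)           # centering done purely on the kernel matrix
Xc = X - X.mean(axis=0)         # centering done explicitly in feature space
```

Here `Kc` agrees with `Xc @ Xc.T`, which is the point of the slide: the kernel matrix can be centered without ever forming the feature vectors.
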
21. 21. KERNEL-BASED PCA
    • Extend linear PCA to nonlinear space via kernel transformation
    • Think SNR where signal lies along a curve in space
    • Rotate/translate transformed axes so signal variances lie on as few axes as possible
    • The kernel: $K_{ij} = (\tilde{x}_i \cdot \tilde{x}_j) = k(x_i, x_j)$, with $\Phi : \mathbb{R}^D \to F$, $x \mapsto \tilde{x}$
    • The eigenvalue problem: $(n\lambda)\alpha = K\alpha$
    • The embedding: $(\tilde{p}^k \cdot \tilde{x}) = \sum_{j=1}^{n} \alpha_j^k k(x_j, x)$
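
Putting the kernel-based PCA steps together, a minimal NumPy sketch might look like the following. It is an illustration under stated assumptions rather than the author's code: `kernel_pca` is a hypothetical name, the kernel matrix is centered inside the function, the `d` largest eigenvalues of the centered kernel are kept, and only projections of the training points are returned.

```python
import numpy as np

def kernel_pca(K, d):
    """K is an (n, n) kernel matrix; returns (n, d) projections of the training points."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # center in feature space
    evals, alphas = np.linalg.eigh(Kc)                   # eigenvalue problem, ascending
    evals = evals[::-1][:d]                              # assumption: keep the d largest
    alphas = alphas[:, ::-1][:, :d]
    alphas = alphas / np.sqrt(evals)                     # lambda_k (alpha^k . alpha^k) = 1
    return Kc @ alphas, evals                            # sum_j alpha_j^k k(x_j, x_i)

# Assumed sanity check: with a linear kernel the result should match ordinary PCA.
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))      # rows are samples
Z, lam = kernel_pca(X @ X.T, 2)
```

With the linear kernel the columns of `Z` coincide, up to sign, with the ordinary PCA projections of the centered data, which gives a quick consistency check before trying a nonlinear kernel.
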
22. 22. KERNEL-BASED PCA
    [Diagram: the data are mapped by $\Phi$ into the feature space $F$, defined by the kernel $k(u, v)$, followed by PCA (EVP). Images from (3); the caption is truncated in the source: "Figure 3. Embeddings from kernel on the Swiss roll and ..."]
23. 23. LAPLACIAN EIGENMAPS
    • While it provides nonlinearity, kernel-based PCA requires computation dependent on the number of points $n$, rather than the often smaller dimension $D$ of each point.
    • We thus consider Laplacian Eigenmaps (Belkin & Niyogi, 2003), which introduces sparsity in the kernel.
    • This method has been shown to be a special case of Kernel-based PCA by Bengio et al (2004).
24. 24. INTRODUCING SPARSITY
    • To build a sparse kernel, we build a graph from the data which samples the assumed manifold $M$. The adjacency matrix $A$ is built by taking either a fixed number $m$ of nearest neighbors to, or by selecting all points within an $\varepsilon$-ball of, a given point $x_j$ as the point's nearest neighbors.
    • We denote the set of nearest neighbors of a point $x_j$ by $N_j$. The adjacency matrix is then given by: $A_{ij} = 1$ if $x_i \in N_j$, and $A_{ij} = 0$ if $x_i \notin N_j$
25. 25. BUILDING THE KERNEL
    • We then introduce edge weights in the graph. The heat kernel is chosen due to its connection to the Laplace Beltrami operator on the manifold and therefore to the graph approximation of the manifold Laplacian: $W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{t}\right)$ if $x_i \in N_j$ or $x_j \in N_i$, and $W_{ij} = 0$ otherwise
    • Note that as $t \to \infty$, $W \to A$.
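
The graph construction and heat-kernel weighting can be sketched as below, using the fixed-number-of-neighbors variant. The function name `knn_heat_weights`, the brute-force distance computation, and the four sample points are assumptions for illustration; the sketch also assumes no duplicate points, so that each point is its own closest match.

```python
import numpy as np

def knn_heat_weights(X, m, t):
    """X: (n, D) array, rows are points. Returns adjacency A and heat-kernel weights W."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    A = np.zeros((n, n))
    for j in range(n):
        nbrs = np.argsort(sq[j])[1:m + 1]   # skip position 0, the point itself
        A[nbrs, j] = 1.0                    # A_ij = 1 if x_i is in N_j
    sym = np.maximum(A, A.T)                # x_i in N_j or x_j in N_i
    W = sym * np.exp(-sq / t)               # heat kernel on connected pairs, 0 otherwise
    return A, W

# Assumed toy data: four points on a line with uneven spacing.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [5.0, 0.0]])
A, W = knn_heat_weights(X, m=1, t=2.0)
```

Note that `A` itself need not be symmetric (nearest-neighbor relations are one-directional), while `W` is symmetric by construction; for very large `t` the weights approach the symmetrized adjacency.
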
26. 26. CONSTRUCTING THE EIGENVALUE PROBLEM
    • To understand what eigenvalue problem to solve here, we must consider the optimization problem.
    • First, we think of mapping the graph $G(X, E, W)$ in a simplistic sense to a line (1D), with coordinates $y$, such that connected points stay as close together as possible.
    • This gives the objective function: $\frac{1}{2} \sum_i \sum_j (y_i - y_j)^2 W_{ij}$
27. 27. CONSTRUCTING THE EIGENVALUE PROBLEM
    • Here, we introduce the diagonal matrix $D$, where $D_{ii} = \sum_j W_{ji}$, and the graph Laplacian matrix $L = D - W$, and note that $W$ is symmetric, which allows us to rewrite the objective function in matrix-vector form:
        $\frac{1}{2} \sum_i \sum_j (y_i - y_j)^2 W_{ij} = \frac{1}{2} \sum_i \sum_j (y_i^2 + y_j^2 - 2 y_i y_j) W_{ij}$
        $= \frac{1}{2} \sum_i y_i^2 D_{ii} + \frac{1}{2} \sum_j y_j^2 D_{jj} - \sum_i \sum_j y_i y_j W_{ij}$
        $= y^T D y - y^T W y = y^T L y$
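
The identity derived above is easy to confirm numerically. The following sketch draws an arbitrary symmetric weight matrix and a random vector (both assumptions made for the check) and compares the double sum with the quadratic form $y^T L y$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
W = np.triu(rng.random((n, n)), 1)
W = W + W.T                               # symmetric weights, zero diagonal
D = np.diag(W.sum(axis=0))                # D_ii = sum_j W_ji
L = D - W                                 # graph Laplacian
y = rng.normal(size=n)

# Left-hand side: (1/2) sum_i sum_j (y_i - y_j)^2 W_ij
objective = 0.5 * sum((y[i] - y[j]) ** 2 * W[i, j]
                      for i in range(n) for j in range(n))
# Right-hand side: y^T L y
quadratic = y @ L @ y
```

Because the left-hand side is a sum of non-negative terms whenever the weights are non-negative, the same computation shows that $L$ is positive semi-definite.
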
28. 28. CONSTRUCTING THE EIGENVALUE PROBLEM
    • So the relevant optimization problem in the 1D case becomes: $\operatorname{argmin}_{y :\, y^T D y = 1} \ y^T L y$
    • This problem can be solved by solving the generalized eigenvalue problem $L y = \lambda D y$ for the minimum eigenvalues.
    • Note that the computation on the previous slide also shows that $L$ is positive semi-definite.
29. 29. CONSTRUCTING THE EIGENVALUE PROBLEM
    • Extending the same argument to $f \in \mathbb{R}^d$, with $F \in \mathbb{R}^{n \times d}$ and $f^{(i)} = [f_1^{(i)}, \dots, f_d^{(i)}]^T$, we need to minimize the objective function $\frac{1}{2} \sum_i \sum_j \|f^{(i)} - f^{(j)}\|^2 W_{ij} = \mathrm{tr}(F^T L F)$, giving the minimization $\operatorname{argmin}_{F :\, F^T D F = I} \ \mathrm{tr}(F^T L F)$
    • This problem can also be solved by solving the generalized eigenvalue problem $L f = \lambda D f$ for the minimum eigenvalues.
30. 30. THE EIGENVALUE PROBLEM AND THE EMBEDDING
    • Thus, we solve for the minimum nonzero eigenvalue solutions of the generalized eigenvalue problem $L f = \lambda D f$
    • We then order the eigenvalues $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ and construct the embedding from the first $d$ corresponding eigenvectors (leaving out the eigenvector for the zero eigenvalue), giving the embedding $x_i \to y_i = (f_1^{(i)}, \dots, f_d^{(i)})$
31. 31. LAPLACIAN EIGENMAPS
    • Move away from PCA's full-matrix computations toward a graph sampling of the manifold which allows for a sparse kernel matrix
    • Point-to-point metric locally applied to preserve distances on the manifold between points
    • The kernel: $W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{t}\right)$ if $x_i \in N_j$ or $x_j \in N_i$
    • The eigenvalue problem: $L f = \lambda D f$, with $D_{ii} = \sum_j W_{ji}$, $L = D - W$, and $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$
    • The embedding: $x_i \to y_i = (f_1^{(i)}, \dots, f_d^{(i)})$
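
Given a weight matrix, the generalized eigenvalue problem and the embedding can be sketched with NumPy alone. The sketch assumes a connected graph with no isolated vertices, and it solves $Lf = \lambda Df$ by the standard substitution $f = D^{-1/2} g$, which turns it into an ordinary symmetric eigenproblem; `laplacian_eigenmaps` and the path-graph example are illustrative choices, not part of the talk.

```python
import numpy as np

def laplacian_eigenmaps(W, d):
    """W: symmetric (n, n) weights of a connected graph. Returns (n, d) embedding."""
    deg = W.sum(axis=0)                       # D_ii = sum_j W_ji, assumed > 0
    L = np.diag(deg) - W                      # graph Laplacian L = D - W
    s = 1.0 / np.sqrt(deg)
    L_sym = s[:, None] * L * s[None, :]       # D^{-1/2} L D^{-1/2}
    evals, G = np.linalg.eigh(L_sym)          # ascending eigenvalues
    F = s[:, None] * G                        # f = D^{-1/2} g solves L f = lambda D f
    return F[:, 1:d + 1], evals[1:d + 1]      # leave out the lambda_0 = 0 solution

# Assumed example: a path graph on 5 vertices with unit weights.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
Y, lam = laplacian_eigenmaps(W, 1)
```

For the path graph the one-dimensional embedding places the vertices in their order along the path, which is the behavior the objective function is designed to produce. The columns returned also satisfy the constraint $F^T D F = I$.
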
32. 32. LAPLACIAN EIGENMAPS
    [Figure: pipeline from the data, to the graph $G(X, E, W)$ built from nearest neighbors (NN) and weights $W$, to the eigenvalue problem (EVP). Panels show embeddings for N = 5, 10, 15 nearest neighbors and t = 5.0, 25.0, ∞. Images from (1).]
33. 33. AN ALTERNATIVE VIEW OF LAPLACIAN EIGENMAPS
    • Although theoretically sound and using sparsity, Laplacian Eigenmaps is intuitively difficult to understand.
    • Belkin & Niyogi (2003) show that Locally Linear Embedding (Roweis & Saul, 2000), which has a more intuitive geometric construction, is approximately equivalent under certain conditions.
    • We will develop the LLE method, then briefly sketch the argument for approximate equivalence.
34. 34. LOCALLY LINEAR EMBEDDING
    • We construct the graph sampling the manifold in the same way as before, by finding nearest neighbors of each point.
    • Weights for the matrix $W$ are selected by assuming that local neighborhoods of points are nearly linear, and solving a minimization problem with the cost function: $\sum_i \left\| x_i - \sum_j W_{ij} x_{i_j} \right\|^2$, where $x_{i_j} \in N_i$.
35. 35. FINDING THE WEIGHTS
    • Weights solving this minimization can be found via a closed-form expression as follows:
        (1) Compute neighbor correlation matrices (and inverses): $C_{jk} = x_{i_j} \cdot x_{i_k}$, with $x_{i_j} \in N_i$, $x_{i_k} \in N_i$
        (2) Compute Lagrange multiplier (sum-to-one constraint): $\lambda = \frac{\alpha}{\beta} = \frac{1 - \sum_j \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k})}{\sum_j \sum_k C^{-1}_{jk}}$
        (3) Compute reconstruction weights: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$
    • Nearly singular $C_{jk}$ can be preconditioned prior to computing.
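
The three-step closed form for one point's weights can be sketched as follows. The function name `lle_weights`, the optional `reg` term (one assumed way of conditioning a nearly singular $C$, by adding a small multiple of the identity), and the sample points are illustrative assumptions.

```python
import numpy as np

def lle_weights(x, neighbors, reg=0.0):
    """x: (D,) point; neighbors: (m, D) rows are its neighbors. Returns (m,) weights."""
    C = neighbors @ neighbors.T                     # (1) C_jk = x_ij . x_ik
    C = C + reg * np.trace(C) * np.eye(len(C))      # assumed conditioning step
    b = neighbors @ x                               # b_k = x_i . x_ik
    Cinv_b = np.linalg.solve(C, b)                  # sum_k C^{-1}_jk (x_i . x_ik)
    Cinv_1 = np.linalg.solve(C, np.ones(len(C)))    # sum_k C^{-1}_jk
    lam = (1.0 - Cinv_b.sum()) / Cinv_1.sum()       # (2) lambda = alpha / beta
    return Cinv_b + lam * Cinv_1                    # (3) W_ij

# Assumed example: x is an exact affine combination of three neighbors in R^3.
neighbors = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 3.0]])
true_w = np.array([0.2, 0.3, 0.5])
x = true_w @ neighbors
w = lle_weights(x, neighbors)
```

Solving linear systems with `np.linalg.solve` avoids forming $C^{-1}$ explicitly; the returned weights sum to one by construction, and in this example they recover the coefficients used to build `x`.
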
36. 36. THE EMBEDDING
    • To find the embedding, we minimize the same form, this time over the embedding coordinates with fixed weights: $\operatorname{argmin}_y \sum_i \left\| y_i - \sum_j W_{ij} y_j \right\|^2$ subject to constraints:
        • Centering: $\sum_i y_i = 0$
        • Unit covariance (to avoid degenerate solutions): $\frac{1}{n} \sum_i y_i \otimes y_i = I$
37. 37. COMPUTING THE EMBEDDING
    • To compute solutions to the minimization, we introduce the sparse matrix $E$ defined by $E_{ij} = \delta_{ij} - W_{ij} - W_{ji} + \sum_k W_{ki} W_{kj}$, that is, $E = (I - W)^T (I - W)$
    • We then solve for eigenpairs of $E$ and take as the embedding the eigenvectors corresponding to the lowest $d$ eigenvalues, excluding the zero eigenvalue.
    • Note that $E$ is symmetric and positive semi-definite.
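
With the weights fixed, the embedding step reduces to a symmetric eigenproblem, sketched below. The name `lle_embed` is hypothetical, the rows of `W` are assumed to sum to one with a one-dimensional null space of $I - W$, and the ring example (each point reconstructed as the average of its two ring neighbors) is chosen only because its answer is easy to recognize.

```python
import numpy as np

def lle_embed(W, d):
    """W: (n, n) reconstruction weights, rows summing to one. Returns (n, d) embedding."""
    n = W.shape[0]
    M = np.eye(n) - W
    E = M.T @ M                              # E = (I - W)^T (I - W)
    evals, vecs = np.linalg.eigh(E)          # ascending eigenvalues
    Y = np.sqrt(n) * vecs[:, 1:d + 1]        # skip the zero eigenvalue; scale to unit covariance
    return Y, E

# Assumed example: 6 points on a ring, each the average of its two ring neighbors.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
Y, E = lle_embed(W, 2)
```

The two-dimensional embedding of the ring puts all six points at the same distance from the origin, and it satisfies both the centering and the unit covariance constraints.
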
38. 38. LOCALLY LINEAR EMBEDDING
    • Another method which considers metrics used to weight a graph sampling the manifold
    • Weights computed by global linear optimization over a local neighborhood around each point
    • The kernel: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$
    • The eigenvalue problem: $E f = \lambda f$, with $E = (I - W)^T (I - W)$ and $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$
    • The embedding: $x_i \to y_i = (f_1^{(i)}, \dots, f_d^{(i)})$
39. 39. CONNECTION TO THE GRAPH LAPLACIAN
    • We will show in three steps that for a function $f$ on $M$ (under appropriate assumptions) $E f \approx \frac{1}{2} L^2 f$:
        (1) Fix a point $x_i$ and show that $[(I - W) f]_i \approx -\frac{1}{2} \sum_j W_{ij} (x_i - x_{i_j})^T H (x_i - x_{i_j})$, where $H$ is the Hessian of $f$ at $x_i$.
        (2) Show that the expectation $\mathbb{E}[v^T H v] = r L f$.
        (3) Put steps (1) and (2) together to achieve the final result.
40. 40. CONNECTION TO THE GRAPH LAPLACIAN (1)
    • Show: $[(I - W) f]_i \approx -\frac{1}{2} \sum_j W_{ij} (x_i - x_{i_j})^T H (x_i - x_{i_j})$
    • Consider a coordinate system in the tangent plane centered at $o = x_i$ and let $v_j = x_{i_j} - x_i$. This is a vector originating at $o$.
    • Let $\alpha_j = W_{ij}$. Since $x_i$ is in the affine span of its neighbors (and by construction of $W$), we have $o = x_i = \sum_j \alpha_j v_j$, where $\sum_j \alpha_j = 1$.
41. 41. CONNECTION TO THE GRAPH LAPLACIAN (1)
    • Assuming $f$ is sufficiently smooth, we write the 2nd order Taylor approximation $f(v) = f(o) + v^T \nabla f + \frac{1}{2} (v^T H v) + o(\|v\|^2)$, where $\nabla f$ is the gradient and $H$ is the Hessian, both evaluated at $o$.
42. 42. CONNECTION TO THE GRAPH LAPLACIAN (1)
    • We have $[(I - W) f]_i = f(o) - \sum_j \alpha_j f(v_j)$, and using Taylor's approximation for $f(v_j)$, we can write
        $[(I - W) f]_i = f(o) - \sum_j \alpha_j f(v_j) \approx f(o) - \sum_j \alpha_j f(o) - \sum_j \alpha_j v_j^T \nabla f - \frac{1}{2} \sum_j \alpha_j (v_j^T H v_j)$
    • Since $\sum_j \alpha_j = 1$ and $\sum_j \alpha_j v_j = o$, the first three terms disappear, and
        $[(I - W) f]_i = f(o) - \sum_j \alpha_j f(v_j) \approx -\frac{1}{2} \sum_j \alpha_j v_j^T H v_j$
43. 43. CONNECTION TO THE GRAPH LAPLACIAN (2)
    • Show: $\mathbb{E}[v^T H v]$ is proportional to $L f$
    • If the $\sqrt{\alpha_j}\, v_j$ form an orthonormal basis (unusual), then $\sum_j W_{ij} v_j^T H v_j = \mathrm{tr}(H) = L f$
    • If not, we assume $v$ to be a random vector with uniform distribution on every sphere centered at $x_i$, and show proportionality.
    • Let $e_1, \dots, e_n$ be an orthonormal basis of eigenvectors of $H$ corresponding to eigenvalues $\lambda_1, \dots, \lambda_n$.
44. 44. CONNECTION TO THE GRAPH LAPLACIAN (2)
    • Then using the Spectral theorem, we can write $\mathbb{E}[v^T H v] = \mathbb{E}\left[ \sum_i \lambda_i \langle v, e_i \rangle^2 \right]$
    • Since $\mathbb{E} \langle v, e_i \rangle^2$ is independent of $i$, we can replace $\mathbb{E} \langle v, e_i \rangle^2 = r$ to get $\mathbb{E}[v^T H v] = r \left( \sum_i \lambda_i \right) = r\, \mathrm{tr}(H) = r L f$
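
The proportionality in step (2) can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden check, not part of the talk: it restricts $v$ to the unit sphere in $\mathbb{R}^n$, where $\sum_i \mathbb{E}\langle v, e_i\rangle^2 = \mathbb{E}\|v\|^2 = 1$ forces $r = 1/n$, and it uses a random symmetric matrix as a stand-in for the Hessian.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.normal(size=(n, n))
H = (A + A.T) / 2.0                                  # symmetric stand-in for the Hessian

# Normalized Gaussian vectors are uniformly distributed on the unit sphere.
g = rng.normal(size=(200_000, n))
v = g / np.linalg.norm(g, axis=1, keepdims=True)

estimate = np.mean(np.einsum("ij,jk,ik->i", v, H, v))  # sample mean of v^T H v
expected = np.trace(H) / n                              # r tr(H) with r = 1/n
```

The sample mean agrees with `expected` up to Monte Carlo error, regardless of the individual eigenvalues of `H`; only their sum, the trace, matters.
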
45. 45. CONNECTION TO THE GRAPH LAPLACIAN (3)
    • Now, putting these together, we have $[(I - W) f]_i \approx -\frac{1}{2} \sum_j W_{ij} v_j^T H v_j$ and $\mathbb{E}[v^T H v] = r L f$, so that $(I - W)^T (I - W) f \approx \frac{1}{2} L^2 f$
    • LLE minimizes $f^T (I - W)^T (I - W) f$, which reduces to finding eigenfunctions of $(I - W)^T (I - W)$, which can now be interpreted as finding eigenfunctions of the iterated Laplacian $L^2$. Eigenfunctions of $L^2$ coincide with those of $L$.
46. 46. LOCALLY LINEAR EMBEDDING
    • Another method which considers metrics used to weight a graph sampling the manifold
    • Weights computed by global linear optimization over a local neighborhood around each point
    • The kernel: $W_{ij} = \sum_k C^{-1}_{jk} (x_i \cdot x_{i_k} + \lambda)$
    • The eigenvalue problem: $E f = \lambda f$, with $E = (I - W)^T (I - W)$ and $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$
    • The embedding: $x_i \to y_i = (f_1^{(i)}, \dots, f_d^{(i)})$
47. 47. LOCALLY LINEAR EMBEDDING
    [Figure: pipeline from the data, to the graph $G(X, E, W)$ built from nearest neighbors (NN) and the weight minimization (MIN), to the eigenvalue problem (EVP). The image includes an excerpt of the source paper stating that reconstruction errors are measured by the cost function $\varepsilon(W) = \sum_i \left| \vec{X}_i - \sum_j W_{ij} \vec{X}_j \right|^2$, which adds up the squared distances between all the data points and their reconstructions, and an illustration of nonlinear dimensionality reduction for three-dimensional data sampled from a two-dimensional manifold. Images from (6).]
48. 48. COMPUTATIONAL COMPLEXITY OF METHODS

    | Method | Computational Cost | Memory Usage |
    |--------|--------------------|--------------|
    | PCA    | $O(D^3)$           | $O(D^2)$     |
    | k-PCA  | $O(n^3)$           | $O(n^2)$     |
    | LLE    | $O(\xi n^2)$       | $O(\xi n^2)$ |
    | LE     | $O(\xi n^2)$       | $O(\xi n^2)$ |

    $\xi$ is the sparsity ratio of the kernel matrix: the number of non-zero elements divided by the total number of elements in the kernel matrix.
49. 49. BENEFITS AND LIMITATIONS

    | Method | Benefits                       | Limitations                   |
    |--------|--------------------------------|-------------------------------|
    | PCA    | Fast, simple                   | Linear                        |
    | k-PCA  | Kernel choice                  | Computation, kernel selection |
    | LE     | Sparse kernel, justification   | Nearest neighbor search       |
    | LLE    | Sparse kernel, direct solution | Nearest neighbor search       |
50. 50. RESEARCH IDEAS
    • Guiding principles
        • Software should be built for modular use within a framework and library
        • Software should be validated with real known data associated with "ground truth"
    • Research directions
        • Landmarks, out-of-sample extensions, low-rank update iterative methods
        • Hybridization of methods and ideas
        • Extension to higher order graph properties
51. 51. SOFTWARE LIBRARY
    • A comprehensive testbed software library and experimentation framework is needed to support manifold learning research
    • Must be modular, extensible, platform agnostic
    • Interpreted/scriptable languages are a good choice for experimentation: Python, MATLAB, Boo, IDL
    • Previous efforts:
        • DRToolbox (MATLAB, 2007-) by van der Maaten
        • scikit.learn (Python, 2009-) by Matthieu Brucher
52. 52. REFERENCES
    1) Belkin, M. and P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Comp. 15 (2003), 1373-1396.
    2) Schölkopf, B., A. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Comp. 10 (1998), 1299-1319.
    3) Weinberger, K. Q., B. D. Packer, and L. K. Saul, Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization, Proc AI and Statistics (Dec 2005), 381-388.
    4) Golub, G. H. and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press (Baltimore, 1996), 70-75.
    5) Roweis, S. T. and L. K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science 290 (2000), 2323-2326.
    6) Shlens, J., A Tutorial on Principal Component Analysis, Version 2 (Dec 2005).
    7) van der Maaten, L., E. Postma, and J. van den Herik, Dimensionality Reduction: A Comparative Review, TiCC TR 2009-005 (Oct 2009).
53. 53. QUESTIONS? IDEAS? THANK YOU!