
Methods of Manifold Learning for Dimension Reduction of Large Data Sets

Doctoral candidacy talk to the faculty of the Applied Mathematics, Applied Statistics and Scientific Computing program at the University of Maryland


  1. METHODS OF MANIFOLD LEARNING FOR DIMENSION REDUCTION OF LARGE DATA SETS. Doctoral Candidacy Preliminary Oral Exam, Ryan Bensussan Harvey, May 17, 2010. Committee: Wojciech Czaja (Chair), Kasso Okoudjou, John Benedetto, Rama Chellappa
  2. PREVIEW • Motivation • Problem • Methods • Research Ideas (Image by Stefan Baudy, used under Creative Commons license)
  3. MOTIVATION • Science and business producing massive quantities of data • Computationally difficult to store, process, analyze, visualize • Academic focus on compression, dimension-reduced processing to address this problem • Compression methods widely available, but require decompression step to use data • Dimension-reduced processing generally not available
  4. THE PROBLEM • What are we trying to do? • What is the intuition for this problem? • How can we formalize the problem mathematically? • On what kinds of data can we solve this problem? (Image by qisur, used under Creative Commons license)
  5. PROBLEM INTUITION • Think flattening a 3D surface to a 2D image • Simple projection • Preserving some particular quantity of interest locally • Preserving some global property of the surface
  6. THE PROBLEM (FORMALIZED) • Inputs: X = [x_1, ..., x_n], x_k ∈ R^D • Outputs: Y = [y_1, ..., y_n], y_k ∈ R^d, d ≪ D • Assumption: data live on some manifold M embedded in R^D, and inputs X are samples taken in R^D of the underlying manifold M. • Problem statement: Find a reduced representation Y of X which best preserves the manifold structure of the data, as defined by some metric of interest.
  7. EXAMPLES OF DATA SETS [Slide of example figures with captions excerpted from Maggioni and others: text documents (1,000 Science News articles from 8 categories, with about 10,000 coordinates per document, the i-th coordinate giving the frequency of the i-th dictionary word); handwritten digits (a USPS database of about 60,000 gray-scale 28 × 28 images, a point cloud in R^(28²), with automatic recognition as the goal); hyperspectral imagery; molecular dynamics (a small protein in a bath of water molecules modeled by a Langevin system of stochastic equations, with the set of protein states a noisy set of points in R^36); image collections; video (low-dimensional manifold embeddings of silhouettes used to learn mappings between visual input and 3D pose for pose estimation and shape synthesis).]
  8. METHODS: TAXONOMY • Methods considered involve convex optimizations solved via eigenvalue problems • Full-rank: PCA, Kernel PCA • Sparse: LLE, Laplacian Eigenmaps • Taxonomy diagram: Dimension Reduction splits into Convex and Non-convex; Convex splits into Full-Rank (Linear: PCA; Non-linear: k-PCA) and Sparse (Reconstruction Weights: LLE; Neighborhood Graph Laplacian: LE)
  9. METHODS: FRAMEWORK • 3-step algorithm framework: • Build the kernel matrix • Solve the appropriate eigenvalue problem associated with that kernel • Use the eigenvectors to compute the embedding in the lower dimension • Some methods: Principal Components Analysis, Kernel-based Principal Components Analysis, Laplacian Eigenmaps, Locally Linear Embedding
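To make the shared three-step structure concrete, here is a minimal Python/NumPy sketch. The function name `spectral_embedding` and the `build_kernel` callback are illustrative placeholders, not part of the talk; each method that follows supplies its own kernel and selects either the largest eigenvalues (PCA, kernel PCA) or the smallest nonzero ones (Laplacian Eigenmaps, LLE).

```python
import numpy as np

def spectral_embedding(X, build_kernel, d=2, smallest=False):
    """Generic 3-step framework: (1) build a kernel matrix from the data,
    (2) solve the associated symmetric eigenvalue problem, (3) use selected
    eigenvectors as the d-dimensional embedding coordinates."""
    K = build_kernel(X)                    # step 1: n x n kernel matrix
    evals, evecs = np.linalg.eigh(K)       # step 2: eigenvalues in ascending order
    if smallest:
        idx = np.argsort(evals)[1:d + 1]   # smallest nonzero eigenvalues (graph methods)
    else:
        idx = np.argsort(evals)[::-1][:d]  # largest eigenvalues (PCA-type methods)
    return evecs[:, idx]                   # step 3: n x d embedding
```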
  10. PRINCIPAL COMPONENTS ANALYSIS • Linear method: rotation, translation, simple scaling • Think SNR: maximize signal while minimizing noise • Rotate and translate axes so that signal variances lie on as few axes as possible • The kernel: C = (1/n) Σ_{j=1}^n x_j x_j^T • The eigenvalue problem: λp = Cp, or P^T Λ = C P^T • The embedding: select {λ_k}_{k=2}^{d+1}, with 1 ≥ λ_1 ≥ ··· ≥ λ_D ≥ 0, and Y = P_{(k)} X
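As a concrete reference, a minimal NumPy sketch of the PCA steps above, assuming the data matrix stores one point per column as on the slide (the function name `pca_embedding` and the choice of returning the top d components are illustrative assumptions):

```python
import numpy as np

def pca_embedding(X, d=2):
    """X has shape (D, n): n points in R^D, one per column.
    Returns Y with shape (d, n), the projection onto the top d principal axes."""
    Xc = X - X.mean(axis=1, keepdims=True)  # translate: subtract the mean
    n = X.shape[1]
    C = (Xc @ Xc.T) / n                     # kernel: C = (1/n) sum_j x_j x_j^T
    evals, P = np.linalg.eigh(C)            # eigenvalue problem: C p = lambda p
    order = np.argsort(evals)[::-1][:d]     # d directions of largest variance
    return P[:, order].T @ Xc               # embedding: Y = P_(d) X
```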
  11. PRINCIPAL COMPONENTS ANALYSIS [Figure: a 3D point cloud reduced via the PCA eigenvalue problem (EVP) to a 2D embedding.]
  12. PCA USING DOT PRODUCTS • To move from (linear) PCA to (nonlinear) Kernel-based PCA (Schölkopf, Smola & Müller, 1998), we consider a formulation of PCA exclusively using dot products: λp = Cp, with Cp = (1/n) Σ_{j=1}^n (x_j · p) x_j, so λ(x_k · p) = (x_k · Cp) for all k • All solutions p with λ ≠ 0 lie in span(x_1, x_2, ..., x_n).
  13. INTRODUCING NONLINEARITY IN PCA • We then introduce nonlinearity by mapping from the input space to the feature space F: Φ : R^D → F, x → x̃ = Φ(x) • For now, assume the data Φ(x_k) in F is centered: Σ_{k=1}^n Φ(x_k) = 0 • Then, the covariance matrix in F is: C̃ = (1/n) Σ_{j=1}^n Φ(x_j)Φ(x_j)^T = (1/n) Σ_{j=1}^n x̃_j x̃_j^T
  14. THE EIGENPROBLEM IN F • We now rewrite the eigenvalue problem of PCA in F: λp̃ = C̃p̃, so λ(Φ(x_k) · p̃) = (Φ(x_k) · C̃p̃), ∀k = 1, ..., n • Again, all p̃ with λ ≠ 0 lie in span(x̃_1, x̃_2, ..., x̃_n). • In addition, we can write the linear expansion: p̃ = Σ_{j=1}^n α_j Φ(x_j)
  15. THE EIGENPROBLEM IN F (CONTINUED) • Using this expansion, we rewrite the dot product formulation of the PCA eigenproblem in F: λ Σ_{j=1}^n α_j (x̃_k · x̃_j) = (1/n) Σ_{i=1}^n α_i (x̃_k · Σ_{j=1}^n x̃_j (x̃_j · x̃_i)), ∀k = 1, ..., n • We define an n × n kernel matrix K by K_ij = (Φ(x_i) · Φ(x_j)) = (x̃_i · x̃_j) • And rewrite the eigenproblem in matrix form: nλKα = K²α
  16. COMPUTING IN FEATURE SPACE • The feature space F is of arbitrarily large and possibly infinite dimension. Computing dot products directly is often not possible, and computationally impractical when it is. • Solution: the "kernel trick" (Aizerman et al., 1964). Construct a kernel function k(u, v) = (Φ(u) · Φ(v)) • Then replace each (Φ(u) · Φ(v)) with k(u, v). This construction implicitly defines Φ and F via k(u, v).
  17. SOME POSSIBLE KERNELS • Kernels must be continuous, symmetric, positive semi-definite. • Some possible kernels proposed by Schölkopf et al. include: • Dot product in the space of all dth-order monomials: k(u, v) = (u · v)^d • Radial basis functions: k(u, v) = exp(−||u − v||² / 2σ²) • Sigmoid functions: k(u, v) = tanh(κ(u · v) + Θ)
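For reference, the three kernels listed above written as simple Python functions; the parameter names d, sigma, kappa, and theta mirror the symbols on the slide, and the default values are arbitrary illustrative choices.

```python
import numpy as np

def poly_kernel(u, v, d=3):
    """Dot product in the space of all d-th order monomials."""
    return np.dot(u, v) ** d

def rbf_kernel(u, v, sigma=1.0):
    """Radial basis function (Gaussian) kernel."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(u, v, kappa=1.0, theta=0.0):
    """Sigmoid kernel, as listed among the proposals of Schoelkopf et al."""
    return np.tanh(kappa * np.dot(u, v) + theta)
```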
  18. SOLVING THE EIGENPROBLEM • To solve the eigenproblem nλKα = K²α, where K_ij = k(x_i, x_j), we solve the following: nλα = Kα • Solutions are identical to all relevant solutions of the prior problem, as can be seen by expanding α in the eigenvector basis of K. • Let 0 ≤ λ_1 ≤ λ_2 ≤ ··· ≤ λ_n be the complete set of eigenvalues (the values nλ) and α_1, α_2, ..., α_n the corresponding eigenvectors, with λ_q the first nonzero eigenvalue.
  19. COMPUTING THE EMBEDDING • We normalize α_q, ..., α_n by requiring that the corresponding vectors in F be normalized: (p̃_k · p̃_k) = 1, ∀k = q, ..., n • This translates to a normalization condition for α_q, ..., α_n: 1 = Σ_{i,j=1}^n α^k_i α^k_j (x̃_i · x̃_j) = Σ_{i,j=1}^n α^k_i α^k_j K_ij = (α^k · Kα^k) = λ_k (α^k · α^k) • Compute projections of a test point x onto eigenvectors p̃_k: (p̃_k · Φ(x)) = Σ_{j=1}^n α^k_j (Φ(x_j) · Φ(x)) = Σ_{j=1}^n α^k_j k(x_j, x)
  20. CENTERING THE DATA • We assumed centered data in F, which is unrealistic. To center our data, we must have: Φ_c(x_j) = x̃^c_j = x̃_j − (1/n) Σ_{k=1}^n x̃_k • Then we rewrite everything in terms of Φ_c(x_j), and thus have a new kernel K^c, which we will express in terms of K: K^c_ij = (x̃^c_i · x̃^c_j) = (x̃_i − (1/n) Σ_{k=1}^n x̃_k) · (x̃_j − (1/n) Σ_{ℓ=1}^n x̃_ℓ) = (K − 1_n K − K 1_n + 1_n K 1_n)_ij, where (1_n)_ij = 1/n, ∀i, j.
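The centering identity translates directly into a few lines of NumPy; the helper name `center_kernel` is an assumption for illustration.

```python
import numpy as np

def center_kernel(K):
    """Return K^c = K - 1_n K - K 1_n + 1_n K 1_n, where (1_n)_ij = 1/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n
```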
  21. KERNEL-BASED PCA • Extend linear PCA to nonlinear space via the kernel transformation Φ : R^D → F, x → x̃ • Think SNR where signal lies along a curve in space • Rotate/translate transformed axes so signal variances lie on as few axes as possible • The kernel: K_ij = (x̃_i · x̃_j) = k(x_i, x_j) • The eigenvalue problem: (nλ)α = Kα • The embedding: (p̃_k · x̃) = Σ_{j=1}^n α^k_j k(x_j, x)
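Putting the last several slides together, a sketch of kernel PCA under the stated assumptions: the function name `kernel_pca` is illustrative, the kernel matrix is centered inline as derived above, and the top d eigenvalues are assumed strictly positive so the normalization is well defined.

```python
import numpy as np

def kernel_pca(X, kernel, d=2):
    """X: (n, D) data matrix, kernel: a function k(u, v). Returns the (n, d) matrix
    of projections of the training points onto the top d feature-space eigenvectors."""
    n = X.shape[0]
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])  # K_ij = k(x_i, x_j)
    one_n = np.full((n, n), 1.0 / n)                          # center in feature space,
    K = K - one_n @ K - K @ one_n + one_n @ K @ one_n         # as on the previous slide
    evals, alphas = np.linalg.eigh(K)       # eigenvalues of K equal n*lambda
    order = np.argsort(evals)[::-1][:d]     # keep the top d (assumed > 0)
    evals, alphas = evals[order], alphas[:, order]
    alphas = alphas / np.sqrt(evals / n)    # enforce lambda_k (alpha^k . alpha^k) = 1
    return K @ alphas                       # (p_k . Phi(x_i)) = sum_j alpha^k_j k(x_j, x_i)
```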
  22. KERNEL-BASED PCA [Figure: input data mapped by Φ into the feature space F defined by the kernel k(u, v), then reduced via the PCA eigenvalue problem (EVP); embeddings from a kernel on the Swiss roll, with labeled points A, B, C, D tracked through the mapping. Images from (3).]
  23. LAPLACIAN EIGENMAPS • While it provides nonlinearity, kernel-based PCA requires computation dependent on the number of points n, rather than the often smaller dimension D of each point. • We thus consider Laplacian Eigenmaps (Belkin & Niyogi, 2003), which introduces sparsity in the kernel. • This method has been shown to be a special case of Kernel-based PCA by Bengio et al. (2004).
  24. INTRODUCING SPARSITY • To build a sparse kernel, we build a graph from the data which samples the assumed manifold M. The adjacency matrix A is built by taking either a fixed number m of nearest neighbors to, or by selecting all points within an ε-ball of, a given point x_j as the point's nearest neighbors. • We denote the set of nearest neighbors of a point x_j by N_j. The adjacency matrix is then given by: A_ij = 1 if x_i ∈ N_j, and 0 if x_i ∉ N_j.
  25. BUILDING THE KERNEL • We then introduce edge weights in the graph. The heat kernel is chosen due to its connection to the Laplace-Beltrami operator on the manifold and therefore to the graph approximation of the manifold Laplacian: W_ij = exp(−||x_i − x_j||² / t) if x_i ∈ N_j or x_j ∈ N_i, and 0 otherwise. • Note that as t → ∞, W → A.
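A sketch of the adjacency and heat-kernel construction from the last two slides, using a fixed number m of nearest neighbors; the function name `heat_kernel_weights`, the brute-force distance computation, and the defaults m=10, t=5.0 are illustrative assumptions.

```python
import numpy as np

def heat_kernel_weights(X, m=10, t=5.0):
    """X: (n, D) data. Build a symmetric m-nearest-neighbor graph and weight its
    edges with the heat kernel W_ij = exp(-||x_i - x_j||^2 / t)."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    A = np.zeros((n, n), dtype=bool)
    for j in range(n):
        nbrs = np.argsort(sq[:, j])[1:m + 1]   # m nearest neighbors of x_j, excluding itself
        A[nbrs, j] = True
    A = A | A.T                                # edge if x_i in N_j or x_j in N_i
    return np.where(A, np.exp(-sq / t), 0.0)
```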
  26. CONSTRUCTING THE EIGENVALUE PROBLEM • To understand what eigenvalue problem to solve here, we must consider the optimization problem. • First, we think of mapping the graph G(X, E, W) in a simplistic sense to a line (a 1D embedding y) such that connected points stay as close together as possible. • This gives the objective function: (1/2) Σ_i Σ_j (y_i − y_j)² W_ij
  27. CONSTRUCTING THE EIGENVALUE PROBLEM • Here, we introduce the diagonal matrix D, where D_ii = Σ_j W_ji, and the graph Laplacian matrix L = D − W, and note that W is symmetric, which allows us to rewrite the objective function in matrix-vector form: (1/2) Σ_i Σ_j (y_i − y_j)² W_ij = (1/2) Σ_i Σ_j (y_i² + y_j² − 2 y_i y_j) W_ij = (1/2) (Σ_i y_i² D_ii + Σ_j y_j² D_jj) − Σ_i Σ_j y_i y_j W_ij = y^T D y − y^T W y = y^T L y
  28. CONSTRUCTING THE EIGENVALUE PROBLEM • So the relevant optimization problem in the 1D case becomes: argmin_{y : y^T D y = 1} y^T L y • This problem can be solved by solving the generalized eigenvalue problem Ly = λDy for the minimum eigenvalues. • Note that the computation on the previous slide also shows that L is positive semi-definite.
  29. CONSTRUCTING THE EIGENVALUE PROBLEM • Extending the same argument to R^d, with f^(i) = [f^(i)_1, ..., f^(i)_d]^T ∈ R^d and F ∈ R^{n×d}, we need to minimize the objective function Σ_i Σ_j ||f^(i) − f^(j)||² W_ij = tr(F^T L F), giving the minimization argmin_{F : F^T D F = I} tr(F^T L F) • This problem can also be solved by solving the generalized eigenvalue problem Lf = λDf for the minimum eigenvalues.
  30. THE EIGENVALUE PROBLEM AND THE EMBEDDING • Thus, we solve for the minimum nonzero eigenvalue solutions of the generalized eigenvalue problem Lf = λDf • We then order the eigenvalues 0 = λ_0 ≤ λ_1 ≤ ··· ≤ λ_{n−1} and construct the embedding from the first d corresponding eigenvectors (leaving out the zero eigenvector), giving the embedding x_i → y_i = (f^(i)_1, ..., f^(i)_d)
  31. LAPLACIAN EIGENMAPS • Move away from PCA's full-matrix computations toward a graph sampling of the manifold which allows for a sparse kernel matrix • Point-to-point metric locally applied to preserve distances on the manifold between points • The kernel: W_ij = exp(−||x_i − x_j||² / t) if x_i ∈ N_j or x_j ∈ N_i, and 0 otherwise, with D_ii = Σ_j W_ji and L = D − W • The eigenvalue problem: Lf = λDf, with 0 = λ_0 ≤ λ_1 ≤ ··· ≤ λ_{n−1} • The embedding: x_i → y_i = (f^(i)_1, ..., f^(i)_d)
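A compact sketch of the full Laplacian Eigenmaps pipeline under the assumptions above, reusing the weight matrix from the earlier sketch. The function name `laplacian_eigenmaps` is illustrative, and the graph is assumed connected so that D is positive definite; note that SciPy's generalized symmetric solver returns eigenvectors satisfying F^T D F = I, matching the constraint on the slides.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, d=2):
    """W: symmetric (n, n) heat-kernel weight matrix. Solves L f = lambda D f and
    returns the (n, d) embedding from the eigenvectors for the smallest nonzero
    eigenvalues."""
    D = np.diag(W.sum(axis=0))   # D_ii = sum_j W_ji
    L = D - W                    # graph Laplacian
    evals, evecs = eigh(L, D)    # generalized eigenvalue problem, ascending eigenvalues
    return evecs[:, 1:d + 1]     # drop the constant eigenvector at lambda_0 = 0
```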
  32. LAPLACIAN EIGENMAPS [Figure: a 2D data set, its nearest-neighbor graph G(X, E, W) with heat-kernel weights (NN, W), and the eigenvalue problem (EVP) embeddings for N = 5, 10, 15 neighbors and t = 5.0, 25.0, ∞. Images from (1).]
  33. AN ALTERNATIVE VIEW OF LAPLACIAN EIGENMAPS • Although theoretically sound and using sparsity, Laplacian Eigenmaps is intuitively difficult to understand. • Belkin & Niyogi (2003) show that Locally Linear Embedding (Roweis & Saul, 2000), which has a more intuitive geometric construction, is approximately equivalent under certain conditions. • We will develop the LLE method, then briefly sketch the argument for approximate equivalence.
  34. LOCALLY LINEAR EMBEDDING • We construct the graph sampling the manifold in the same way as before, by finding nearest neighbors of each point. • Weights for the matrix W are selected by assuming that local neighborhoods of points are nearly linear, and solving a minimization problem with the cost function: Σ_i ||x_i − Σ_j W_ij x_ij||², where x_ij ∈ N_i.
  35. FINDING THE WEIGHTS • Weights solving this minimization can be found via a closed-form expression as follows: (1) Compute neighbor correlation matrices (and inverses): C_jk = x_ij · x_ik, with x_ij, x_ik ∈ N_i (2) Compute the Lagrange multiplier (sum-to-one constraint): λ = α/β, with α = 1 − Σ_j Σ_k C⁻¹_jk (x_i · x_ik) and β = Σ_j Σ_k C⁻¹_jk (3) Compute reconstruction weights: W_ij = Σ_k C⁻¹_jk (x_i · x_ik + λ) • Nearly singular C can be preconditioned prior to computing.
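In practice the constrained least-squares problem for the weights is usually solved neighborhood by neighborhood, by solving a small linear system and normalizing, which addresses the same minimization as the closed form above; the sketch below takes that route. The function name `lle_weights`, the regularization parameter `reg` used to precondition a nearly singular local Gram matrix, and the default m=10 are illustrative assumptions.

```python
import numpy as np

def lle_weights(X, m=10, reg=1e-3):
    """X: (n, D) data. For each x_i, find weights over its m nearest neighbors that
    minimize ||x_i - sum_j W_ij x_ij||^2 subject to sum_j W_ij = 1."""
    n = X.shape[0]
    W = np.zeros((n, n))
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:m + 1]     # indices of the m nearest neighbors of x_i
        Z = X[nbrs] - X[i]                    # neighbors shifted so x_i sits at the origin
        C = Z @ Z.T                           # local neighbor correlation (Gram) matrix
        C += reg * np.trace(C) * np.eye(m)    # precondition a nearly singular C
        w = np.linalg.solve(C, np.ones(m))    # solve C w = 1
        W[i, nbrs] = w / w.sum()              # enforce the sum-to-one constraint
    return W
```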
  36. THE EMBEDDING • To find the embedding, we minimize the same form, this time over the embedding coordinates y with fixed weights: argmin_y Σ_i ||y_i − Σ_j W_ij y_j||², subject to constraints: • Centering: Σ_i y_i = 0 • Unit covariance (to avoid degenerate solutions): (1/n) Σ_i y_i ⊗ y_i = I
  37. COMPUTING THE EMBEDDING • To compute solutions to the minimization, we introduce the sparse matrix E defined by E_ij = δ_ij − W_ij − W_ji + Σ_k W_ki W_kj, i.e., E = (I − W)^T (I − W) • We then solve for eigenpairs of E and take as the embedding the d eigenvectors corresponding to the lowest eigenvalues, excluding the zero eigenvalue. • Note that E is symmetric and positive semi-definite.
  38. LOCALLY LINEAR EMBEDDING • Another method which considers metrics used to weight a graph sampling the manifold • Weights computed by global linear optimization over a local neighborhood around each point • The kernel: W_ij = Σ_k C⁻¹_jk (x_i · x_ik + λ), with E = (I − W)^T (I − W) • The eigenvalue problem: Ef = λf, with 0 = λ_0 ≤ λ_1 ≤ ··· ≤ λ_{n−1} • The embedding: x_i → y_i = (f^(i)_1, ..., f^(i)_d)
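A sketch of the embedding step under the assumptions above, using a dense eigendecomposition for clarity even though E is sparse in practice; the function name `lle_embedding` is illustrative.

```python
import numpy as np

def lle_embedding(W, d=2):
    """W: (n, n) reconstruction-weight matrix with rows summing to one. Builds
    E = (I - W)^T (I - W) and returns the d eigenvectors with the smallest
    nonzero eigenvalues as the embedding coordinates."""
    n = W.shape[0]
    I = np.eye(n)
    E = (I - W).T @ (I - W)           # symmetric, positive semi-definite
    evals, evecs = np.linalg.eigh(E)  # eigenvalues in ascending order
    return evecs[:, 1:d + 1]          # discard the zero-eigenvalue (constant) vector
```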
  39. CONNECTION TO THE GRAPH LAPLACIAN • We will show in three steps that, for a function f on M (under appropriate assumptions), Ef ≈ (1/2) L²f: (1) Fix a point x_i and show that: [(I − W)f]_i ≈ −(1/2) Σ_j W_ij (x_i − x_ij)^T H (x_i − x_ij), where H is the Hessian of f at x_i. (2) Show that the expectation E[v^T H v] = r Lf. (3) Put steps (1) and (2) together to achieve the final result.
  40. CONNECTION TO THE GRAPH LAPLACIAN (1) • Show: [(I − W)f]_i ≈ −(1/2) Σ_j W_ij (x_i − x_ij)^T H (x_i − x_ij) • Consider a coordinate system in the tangent plane centered at o = x_i and let v_j = x_ij − x_i. This is a vector originating at x_i. • Let α_j = W_ij. Since x_i is in the affine span of its neighbors (and Σ_j α_j = 1 by construction of W), we have o = x_i = Σ_j α_j v_j.
  41. CONNECTION TO THE GRAPH LAPLACIAN (1) • Assuming f is sufficiently smooth, we write the 2nd-order Taylor approximation f(v) = f(o) + v^T ∇f + (1/2)(v^T H v) + o(||v||²), where ∇f is the gradient and H is the Hessian, both evaluated at o.
  42. CONNECTION TO THE GRAPH LAPLACIAN (1) • We have [(I − W)f]_i = f(o) − Σ_j α_j f(v_j), and using Taylor's approximation for f(v_j), we can write [(I − W)f]_i = f(o) − Σ_j α_j f(v_j) ≈ f(o) − Σ_j α_j f(o) − Σ_j α_j v_j^T ∇f − (1/2) Σ_j α_j (v_j^T H v_j) • Since Σ_j α_j = 1 and Σ_j α_j v_j = o, the first three terms disappear, and [(I − W)f]_i = f(o) − Σ_j α_j f(v_j) ≈ −(1/2) Σ_j α_j v_j^T H v_j
  43. CONNECTION TO THE GRAPH LAPLACIAN (2) • Show: v^T H v is proportional to Lf • If √α_j v_j form an orthonormal basis (unusual), then Σ_j W_ij v_j^T H v_j = tr(H) = Lf • If not, we assume v to be a random vector with uniform distribution on every sphere centered at x_i, and show proportionality. • Let e_1, ..., e_n be an orthonormal basis of eigenvectors for H, corresponding to eigenvalues λ_1, ..., λ_n.
  44. CONNECTION TO THE GRAPH LAPLACIAN (2) • Then, using the Spectral theorem, we can write E[v^T H v] = E[Σ_i λ_i ⟨v, e_i⟩²] = Σ_i λ_i E⟨v, e_i⟩² • Since E⟨v, e_i⟩² is independent of i, we can replace E⟨v, e_i⟩² = r to get E[v^T H v] = r (Σ_i λ_i) = r tr(H) = r Lf
  45. CONNECTION TO THE GRAPH LAPLACIAN (3) • Now, putting these together, we have [(I − W)f]_i ≈ −(1/2) Σ_j W_ij v_j^T H v_j and E[v^T H v] = r Lf, so that (I − W)^T (I − W) f ≈ (1/2) L² f • LLE minimizes f^T (I − W)^T (I − W) f, which reduces to finding eigenfunctions of (I − W)^T (I − W), which can now be interpreted as finding eigenfunctions of the iterated Laplacian L². Eigenfunctions of L² coincide with those of L.
  46. LOCALLY LINEAR EMBEDDING • Another method which considers metrics used to weight a graph sampling the manifold • Weights computed by global linear optimization over a local neighborhood around each point • The kernel: W_ij = Σ_k C⁻¹_jk (x_i · x_ik + λ), with E = (I − W)^T (I − W) • The eigenvalue problem: Ef = λf, with 0 = λ_0 ≤ λ_1 ≤ ··· ≤ λ_{n−1} • The embedding: x_i → y_i = (f^(i)_1, ..., f^(i)_d)
  47. LOCALLY LINEAR EMBEDDING [Figure: a data set, its nearest-neighbor graph G(X, E, W) (NN), the weight minimization (MIN), and the eigenvalue problem (EVP) embedding, shown alongside excerpts from Roweis & Saul (2000): reconstruction errors are measured by the cost function ε(W) = Σ_i |x_i − Σ_j W_ij x_j|², which adds up the squared distances between all the data points and their reconstructions, with the weights W_ij summarizing the contribution of the j-th data point to the i-th reconstruction; previous approaches based on multidimensional scaling (MDS), including Isomap, instead attempt to preserve pairwise distances or generalized disparities between data points. Images from (6).]
  48. COMPUTATIONAL COMPLEXITY OF METHODS • PCA: O(D³) computational cost, O(D²) memory usage • k-PCA: O(n³) computational cost, O(n²) memory usage • LLE: O(ξn²) computational cost, O(ξn²) memory usage • LE: O(ξn²) computational cost, O(ξn²) memory usage • ξ is the sparsity ratio of the kernel matrix: the number of non-zero elements divided by the total number of elements in the kernel matrix
  49. BENEFITS AND LIMITATIONS • PCA: benefits (fast, simple); limitations (linear) • k-PCA: benefits (kernel choice); limitations (computation, kernel selection) • LE: benefits (sparse kernel, justification); limitations (nearest neighbor search) • LLE: benefits (sparse kernel, direct solution); limitations (nearest neighbor search)
  50. RESEARCH IDEAS • Guiding principles • Software should be built for modular use within a framework and library • Software should be validated with real known data associated with "ground truth" • Research directions • Landmarks, out-of-sample extensions, low-rank update iterative methods • Hybridization of methods and ideas • Extension to higher order graph properties
  51. SOFTWARE LIBRARY • A comprehensive testbed software library and experimentation framework is needed to support manifold learning research • Must be modular, extensible, platform agnostic • Interpreted/scriptable languages are a good choice for experimentation: Python, MATLAB, Boo, IDL • Previous efforts: • DRToolbox (MATLAB, 2007-) by van der Maaten • scikit.learn (Python, 2009-) by Matthieu Brucher
  52. REFERENCES 1) Belkin, M. and P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Comp. 15 (2003), 1373-1396. 2) Schölkopf, B., A. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Comp. 10 (1998), 1299-1319. 3) Weinberger, K. Q., B. D. Packer, and L. K. Saul, Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization, Proc. AI and Statistics (2005), 381-388. 4) Golub, G. H. and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press (Baltimore, 1996), 70-75. 5) Roweis, S. T. and L. K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science 290 (2000), 2323-2326. 6) Shlens, J., A Tutorial on Principal Component Analysis, Version 2 (Dec 2005). 7) van der Maaten, L., E. Postma and H. J. van den Herik, Dimensionality Reduction: A Comparative Review, TiCC TR 2009-005 (Oct 2009).
  53. QUESTIONS? IDEAS? THANK YOU!
