Manifold learning

A Brief Introduction to Manifold Learning
Wei Yang
platero.yang@gmail.com
2016/8/11 1
Some slides are from Geometric Methods and Manifold Learning in Machine Learning (Mikhail Belkin and
Partha Niyoqi). Summer School (MLSS), Chicago 2009

What is a manifold?
2016/8/11 2
https://en.wikipedia.org/wiki/Manifold

Manifolds in visual perception
Consider a simple example of image variability, the set 𝑀 of all
facial images generated by varying the orientation of a face
• 1D manifold:
– a single degree of freedom: the angle of rotation
• The dimensionality of 𝑀 would increase if we allow
– image scaling
– illumination changing
– …
2016/8/11 3
The Manifold Ways of Perception
•H. Sebastian Seung and
•Daniel D. Lee
Science 22 December 2000

Why Manifolds?
• Euclidean distance in the high dimensional input space may
not accurately reflect the intrinsic similarity
– Euclidean distance
– Geodesic distance
2016/8/11 4
Linear Manifold VS. Nonlinear
Manifold

Differential Geometry
• Embedded manifolds
• Tangent space
• Geodesic
• Laplace-Beltrami operator
2016/8/11 5

Embedded manifolds
• ℳ 𝑘
⊂ ℝ 𝑁
• Locally (not globally) looks like Euclidean space
𝑆 2
⊂ ℝ3
2016/8/11 6

Example: circle
• 𝑥2
+ 𝑦2
= 1
• Charts: continuous, invertible
𝜙 𝑡𝑜𝑝: (𝑥, 𝑦) ⟼ 𝑥
𝜙 𝑡𝑜𝑝
−1
: (𝑥) ⟼ (𝑥, 1 − 𝑥2)
• Atlas: charts covered the whole
circle
• Transition map
𝑇 𝑎 = 𝜙 𝑟𝑖𝑔ℎ𝑡 𝜙 𝑡𝑜𝑝
−1
𝑎
2016/8/11 7
𝑥
𝑦
= 𝜙 𝑟𝑖𝑔ℎ𝑡(𝑎, 1 − 𝑎2)
= 1 − 𝑎2
𝑎
𝜙 𝑡𝑜𝑝
−1
𝑎
𝜙 𝑟𝑖𝑔ℎ𝑡 𝜙 𝑡𝑜𝑝
−1
𝑎
http://en.wikipedia.org/wiki/Manifold

Tangent space
• 𝑘-dimensional affine subspace of ℝ 𝑁
𝑇𝑝ℳ 𝑘
⊂ ℝ 𝑁
𝑝
2016/8/11 8

Tangent vectors and curves
• Tangent vectors <———> curves.
Geometric Methods and Manifold Learning – p. 1
𝜙 𝑡 : ℝ → ℳ 𝑘
𝑑 𝜙 𝑡
𝑑 𝑡
|0 = 𝑉
𝑣
𝜙 𝑡
𝑝
2016/8/11 9

Tangent vectors as derivatives
• Tangent vectors <———> Directional derivatives
𝜙 𝑡 : ℝ → ℳ 𝑘
𝑓 𝜙 𝑡 : ℝ → ℝ
𝑑𝑓
𝑑𝑣
=
𝑑 𝜙 𝑡
𝑑 𝑡
|0
𝑣
𝜙 𝑡
𝑓: ℳ 𝑘
→ ℝ𝑝
2016/8/11 10

Riemannian geometry
• Norms and angles in tangent space
< 𝑣, 𝑤 >
𝑣 , 𝑤
𝑣
𝑤𝑝
2016/8/11 11

Length of curves and geodesics
• Can measure length using norm in tangent space.
• Geodesic — shortest curve between two points.
𝜙 𝑡 : [0,1] → ℳ 𝑘
𝑙 𝜙 =
0
1
𝑑𝜙
𝑑𝑡
𝑑𝑡
𝑝
2016/8/11 12

Gradient
• Tangent vectors <———> Directional derivatives
• Gradient points in the direction of maximum change.
< 𝛻𝑓, 𝑣 >≡
𝑑𝑓
𝑑𝑣
𝑣
𝜙 𝑡
𝑓: ℳ 𝑘
→ ℝ𝑝
2016/8/11 13

Exponential map
• Geodesic: 𝜙 𝑡
• 𝜙 0 = 𝑝, 𝜙 𝑣 = 𝑞,
𝑑𝜙 𝑡
𝑑𝑡
|0 = 𝑣
exp 𝑝: 𝑇𝑝ℳ 𝑘
→ ℳ 𝑘
exp 𝑝 𝑣 = 𝑟
exp 𝑝 𝑤 = 𝑞𝑞
𝑝
𝑟
𝑣𝑤
𝜙 𝑡
2016/8/11 14

Laplace-Beltrami operator
• Orthonormal coordinate system.
𝑝 𝑥2
𝑥1
𝑓: ℳ 𝑘
→ ℝ
exp 𝑝: 𝑇𝑝ℳ 𝑘
→ ℳ 𝑘
2016/8/11 15

Linear Manifold Learning
• Principal Components Analysis
• Multidimensional Scaling
2016/8/11 16

Principal Components Analysis
• Given x1, x2… , x 𝑛 ∈ ℝ 𝐷 with mean 0
• Find 𝑦1, 𝑦2 … , 𝑦𝑛 ∈ ℝ such that
𝑦𝑖 = w ∙ x 𝑖
• And
argmax
w
𝑣𝑎𝑟({𝑦𝑖}) =
𝑖
𝑦𝑖
2 = w 𝑇
𝑖
x𝑖x𝑖
𝑇 w
• w∗ is leading eigenvectors of 𝑖 x𝑖x𝑖
𝑇
2016/8/11 17

Multidimensional Scaling
• MDS: exploring similarities or dissimilarities in data.
• Given 𝑁 data points with distance function is defined as:
𝛿𝑖,𝑗
• The dissimilarity matrix can be defined as:
Δ ≔
𝛿1,1 … 𝛿1,𝑁
⋮ ⋱ ⋮
𝛿 𝑁,1 … 𝛿 𝑁,𝑁
Find x1, x2… , x 𝑁 ∈ ℝ 𝐷 such that
min
x1,... ,x 𝑁
𝑖<𝑗
( x 𝑖 − x𝑗 − 𝛿𝑖,𝑗)2
2016/8/11 18

Nonlinear Manifold Learning
• ISOMAP (Tenenbaum, et al, 00)
• LLE (Roweis, Saul, 00)
• Laplacian Eigenmaps (Belkin, Niyogi, 01)
2016/8/11 19

Algorithmic framework
• Neighborhood graph common to all methods.
2016/8/11 20

Isomap: Motivation
• PCA/MDS see just the Euclidean structure
• Only geodesic distances reflect the true low-dimensional
geometry of the manifold
• The question:
– How to approximate geodesic distances?
2016/8/11 21

Isomap
1. Construct neighborhood graph 𝒅 𝒙(𝒊, 𝒋) using Euclidean
distance
– 𝜖-Isomap: neighbors within a radius 𝜖
– 𝐾-Isomap: 𝐾 nearest neighbors
2. Compute shortest path as the approximation of geodesic
distance
1. 𝒅 𝑮 𝒊, 𝒋 = 𝒅 𝒙(𝒊, 𝒋)
2. For 𝒌 = 𝟏, 𝟐, … , 𝑵, replace all 𝒅 𝑮 𝒊, 𝒋 by 𝐦𝐢𝐧 𝒅 𝑮 𝒊, 𝒋 , 𝒅 𝑮 𝒊, 𝒌 + 𝒅 𝑮 𝒌, 𝒋
3. Construct 𝑑-dimensional embedding using MDS
2016/8/11 22

Isomap: results
2016/8/11 23
Face varying in pose and illumination Hand varying in finger extension & wrist
rotation

Isomap: estimate the intrinsic dimensionality
2016/8/11 24

Locally Linear Embedding
• Intuition: each data point and its neighbors are expected to lie
on or close to a locally linear patch of the manifold.
2016/8/11 25

Locally Linear Embedding
1. Assign neighbours to
each data point (𝑘-NN)
2. Reconstruct each
point by a weighted
linear combination of
its neighbors.
3. Map each point to
embedded
coordinates.
2016/8/11 26
S T Roweis, and L K Saul Science 2000;290:2323-2326

Steps of locally linear embedding
• Suppose we have 𝑁 data points 𝑋𝑖 in a 𝐷 dimensional space.
• Step 1: Construct neighborhood graph
– 𝑘-NN neighborhood
– Euclidean distance or normalized dot products
2016/8/11 27

• Step 2: Compute the weights 𝑊𝑖𝑗 that best linearly
reconstruct 𝑋𝑖 from its neighbors by minimizing
where
2016/8/11 28

• Step 3: Compute the low-dimensional embedding best
reconstructed by 𝑊𝑖𝑗 by minimizing
• Note: 𝑊 is a sparse matrix, and 𝑖-th row is barycentric
coordinates (center of mass) of 𝑋𝑖 in the basis of its nearest
neighbors.
• Similar to PCA, using lowest eigenvectors of 𝐼 − 𝑊 𝑇(𝐼 − 𝑊)
to embed.
2016/8/11 29

LLE (Comments by Ruimao Zhang)
算法优点
• LLE算法可以学习任意维的局部线性
的低维流形.
• LLE算法中的待定参数很少, K 和 d.
• LLE算法中每个点的近邻权值在平
移, 旋转,伸缩变换下是保持不变的.
• LLE算法有解析的整体最优解,不需
迭代.
• LLE算法归结为稀疏矩阵特征值计
算, 计算复杂度相对较小, 容易执行.
算法缺点
• LLE算法要求所学习的流形只能是不
闭合的且在局部是线性的.
• LLE算法要求样本在流形上是稠密采
样的.
• LLE算法中的参数 K, d 有过多的选择.
• LLE算法对样本中的噪音很敏感.
2016/8/11 30

Laplacian Eigenmaps
• Using the notion of the Laplacian of a graph to compute a
low-dimensional representation of the data
– The laplacian of a graph is analogous to the Laplace Beltrami operator
on manifolds, of which the eigenfunctions have properties desirable
for embedding (See M. Belkin and P. Niyogi for justification).
2016/8/11 31

Laplacian matrix (discrete Laplacian)
• Laplacian matrix is a matrix representation of a graph
𝐿 = 𝐷 − 𝐴
– 𝐿 is the Laplacian matrix
– 𝐷 is the degree matrix
– 𝐴 is the adjacent matrix
2016/8/11 32

Laplacian Eigenmaps
2016/8/11 mlss09us_niyogi_belkin_gmml 33

Laplacian Eigenmaps

Laplacian Eigenmaps
𝑫: Degree matrix
𝑾 : (Weighted) adjacent matrix
𝑳 : Laplacian matrix
𝐷−1
𝐿𝑓 = 𝜆𝑓

Justification of optimal embedding
• We have constructed a weighted graph 𝐺 = 𝑉, 𝐸
• We want to map 𝐺 to a line 𝒚 so that connected points stay
as close together as possible
𝒚 = 𝑦1, 𝑦2, … , 𝑦𝑛
𝑻
• This can be done by minimizing the objective function
𝑖𝑗
𝑦𝑖 − 𝑦𝑗
2
𝑊𝑖𝑗
• It incurs a heavy penalty if neighboring points are mapped far
apart.
2016/8/11 36

Justification of optimal embedding (Cont.)
𝑖𝑗
𝑦𝑖 − 𝑦𝑗
2
𝑊𝑖𝑗
=
𝑖𝑗
𝑦𝑖
2 + 𝑦𝑗
2 − 2𝑦𝑖 𝑦𝑗 𝑊𝑖𝑗
= (
𝑖
𝑦𝑖
2
𝐷𝑖𝑖 +
𝑗
𝑦𝑗
2
𝐷𝑗𝑗) − 2
𝑖,𝑗
𝑦𝑖 𝑦𝑗 𝑊𝑖𝑗
= 2𝒚 𝑇 𝐷𝒚 − 2𝒚 𝑇 𝑊𝒚
= 2𝒚 𝑇(𝐷 − 𝑊)𝒚
= 2𝒚 𝑇 𝐿𝒚
2016/8/11 37
1
2 𝑖𝑗 𝑦𝑖 − 𝑦𝑗
2
𝑊𝑖𝑗 = 𝒚 𝑇 𝐿𝒚

Justification of optimal embedding (Cont.)
The minimization problem reduces to finding
Note the constraint removes an arbitrary scaling factor in the
embedding.
Using Lagrange multiplier and setting the derivative with respect
to 𝒚 equal to zero, we obtain
The optimum is given by the minimum eigenvalue solution to the
generalized eigenvalue problem (trivial solution: 𝒚 = 𝟏, 𝜆 = 0).
2016/8/11 38

More methods for non-linear manifold learning
2016/8/11 39

Applications
• Super-resolution
• Laplacianfaces
2016/8/11 40

Super-Resolution Through Neighbor Embedding
• Intuition: small patches in the low- and high-resolution images
form manifolds with similar local geometry in two distinct
spaces.
• X: low-resolution image Y: target high-resolution image
• The algorithm is extremely analogous to LLE!
– Step 1: construct neighborhood of each patch in X
– Step 2: compute the reconstructing weights of the neighbors that
minimize the reconstruction error
– Step 3: perform high-dimensional embedding to (as opposed to the
low-dimensional embedding of LLE)
– Step 4: Construct the target high-resolution image Y by enforcing local
compatibility and smoothness constraints between adjacent patches
obtained in step 3.
2016/8/11 41
Chang, Hong, Dit-Yan Yeung, and Yimin Xiong. "Super-resolution
through neighbor embedding." CVPR 2004.

• Training parameters
– The number of nearest neighbors K
– The patch size
– The degree of overlap
2016/8/11 42

2016/8/11 43

Laplacianfaces
• Mapping face images in the image space via Locality
Preserving Projections (LPP) to low-dimensional face
subspace (manifold), called Laplacianfaces.
• LLP is analogous to Laplacian Eigenmaps except the objective
function
– Laplacian Eigenmaps:
– LLP:
2016/8/11 44
He, Xiaofei, et al. "Face recognition using laplacianfaces." Pattern Analysis
and Machine Intelligence, IEEE Transactions on 27.3 (2005): 328-340.

Laplacianfaces
Learning Laplacianfaces for Representation
1. PCA projection (kept 98 percent information in the sense of
reconstruction error)
2. Constructing the nearest-neighbor graph
3. Choosing the weights
4. Optimize
The k lowest eigenvectors of
are choosing to form
is the so-called Laplacianfaces.
2016/8/11 45

Laplacianfaces
Two-dimensional linear embedding of face images by Laplacianfaces.
2016/8/11 46

Reference and Resources
• 浅谈流形学习. Pluskid. 2010-05-29. http://blog.pluskid.org/?p=533&cpage=1
• Wikipedia: MDS, Manifold, Laplacian Matrix
• PCA: M. Bishop, PRML
• Eigenvalue decomposition and SVD:机器学习中的数学(5)-强大的矩阵奇异值分解(SVD)及其应用
• General Eigenvalue Problem: wolfram, tutorial
• Video lecture: Geometric Methods and Manifold Learning. Mikhail Belkin, Partha Niyogi (author of
Laplacian eigenmap)
• MANI fold Learning Matlab Demo: http://www.math.ucla.edu/~wittman/mani/index.html
2016/8/11 47

Manifold learning

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Manifold learning

Similar to Manifold learning (20)

More from Wei Yang

More from Wei Yang (6)

Recently uploaded

Recently uploaded (20)

Manifold learning

Editor's Notes