High Dimensional Data Visualization using t-SNE

Visualizing Data using t-SNE
Laurens van der Maaten and Geoffrey Hinton, JMLR 2008
Kevin Zhao
kevinzhaio@gmail.com
October 30, 2014

Overview
1 Overview
2 t-Distributed Stochastic Neighbor Embedding
3 Experiment Setup and Results
4 Code and Web Resources

Introduction
Overview
We are given a collection of N high-dimensional objects $x_1, \dots, x_N$
How can we get a feel for how these objects are arranged in the data
space?

Introduction
Principal Components Analysis

Introduction
Swiss Roll
PCA is mainly concerned with preserving large
pairwise distances in the map

t-Distributed Stochastic Neighbor Embedding
Introduction
Distance Preservation
Neighbor Preservation

Introduction

Introduction
Preserve the neighborhood

Introduction
Measure pairwise similarities between high-dimensional and
low-dimensional objects
$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$
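As an illustration of the similarity above, here is a minimal NumPy sketch that builds the matrix of conditional probabilities $p_{j|i}$. The function name `conditional_p`, the toy data and the choice of one fixed bandwidth per point are assumptions made for this example, not part of the original slides.

```python
import numpy as np

def conditional_p(X, sigmas):
    """Return a matrix whose row i holds p_{j|i} for all j (diagonal is 0)."""
    # Squared Euclidean distances between all pairs of rows of X.
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Gaussian log-affinities, with one bandwidth sigma_i per row.
    logits = -D / (2.0 * sigmas[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability only
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)      # normalize each row

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                      # toy data: 5 points in 3-D
P = conditional_p(X, sigmas=np.ones(5))          # hypothetical bandwidths
```

Each row of `P` is a probability distribution over the neighbors of one point, so the matrix is generally not symmetric.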

Stochastic Neighbor Embedding
Converting the high-dimensional Euclidean distances into conditional
probabilities that represent similarities
Similarity of datapoints in High Dimension
$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$
Similarity of datapoints in Low Dimension
$$q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$$
Cost function
$$C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
Minimize the cost function using gradient descent
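The cost function on this slide can be sketched in a few lines of NumPy. The helper names `conditional_q` and `sne_cost`, the small `eps` guard and the random stand-in for the high-dimensional similarities are assumptions for illustration only.

```python
import numpy as np

def conditional_q(Y):
    """Row i holds q_{j|i} for map points Y (no per-point bandwidth)."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    W = np.exp(-D)
    np.fill_diagonal(W, 0.0)                     # exclude k = i
    return W / W.sum(axis=1, keepdims=True)

def sne_cost(P, Q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i); pairs with p = 0 contribute nothing."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

rng = np.random.default_rng(0)
# Stand-in for p_{j|i}: conditionals of a second random point set.
P = conditional_q(rng.normal(size=(6, 2)))
Q = conditional_q(rng.normal(size=(6, 2)))
cost = sne_cost(P, Q)                            # positive when P != Q
cost_perfect = sne_cost(Q, Q)                    # zero for a perfect map
```

The cost is zero only when every $q_{j|i}$ matches the corresponding $p_{j|i}$, which is what gradient descent on the map points tries to approach.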

Stochastic Neighbor Embedding
Gradient has a surprisingly simple form
$$\frac{\partial C}{\partial y_i} = 2 \sum_{j \neq i} (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)$$
The gradient update with momentum term is given by
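$$Y^{(t)} = Y^{(t-1)} + \eta \frac{\partial C}{\partial Y} + \alpha(t)\left(Y^{(t-1)} - Y^{(t-2)}\right)$$
where $\eta$ is the learning rate and $\alpha(t)$ is the momentum at iteration $t$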
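A minimal sketch of this gradient and of one momentum update follows. It includes the constant factor 2 with which the JMLR paper states the SNE gradient. The slide writes the update with a plus sign in front of the gradient term; the sketch assumes the usual descent convention and subtracts the learning rate times the gradient. The function names, the learning rate `eta`, the momentum `alpha` and the toy data are all assumptions.

```python
import numpy as np

def conditional_q(Y):
    """Row i holds q_{j|i} for map points Y."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    W = np.exp(-D)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def sne_gradient(P, Y):
    """dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)."""
    Q = conditional_q(Y)
    M = (P - Q) + (P - Q).T      # M[i, j] is the bracketed coefficient
    # sum_j M[i, j] * (y_i - y_j) = (row sum of M) * y_i - (M @ Y)[i]
    return 2.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)

def momentum_step(Y, Y_prev, grad, eta, alpha):
    """One update: step against the gradient, plus a momentum term."""
    return Y - eta * grad + alpha * (Y - Y_prev)

rng = np.random.default_rng(0)
P = conditional_q(rng.normal(size=(8, 2)))   # stand-in for p_{j|i}
Y = 1e-2 * rng.normal(size=(8, 2))           # small random initial map
Y_prev = Y.copy()
for t in range(50):
    grad = sne_gradient(P, Y)
    Y, Y_prev = momentum_step(Y, Y_prev, grad, eta=0.05, alpha=0.5), Y
```

Because the coefficient matrix `M` is symmetric, the per-point gradients sum to zero, so the update never shifts the map as a whole.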

Symmetric SNE
Minimize the sum of the KL divergences between the conditional
probabilities
$$C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
Minimize a single KL divergence between a joint probability
distribution
$$C = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
The obvious way to redefine the pairwise similarities is
$$p_{ij} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma^2)}{\sum_{k \neq l} \exp(-\|x_l - x_k\|^2 / 2\sigma^2)}$$
$$q_{ij} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_l - y_k\|^2)}$$

Symmetric SNE
Since $p_{ij} = p_{ji}$ and $q_{ij} = q_{ji}$, the main advantage is a simpler gradient
$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$$
However, in practice we symmetrize (or average) the conditionals
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$$
Set the bandwidth $\sigma_i$ such that the conditional distribution $P_i$ has a fixed perplexity
(effective number of neighbors), $\mathrm{Perp}(P_i) = 2^{H(P_i)}$; typical values are about 5
to 50
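The bandwidth search and the symmetrization can be sketched as follows. This is one possible implementation, assuming a bisection on the logarithm of $\sigma_i$ (perplexity grows monotonically with the bandwidth). The function names, the search interval and the toy data are assumptions, not taken from the slides.

```python
import numpy as np

def row_probs(sq_dists, sigma):
    """p_{j|i} for one point, given squared distances to all other points."""
    logits = -sq_dists / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())        # shift for numerical stability
    return w / w.sum()

def perplexity(p):
    """Perp(P_i) = 2^H(P_i), with H the Shannon entropy in bits."""
    p = p[p > 0]
    return 2.0 ** (-np.sum(p * np.log2(p)))

def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, iters=60):
    """Bisection in log-space for the sigma that gives the target perplexity."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)               # geometric midpoint
        if perplexity(row_probs(sq_dists, mid)) > target:
            hi = mid                         # too many effective neighbors
        else:
            lo = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                 # toy data
N = X.shape[0]
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)

P_cond = np.zeros((N, N))
for i in range(N):
    others = np.arange(N) != i
    sigma_i = find_sigma(D[i, others], target=10.0)
    P_cond[i, others] = row_probs(D[i, others], sigma_i)

P = (P_cond + P_cond.T) / (2.0 * N)          # symmetrized joint p_ij
```

Dividing by $2N$ makes the joint probabilities sum to one, since each of the $N$ conditional rows sums to one.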

t-Distribution
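Use a distribution with heavier tails than the Gaussian in the low-dimensional space; we choose
$$q_{ij} \propto (1 + \|y_i - y_j\|^2)^{-1}$$
The gradient then becomes
$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})(1 + \|y_i - y_j\|^2)^{-1}(y_i - y_j)$$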
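To make "heavier tail" concrete, the short sketch below compares the unnormalized Gaussian kernel used by SNE with the Student-t kernel at a few map distances. The distances are arbitrary values chosen for illustration.

```python
import numpy as np

d = np.array([0.5, 1.0, 2.0, 4.0])    # example map distances ||y_i - y_j||
gaussian = np.exp(-d ** 2)            # SNE kernel (unnormalized)
student_t = 1.0 / (1.0 + d ** 2)      # t-SNE kernel (unnormalized)
ratio = student_t / gaussian          # grows quickly with distance
```

At small distances the two kernels are close, while at larger distances the Student-t kernel keeps far more mass, so moderately dissimilar points can sit much farther apart in the map.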

Similarity of datapoints in High Dimension
$$p_{ij} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma^2)}{\sum_{k \neq l} \exp(-\|x_l - x_k\|^2 / 2\sigma^2)}$$
Similarity of datapoints in Low Dimension
$$q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$$
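The low-dimensional similarity above is a single normalization over all pairs, which the following sketch illustrates. The function name `student_t_q` and the toy map are assumptions for this example.

```python
import numpy as np

def student_t_q(Y):
    """Joint q_ij from a Student-t kernel with one degree of freedom."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    W = 1.0 / (1.0 + D)           # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)      # the sum runs over k != l only
    return W / W.sum()            # one normalizer for the whole matrix

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))       # toy 2-D map with 6 points
Q = student_t_q(Y)
```

Unlike the conditional $q_{j|i}$ of SNE, this matrix is symmetric and sums to one over all pairs rather than per row.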

Cost function
$$C = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
Large $p_{ij}$ modeled by small $q_{ij}$: large penalty
Small $p_{ij}$ modeled by large $q_{ij}$: small penalty
t-SNE mainly preserves local similarity structure of the data
Gradient
$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})(1 + \|y_i - y_j\|^2)^{-1}(y_i - y_j)$$
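The asymmetric penalties listed on this slide can be checked with a small numeric example. It evaluates the contribution of a single pair to the KL divergence; the probability values 0.8 and 0.1 are arbitrary choices for illustration.

```python
import math

def kl_term(p, q):
    """Contribution of one pair to KL(P || Q): p * log(p / q)."""
    return p * math.log(p / q)

near_mapped_far = kl_term(p=0.8, q=0.1)   # large p_ij, small q_ij
far_mapped_near = kl_term(p=0.1, q=0.8)   # small p_ij, large q_ij
```

The first term is about 1.66, while the second is about -0.21, much smaller in magnitude. Placing similar points far apart is therefore costly, whereas placing dissimilar points close together changes the cost very little, which is why local structure is favored.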

Gradient Interpretation
Figure: Gradients of SNE and t-SNE as a function of the pairwise Euclidean distance between two points in the high-dimensional and in the low-dimensional data representation

We can interpret the t-SNE gradient as a simulation of an N-body system
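$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} \underbrace{(p_{ij} - q_{ij})\,(1 + \|y_i - y_j\|^2)^{-1}}_{\text{exertion / compression}} \; \underbrace{(y_i - y_j)}_{\text{displacement}}$$
Displacement: $(y_i - y_j)$
Exertion / Compression: $(p_{ij} - q_{ij})(1 + \|y_i - y_j\|^2)^{-1}$
N-body summation: $\sum_{j \neq i}$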
Reduce complexity from $O(N^2)$ to $O(N \log N)$ via the Barnes-Hut
(tree-based) algorithm
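For reference, the exact $O(N^2)$ form of the N-body sum can be written directly in NumPy. This sketch is the naive evaluation that a tree-based method would approximate; it is not the Barnes-Hut algorithm itself. The names `tsne_gradient` and `spring`, and the random stand-in for $p_{ij}$, are assumptions.

```python
import numpy as np

def tsne_gradient(P, Y):
    """Exact O(N^2) gradient for a symmetric joint P and map points Y."""
    diff = Y[:, None, :] - Y[None, :, :]           # displacement y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=2))    # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    spring = (P - Q) * W                           # exertion / compression
    # Sum over j of spring[i, j] * (y_i - y_j): the N-body summation.
    return 4.0 * np.sum(spring[:, :, None] * diff, axis=1)

rng = np.random.default_rng(0)
A = rng.random((6, 6))
A = A + A.T                                        # symmetric stand-in
np.fill_diagonal(A, 0.0)
P = A / A.sum()                                    # toy joint p_ij
Y = rng.normal(size=(6, 2))
grad = tsne_gradient(P, Y)
```

Pairs with $p_{ij} > q_{ij}$ give a positive `spring` value, so a descent step pulls them together, while pairs with $p_{ij} < q_{ij}$ are pushed apart.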

Experiment Setup and Results
Experiment Results
MNIST
Randomly selected 6,000 images
28 × 28 = 784 pixels
Olivetti faces
400 images (10 per individual)
92 × 112 = 10,304 pixels
COIL-20
20 different objects and 72 equally spaced orientations, yielding a
total of 1,440 images
32 × 32 = 1,024 pixels
Start by using PCA to reduce the dimensionality of the data to 30
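The PCA preprocessing step can be sketched with a singular value decomposition. The function name `pca_reduce` is an assumption, and the random matrix is only a stand-in with the 784 pixel dimensions of MNIST, not the actual data.

```python
import numpy as np

def pca_reduce(X, n_components=30):
    """Project the rows of X onto their top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 784))      # random stand-in for flattened images
X30 = pca_reduce(X, n_components=30)
```

The reduced matrix `X30` would then be the input from which the pairwise similarities $p_{ij}$ are computed.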

Experiment Results

MNIST t-SNE

MNIST Sammon

MNIST Isomap

MNIST LLE

Olivetti faces
