5. Joint Alignment: Domains
• Computer vision: remove spatial variability in images
• Speech: remove temporal variability in speech signals
• Psychology: recover stimulus onset in ERP signals
• Radiology: remove intensity bias in MRI scans
• ...
6. Joint Alignment != Pairwise Alignment
• Pairwise alignment involves two instances
• Joint alignment can be achieved through (N − 1) pairwise alignments
• But this can be ineffective (each pairwise step has only a local view & a local objective)
7. Joint Alignment: Unsupervised
• Supervision: fiducial points or examples of transforms
• Can be expensive to obtain
• We focus on unsupervised joint alignment
10. Previous Work: Congealing
(Miller et al. 2000)
• General framework, widely applicable
• Utilizes an entropy-based objective function
• Optimized by iterative conditional maximization
• Expectation regularization step: E[ρ] = 0
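Congealing's inner loop can be sketched in a few lines. The toy below is not the authors' implementation: it congeals binary 1-D sequences under integer shifts only, where iterative conditional maximization updates one shift at a time to reduce the stack's pixelwise entropy, and the shifts are re-centered after each pass so that E[ρ] = 0.

```python
import numpy as np

def stack_entropy(X):
    """Sum of per-column Bernoulli entropies of a binary stack X (N x T)."""
    p = X.mean(axis=0).clip(1e-6, 1 - 1e-6)
    return float(np.sum(-p * np.log(p) - (1 - p) * np.log(1 - p)))

def congeal(X, max_shift=3, n_iters=5):
    """Toy congealing under integer shifts rho_i (one parameter per instance).
    Iterative conditional maximization: update each rho_i in turn to minimize
    the entropy of the shifted stack, then re-center so that E[rho] = 0."""
    N = len(X)
    rho = np.zeros(N, dtype=int)
    for _ in range(n_iters):
        for i in range(N):
            rho[i] = min(range(-max_shift, max_shift + 1),
                         key=lambda s: stack_entropy(
                             np.stack([np.roll(X[j], s if j == i else rho[j])
                                       for j in range(N)])))
        rho -= int(round(rho.mean()))  # expectation regularization: E[rho] = 0
    return rho

# Three copies of one binary pulse at offsets -2, 0, +2
base = np.zeros(20)
base[8:12] = 1
X = np.stack([np.roll(base, s) for s in (-2, 0, 2)])
rho = congeal(X)
aligned = np.stack([np.roll(X[i], rho[i]) for i in range(len(X))])
```

The re-centering step matters: without it, the stack can drift together toward a degenerate solution while the entropy stays unchanged.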
11. Research Goals
• Develop a framework for joint alignment
• Handle complexities due to multi-modality (clustering) and representation (feature learning)
• Release open-source implementations of all models
12. Research Goal
Develop an unsupervised data set-agnostic processing module
that includes alignment, clustering, and feature learning
23. Curve Alignment: Transformation
• h(t) is a smooth monotone function
• Utilize the monotonicity operator of Ramsay (1998): an unconstrained function is mapped to a monotone one

y(t) = α g(t′) x(t′) + β,  where t′ = h(t)

α: linear amplitude scaling · h: non-linear time warping
g: non-linear amplitude scaling · β: amplitude translation
24. Curve Alignment: Transformation
• w(t) and g(t) are unconstrained and estimated using a Fourier basis
• 2 and 7 frequencies, respectively, were sufficient
• Total # of parameters: 20

y(t) = α g(t′) x(t′) + β,  where t′ = h(t)

α: linear amplitude scaling · h: non-linear time warping
g: non-linear amplitude scaling · β: amplitude translation
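The full transformation is easy to sketch numerically. The code below is a minimal sketch, assuming curves sampled on a grid over [0, 1] and trapezoidal quadrature for the two nested integrals of Ramsay's monotonicity operator h(t) = (1/Z) ∫₀ᵗ exp(∫₀ʳ w(s) ds) dr; the particular w, g, α, β values are illustrative.

```python
import numpy as np

def monotone_warp(w, t):
    """Ramsay's monotonicity operator, h = M(w):
    h(t) = (1/Z) * int_0^t exp( int_0^r w(s) ds ) dr,
    discretized with trapezoidal quadrature.  Any unconstrained w yields
    a smooth, strictly increasing h with h(0) = 0 and h(1) = 1."""
    inner = np.concatenate([[0.0], np.cumsum((w[1:] + w[:-1]) / 2 * np.diff(t))])
    integrand = np.exp(inner)
    h = np.concatenate([[0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t))])
    return h / h[-1]  # the 1/Z normalization

def transform(x, t, w, g, alpha, beta):
    """y(t) = alpha * g(t') * x(t') + beta, with t' = h(t)."""
    tp = monotone_warp(w, t)
    return alpha * np.interp(tp, t, g) * np.interp(tp, t, x) + beta

t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * t)        # curve to be transformed
w = 2.0 * np.ones_like(t)        # unconstrained w -> accelerating time warp
g = 1.0 + 0.5 * t                # mild non-linear amplitude scaling
h = monotone_warp(w, t)
y = transform(x, t, w, g, alpha=1.5, beta=0.2)
```

Setting w = 0, g = 1, α = 1, β = 0 recovers the identity transformation, which is a useful sanity check on the discretization.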
25. Curve Alignment: Experiments
• Alignment of synthetic data sets
• Alignment of real data sets
• Improving classification with unsupervised joint alignment
26. Curve Alignment: Synthetic
• 5 curves x 5 transformations x 3 levels of difficulty = 75 data sets
• ~90% of the data sets converged to correct alignments
34. Curve Alignment: Classification
• Perform unsupervised alignment
• Represent each instance by its original curve, its aligned curve, and its transformation parameters
• Improve the performance of an SVM
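The representation above amounts to concatenation. A minimal sketch (the array contents and sizes here are illustrative stand-ins; in the experiments the aligned curves and parameters come from joint alignment):

```python
import numpy as np

def classification_features(original, aligned, params):
    """Each instance is represented by its original curve, its jointly
    aligned curve, and its estimated transformation parameters, all
    concatenated into one vector; the resulting matrix is what gets
    fed to the SVM."""
    return np.hstack([original, aligned, params])

rng = np.random.default_rng(0)
N, T, P = 4, 50, 20                  # instances, curve length, # transformation parameters
original = rng.normal(size=(N, T))   # stand-ins for real curves
aligned = rng.normal(size=(N, T))    # stand-ins for the joint-alignment output
params = rng.normal(size=(N, P))
X = classification_features(original, aligned, params)
```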
35–39. Curve Alignment: Classification

Method | Accuracy
Original Curves (no alignment) | 78.83%
Dynamic Time Warping (pairwise alignment) | 82.35%
Joint Alignment (entropy + non-linear time) | 82.97%
Joint Alignment (entropy + all transformations) | 84.12%
Joint Alignment (variance + all transformations) | 85.46%
Joint Alignment (variance + all trans. + MKL) | 85.79%

13 Data Sets from UCR Time Series Repository
41. Curve Alignment: Conclusions
• Effective on 1-dimensional curves
• Small # of parameters needed
• Unsupervised alignment improves classification
42. Curve Alignment: Drawbacks
• Lack of a feature representation
• Independence assumption in entropy computation
• Results in co-alignment and difficulty aligning complex data sets
• Transformation regularization is ad hoc
44. CRBM Alignment: Objectives
• Alignment of complex data sets requires an appropriate representation
• Automate feature selection
• Use statistics of data set to learn an appropriate feature representation
• Experimented with convolutional restricted Boltzmann machines
45. CRBM Alignment: Feature Learning
• Weights between the visible units and a hidden unit are utilized as a feature detector
• Hidden units are binary and activate if the feature is detected
• Higher layers represent more complex features

[Figure: an RBM with visible units v1–v7 and hidden units h1–h4, beside a CRBM (1 group) with shared-weight hidden units and pooling units p]
46. CRBM Alignment: Overview
• Train a CRBM using unaligned data
• Use pooling unit activation probability as feature representation
• Use (stochastic) congealing algorithm
• Applied successfully to both curves and face images
47. CRBM Alignment: Curve Features
• Trained a CRBM on training data from 13 UCR data sets
• Learned 32 filters of length 10, with pooling area of 5
• 576 pooling unit activations
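One plausible reading of this feature pipeline, assuming the probabilistic max-pooling of Lee et al. for convolutional RBMs, is sketched below. The filter values and the length-100 input are illustrative assumptions, but with those sizes, 32 filters of length 10 and pooling areas of 5 yield exactly 32 × 18 = 576 pooling activations, matching the count above.

```python
import numpy as np

def crbm_pooling_features(x, filters, bias, pool=5):
    """Pooling-unit activation probabilities of a 1-D convolutional RBM
    (probabilistic max-pooling): within each pooling block of `pool`
    hidden units, P(pooling unit on) = 1 - 1 / (1 + sum exp(I)), where
    I is the bottom-up input to each hidden unit in the block."""
    feats = []
    for k, f in enumerate(filters):
        I = np.correlate(x, f, mode='valid') + bias[k]  # bottom-up filter response
        n = (len(I) // pool) * pool                     # drop the ragged tail
        blocks = I[:n].reshape(-1, pool)
        feats.append(1.0 - 1.0 / (1.0 + np.exp(blocks).sum(axis=1)))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
filters = rng.normal(size=(32, 10)) * 0.1  # 32 filters of length 10 (as on the slide)
bias = np.zeros(32)
x = rng.normal(size=100)                   # a length-100 curve (length is illustrative)
phi = crbm_pooling_features(x, filters, bias, pool=5)
```

Each entry of `phi` is a probability in (0, 1), which is what makes it usable directly as a feature representation for congealing.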
56. CRBM Alignment: Face Verification
• Use alignment as pre-processing for face verification task
• Use alternative / consistent feature representation for verification step
58. CRBM Alignment: Classification

Method | Accuracy
Original (no alignment) | 74.2%
Alignment with SIFT features | 75.8%
Alignment with CRBM features | 82.0%
Supervised alignment | 80.5%

View 1 from LFW
59. CRBM Alignment: Summary
• Effective for alignment of both curves & faces
• Effective for curve classification
61. Joint Alignment & Clustering (JAC): Objectives
• Develop nonparametric/unsupervised alignment & clustering framework
• Applicable to any domain
• Allows use of continuous transformations
• Discovers # clusters automatically
• Regularize transformations in a principled manner
• Evaluate model on curve and image data sets
65. JAC: Bayesian Alignment
• Explicit regularization of transformations
• Direct maximization that treats transformation function as black box
• Low memory footprint with sufficient stats
90. Future: Closing the Loop
• Simultaneous unsupervised alignment, clustering, and feature learning
• Update learned features through alignment
• Different filters / weights for each cluster
• Multi-resolution framework using different network depths
91. Research Goal
Develop an unsupervised data set-agnostic processing module
that includes alignment, clustering, and feature learning
94. Previous Work: Procrustes Analysis
(Mosier 1939, Hurley & Cattell 1962, Kristof & Wingersky 1971)
• Framework for computing the optimal matching under transformations
• Residual sum-of-squares distance called the Procrustes statistic
• Applied to joint alignment and several other applications
• Drawbacks:
• Mean may not be appropriate reference
• Assumes unimodal data set
101. Previous Work: Curves
• HMM-based models: Continuous Profile Model & Segmental HMM
• Many parameters
• Assumes unimodal data set
• Regression-based models: transformed mixture of regressions
• Requires the # of clusters to be specified
• Only linear transformations
102. Previous Work: Images
• Congealing variants:
• Specific to image domains
• Improved performance on digit and simple face data sets
• Clustering-based models
103–104. Curve Alignment: Transformation
• h(t) is a smooth monotone function
• Utilize the monotonicity operator of Ramsay (1998)

y(t) = α g(t′) x(t′) + β,  where t′ = h(t)

α: linear amplitude scaling · h: non-linear time warping
g: non-linear amplitude scaling · β: amplitude translation

h(t) = (1/Z) ∫₀ᵗ exp( ∫₀ʳ w(s) ds ) dr = M(w(t)),  which satisfies  d²h/dt² = w · dh/dt

h is obtained by passing an unconstrained function w through the monotonicity operator M.
111. JAC: Bayesian Alignment

∀ i = 1..N:  ρᵢ ∼ p(yᵢ | y₋ᵢ; λ) p(ρᵢ | ρ₋ᵢ; α)

ρ̂ᵢ = argmax over ρᵢ of  p(yᵢ | y₋ᵢ; λ) p(ρᵢ | ρ₋ᵢ; α)

(data term: p(yᵢ | y₋ᵢ; λ); transformation term: p(ρᵢ | ρ₋ᵢ; α))

Collapsed (Rao-Blackwellized) Gibbs sampler
112. JAC: Alignment & Clustering
Second sampler integrates out the transformations:

zᵢ ∼ p(zᵢ | z₋ᵢ; γ) p(xᵢ | x₋ᵢ, z; λ, α)

p(xᵢ | x₋ᵢ, z; λ, α) ≈ ( Σₗ wₗ p(xᵢ | ρₗ, θ_{zᵢ}; λ) ) / ( Σₗ wₗ )

s.t. ∀ l = 1..L:  ρₗ ∼ q(ρ)  and  wₗ = p(ρₗ | φ_{zᵢ}; α) / q(ρₗ)
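The approximation above is self-normalized importance sampling of the marginal data term: transformations ρₗ are drawn from a proposal q and re-weighted by the transformation prior. A minimal scalar sketch, where the Gaussian prior, proposal, and data term are illustrative stand-ins for the model's actual distributions:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def marginal_data_term(x, prior=(0.0, 1.0), proposal=(0.0, 2.0), L=100_000, seed=0):
    """Self-normalized importance sampling of p(x) = int p(x|rho) p(rho) drho,
    integrating out a scalar transformation parameter rho: draw rho_l ~ q,
    weight by w_l = p(rho_l) / q(rho_l), and average the data term under
    the normalized weights, as in the approximation above."""
    rng = np.random.default_rng(seed)
    rho = rng.normal(proposal[0], proposal[1], size=L)
    w = normal_pdf(rho, *prior) / normal_pdf(rho, *proposal)
    lik = normal_pdf(x, rho, 1.0)  # toy data term: x | rho ~ N(rho, 1)
    return float(np.sum(w * lik) / np.sum(w))

# With rho ~ N(0, 1) and x | rho ~ N(rho, 1), the exact marginal is N(x; 0, sqrt(2))
est = marginal_data_term(1.0)
exact = normal_pdf(1.0, 0.0, np.sqrt(2.0))
```

A wider proposal than the prior, as here, is the usual choice so that the weights stay bounded.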
113. JAC: Implementation
• visDataSet: represents data set (e.g. images, curves)
• visProcessor: handles feature representation (e.g. HOG)
• visTransform: represents the transformation function (e.g. affine transformation)
• visSufficientStats: represents exponential family members and priors
• visAlignment: base class for alignment algorithms
• visMixtureModelBase: base class for DP-based clustering/alignment
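This decomposition can be expressed as abstract base classes. The following is a hypothetical skeleton using the slide's class names; the talk does not show the actual method signatures, so every method here is an assumption.

```python
from abc import ABC, abstractmethod

# Hypothetical skeleton of the module decomposition named on the slide;
# the real interfaces are not shown in the talk.

class visDataSet(ABC):
    """Represents the data set (e.g. images, curves)."""
    @abstractmethod
    def instances(self): ...

class visProcessor(ABC):
    """Handles the feature representation (e.g. HOG)."""
    @abstractmethod
    def features(self, instance): ...

class visTransform(ABC):
    """Represents the transformation function (e.g. an affine transform)."""
    @abstractmethod
    def apply(self, instance, params): ...

class visAlignment(ABC):
    """Base class for alignment algorithms, tying the pieces together."""
    def __init__(self, dataset, processor, transform):
        self.dataset = dataset
        self.processor = processor
        self.transform = transform
    @abstractmethod
    def step(self): ...
```

Keeping the data set, feature representation, and transformation behind separate interfaces is what makes the module data set-agnostic: a new domain only needs new subclasses, not a new alignment algorithm.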