5. Joint Alignment: Domains
• Computer vision: remove spatial variability in images
• Speech: remove temporal variability in speech signals
• Psychology: recover stimulus onset in ERP signals
• Radiology: remove intensity bias in MRI scans
• ...
6. Joint Alignment != Pairwise Alignment
• Pairwise alignment involves two instances
• Joint alignment can be achieved through (N − 1) pairwise alignments
• But this can be ineffective (each pairwise step has only a local view & a local objective)
7. Joint Alignment: Unsupervised
• Supervision: fiducial points or examples of transforms
• Can be expensive to obtain
• We focus on unsupervised joint alignment
10. Previous Work: Congealing
(Miller et al. 2000)
• General framework, widely applicable
• Utilizes an entropy-based objective function
• Optimized by iterative conditional maximization
• Expectation regularization step: E[ρ] = 0
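Congealing's inner loop can be sketched in a few lines. The toy below is not the authors' implementation: it congeals binary 1-D sequences under integer shifts only, where iterative conditional maximization updates one shift at a time to reduce the stack's pixelwise entropy, and the shifts are re-centered after each pass so that E[ρ] = 0.

```python
import numpy as np

def stack_entropy(X):
    """Sum of per-column Bernoulli entropies of a binary stack X (N x T)."""
    p = X.mean(axis=0).clip(1e-6, 1 - 1e-6)
    return float(np.sum(-p * np.log(p) - (1 - p) * np.log(1 - p)))

def congeal(X, max_shift=3, n_iters=5):
    """Toy congealing under integer shifts rho_i (one parameter per instance).
    Iterative conditional maximization: update each rho_i in turn to minimize
    the entropy of the shifted stack, then re-center so that E[rho] = 0."""
    N = len(X)
    rho = np.zeros(N, dtype=int)
    for _ in range(n_iters):
        for i in range(N):
            rho[i] = min(range(-max_shift, max_shift + 1),
                         key=lambda s: stack_entropy(
                             np.stack([np.roll(X[j], s if j == i else rho[j])
                                       for j in range(N)])))
        rho -= int(round(rho.mean()))  # expectation regularization: E[rho] = 0
    return rho

# Three copies of one binary pulse at offsets -2, 0, +2
base = np.zeros(20)
base[8:12] = 1
X = np.stack([np.roll(base, s) for s in (-2, 0, 2)])
rho = congeal(X)
aligned = np.stack([np.roll(X[i], rho[i]) for i in range(len(X))])
```

The re-centering step matters: without it, the stack can drift together toward a degenerate solution while the entropy stays unchanged.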
11. Research Goals
• Develop a framework for joint alignment
• Handle complexities due to multi-modality (clustering) and representation (feature learning)
• Release open-source implementations of all models
12. Research Goal
Develop an unsupervised data set-agnostic processing module
that includes alignment, clustering, and feature learning
23. Curve Alignment: Transformation
• h(t) is a smooth monotone function
• Utilize the monotonicity operator of Ramsay (1998): an unconstrained function is mapped to a monotone one

y(t) = α g(t′) x(t′) + β,  where t′ = h(t)

α: linear amplitude scaling · h: non-linear time warping
g: non-linear amplitude scaling · β: amplitude translation
24. Curve Alignment: Transformation
• w(t) and g(t) are unconstrained and estimated using a Fourier basis
• 2 and 7 frequencies, respectively, were sufficient
• Total # of parameters: 20

y(t) = α g(t′) x(t′) + β,  where t′ = h(t)

α: linear amplitude scaling · h: non-linear time warping
g: non-linear amplitude scaling · β: amplitude translation
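The full transformation is easy to sketch numerically. The code below is a minimal sketch, assuming curves sampled on a grid over [0, 1] and trapezoidal quadrature for the two nested integrals of Ramsay's monotonicity operator h(t) = (1/Z) ∫₀ᵗ exp(∫₀ʳ w(s) ds) dr; the particular w, g, α, β values are illustrative.

```python
import numpy as np

def monotone_warp(w, t):
    """Ramsay's monotonicity operator, h = M(w):
    h(t) = (1/Z) * int_0^t exp( int_0^r w(s) ds ) dr,
    discretized with trapezoidal quadrature.  Any unconstrained w yields
    a smooth, strictly increasing h with h(0) = 0 and h(1) = 1."""
    inner = np.concatenate([[0.0], np.cumsum((w[1:] + w[:-1]) / 2 * np.diff(t))])
    integrand = np.exp(inner)
    h = np.concatenate([[0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t))])
    return h / h[-1]  # the 1/Z normalization

def transform(x, t, w, g, alpha, beta):
    """y(t) = alpha * g(t') * x(t') + beta, with t' = h(t)."""
    tp = monotone_warp(w, t)
    return alpha * np.interp(tp, t, g) * np.interp(tp, t, x) + beta

t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * t)        # curve to be transformed
w = 2.0 * np.ones_like(t)        # unconstrained w -> accelerating time warp
g = 1.0 + 0.5 * t                # mild non-linear amplitude scaling
h = monotone_warp(w, t)
y = transform(x, t, w, g, alpha=1.5, beta=0.2)
```

Setting w = 0, g = 1, α = 1, β = 0 recovers the identity transformation, which is a useful sanity check on the discretization.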
25. Curve Alignment: Experiments
• Alignment of synthetic data sets
• Alignment of real data sets
• Improving classification with unsupervised joint alignment
26. Curve Alignment: Synthetic
• 5 curves x 5 transformations x 3 levels of difficulty = 75 data sets
• ~90% of the data sets converged to correct alignments
34. Curve Alignment: Classification
• Perform unsupervised alignment
• Represent each instance by its original curve, its aligned curve, and its transformation parameters
• Improve the performance of an SVM
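The representation above amounts to concatenation. A minimal sketch (the array contents and sizes here are illustrative stand-ins; in the experiments the aligned curves and parameters come from joint alignment):

```python
import numpy as np

def classification_features(original, aligned, params):
    """Each instance is represented by its original curve, its jointly
    aligned curve, and its estimated transformation parameters, all
    concatenated into one vector; the resulting matrix is what gets
    fed to the SVM."""
    return np.hstack([original, aligned, params])

rng = np.random.default_rng(0)
N, T, P = 4, 50, 20                  # instances, curve length, # transformation parameters
original = rng.normal(size=(N, T))   # stand-ins for real curves
aligned = rng.normal(size=(N, T))    # stand-ins for the joint-alignment output
params = rng.normal(size=(N, P))
X = classification_features(original, aligned, params)
```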
35–39. Curve Alignment: Classification

Method | Accuracy
Original Curves (no alignment) | 78.83%
Dynamic Time Warping (pairwise alignment) | 82.35%
Joint Alignment (entropy + non-linear time) | 82.97%
Joint Alignment (entropy + all transformations) | 84.12%
Joint Alignment (variance + all transformations) | 85.46%
Joint Alignment (variance + all trans. + MKL) | 85.79%

13 Data Sets from UCR Time Series Repository
41. Curve Alignment: Conclusions
• Effective on 1-dimensional curves
• Small # of parameters needed
• Unsupervised alignment improves classification
42. Curve Alignment: Drawbacks
• Lack of a feature representation
• Independence assumption in entropy computation
• Results in co-alignment and difficulty aligning complex data sets
• Transformation regularization is ad hoc
44. CRBM Alignment: Objectives
• Alignment of complex data sets requires an appropriate representation
• Automate feature selection
• Use statistics of data set to learn an appropriate feature representation
• Experimented with convolutional restricted Boltzmann machines
45. CRBM Alignment: Feature Learning
• Weights between the visible units and a hidden unit are utilized as a feature detector
• Hidden units are binary and activate if the feature is detected
• Higher layers represent more complex features

[Figure: an RBM with visible units v1–v7 and hidden units h1–h4, beside a CRBM (1 group) with shared-weight hidden units and pooling units p]
46. CRBM Alignment: Overview
• Train a CRBM using unaligned data
• Use pooling unit activation probability as feature representation
• Use (stochastic) congealing algorithm
• Applied successfully to both curves and face images
47. CRBM Alignment: Curve Features
• Trained a CRBM on training data from 13 UCR data sets
• Learned 32 filters of length 10, with pooling area of 5
• 576 pooling unit activations
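One plausible reading of this feature pipeline, assuming the probabilistic max-pooling of Lee et al. for convolutional RBMs, is sketched below. The filter values and the length-100 input are illustrative assumptions, but with those sizes, 32 filters of length 10 and pooling areas of 5 yield exactly 32 × 18 = 576 pooling activations, matching the count above.

```python
import numpy as np

def crbm_pooling_features(x, filters, bias, pool=5):
    """Pooling-unit activation probabilities of a 1-D convolutional RBM
    (probabilistic max-pooling): within each pooling block of `pool`
    hidden units, P(pooling unit on) = 1 - 1 / (1 + sum exp(I)), where
    I is the bottom-up input to each hidden unit in the block."""
    feats = []
    for k, f in enumerate(filters):
        I = np.correlate(x, f, mode='valid') + bias[k]  # bottom-up filter response
        n = (len(I) // pool) * pool                     # drop the ragged tail
        blocks = I[:n].reshape(-1, pool)
        feats.append(1.0 - 1.0 / (1.0 + np.exp(blocks).sum(axis=1)))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
filters = rng.normal(size=(32, 10)) * 0.1  # 32 filters of length 10 (as on the slide)
bias = np.zeros(32)
x = rng.normal(size=100)                   # a length-100 curve (length is illustrative)
phi = crbm_pooling_features(x, filters, bias, pool=5)
```

Each entry of `phi` is a probability in (0, 1), which is what makes it usable directly as a feature representation for congealing.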
56. CRBM Alignment: Face Verification
• Use alignment as pre-processing for face verification task
• Use alternative / consistent feature representation for verification step
58. CRBM Alignment: Classification

Method | Accuracy
Original (no alignment) | 74.2%
Alignment with SIFT features | 75.8%
Alignment with CRBM features | 82.0%
Supervised alignment | 80.5%

View 1 from LFW
59. CRBM Alignment: Summary
• Effective for alignment of both curves & faces
• Effective for curve classification
61. Joint Alignment & Clustering (JAC): Objectives
• Develop nonparametric/unsupervised alignment & clustering framework
• Applicable to any domain
• Allows use of continuous transformations
• Discovers # clusters automatically
• Regularize transformations in a principled manner
• Evaluate model on curve and image data sets
65. JAC: Bayesian Alignment
• Explicit regularization of transformations
• Direct maximization that treats transformation function as black box
• Low memory footprint with sufficient stats
90. Future: Closing the Loop
• Simultaneous unsupervised alignment, clustering, and feature learning
• Update learned features through alignment
• Different filters / weights for each cluster
• Multi-resolution framework using different network depths
91. Research Goal
Develop an unsupervised data set-agnostic processing module
that includes alignment, clustering, and feature learning
94. Previous Work: Procrustes Analysis
(Mosier 1939, Hurley & Cattell 1962, Kristof & Wingersky 1971)
• Framework for computing the optimal matching under transformations
• Residual sum-of-squares distance called the Procrustes statistic
• Applied to joint alignment and several other applications
• Drawbacks:
• Mean may not be appropriate reference
• Assumes unimodal data set
101. Previous Work: Curves
• HMM-based models: Continuous Profile Model & Segmental HMM
• Many parameters
• Assumes unimodal data set
• Regression-based models: transformed mixture of regressions
• Requires the # of clusters to be specified
• Only linear transformations
102. Previous Work: Images
• Congealing variants:
• Specific to image domains
• Improved performance on digit and simple face data sets
• Clustering-based models
103–104. Curve Alignment: Transformation
• h(t) is a smooth monotone function
• Utilize the monotonicity operator of Ramsay (1998)

y(t) = α g(t′) x(t′) + β,  where t′ = h(t)

α: linear amplitude scaling · h: non-linear time warping
g: non-linear amplitude scaling · β: amplitude translation

h(t) = (1/Z) ∫₀ᵗ exp( ∫₀ʳ w(s) ds ) dr = M(w(t)),  which satisfies  d²h/dt² = w · dh/dt

h is obtained by passing an unconstrained function w through the monotonicity operator M.
111. JAC: Bayesian Alignment

∀ i = 1..N:  ρᵢ ∼ p(yᵢ | y₋ᵢ; λ) p(ρᵢ | ρ₋ᵢ; α)

ρ̂ᵢ = argmax over ρᵢ of  p(yᵢ | y₋ᵢ; λ) p(ρᵢ | ρ₋ᵢ; α)

(data term: p(yᵢ | y₋ᵢ; λ); transformation term: p(ρᵢ | ρ₋ᵢ; α))

Collapsed (Rao-Blackwellized) Gibbs sampler
112. JAC: Alignment & Clustering
Second sampler integrates out the transformations:

zᵢ ∼ p(zᵢ | z₋ᵢ; γ) p(xᵢ | x₋ᵢ, z; λ, α)

p(xᵢ | x₋ᵢ, z; λ, α) ≈ ( Σₗ wₗ p(xᵢ | ρₗ, θ_{zᵢ}; λ) ) / ( Σₗ wₗ )

s.t. ∀ l = 1..L:  ρₗ ∼ q(ρ)  and  wₗ = p(ρₗ | φ_{zᵢ}; α) / q(ρₗ)
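The approximation above is self-normalized importance sampling of the marginal data term: transformations ρₗ are drawn from a proposal q and re-weighted by the transformation prior. A minimal scalar sketch, where the Gaussian prior, proposal, and data term are illustrative stand-ins for the model's actual distributions:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def marginal_data_term(x, prior=(0.0, 1.0), proposal=(0.0, 2.0), L=100_000, seed=0):
    """Self-normalized importance sampling of p(x) = int p(x|rho) p(rho) drho,
    integrating out a scalar transformation parameter rho: draw rho_l ~ q,
    weight by w_l = p(rho_l) / q(rho_l), and average the data term under
    the normalized weights, as in the approximation above."""
    rng = np.random.default_rng(seed)
    rho = rng.normal(proposal[0], proposal[1], size=L)
    w = normal_pdf(rho, *prior) / normal_pdf(rho, *proposal)
    lik = normal_pdf(x, rho, 1.0)  # toy data term: x | rho ~ N(rho, 1)
    return float(np.sum(w * lik) / np.sum(w))

# With rho ~ N(0, 1) and x | rho ~ N(rho, 1), the exact marginal is N(x; 0, sqrt(2))
est = marginal_data_term(1.0)
exact = normal_pdf(1.0, 0.0, np.sqrt(2.0))
```

A wider proposal than the prior, as here, is the usual choice so that the weights stay bounded.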
113. JAC: Implementation
• visDataSet: represents data set (e.g. images, curves)
• visProcessor: handles feature representation (e.g. HOG)
• visTransform: represents the transformation function (e.g. affine transformation)
• visSufficientStats: represents exponential family members and priors
• visAlignment: base class for alignment algorithms
• visMixtureModelBase: base class for DP-based clustering/alignment
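This decomposition can be expressed as abstract base classes. The following is a hypothetical skeleton using the slide's class names; the talk does not show the actual method signatures, so every method here is an assumption.

```python
from abc import ABC, abstractmethod

# Hypothetical skeleton of the module decomposition named on the slide;
# the real interfaces are not shown in the talk.

class visDataSet(ABC):
    """Represents the data set (e.g. images, curves)."""
    @abstractmethod
    def instances(self): ...

class visProcessor(ABC):
    """Handles the feature representation (e.g. HOG)."""
    @abstractmethod
    def features(self, instance): ...

class visTransform(ABC):
    """Represents the transformation function (e.g. an affine transform)."""
    @abstractmethod
    def apply(self, instance, params): ...

class visAlignment(ABC):
    """Base class for alignment algorithms, tying the pieces together."""
    def __init__(self, dataset, processor, transform):
        self.dataset = dataset
        self.processor = processor
        self.transform = transform
    @abstractmethod
    def step(self): ...
```

Keeping the data set, feature representation, and transformation behind separate interfaces is what makes the module data set-agnostic: a new domain only needs new subclasses, not a new alignment algorithm.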