UNIVERSIDAD POLITÉCNICA DE MADRID
FACULTAD DE INFORMÁTICA
DOCTORAL THESIS
Efficient Model-based 3D Tracking by Using Direct Image
Registration
submitted to the
FACULTAD DE INFORMÁTICA
of the
UNIVERSIDAD POLITÉCNICA DE MADRID
to obtain the degree of
DOCTOR EN INFORMÁTICA
AUTHOR: Enrique Muñoz Corral
SUPERVISOR: Luis Baumela Molina
Madrid, 2012
Acknowledgements
The truth is that the ten years (ten!) it has taken me to write this thesis give
room for many things, and if I had to thank everyone who has helped me, I would
need an entire chapter. First of all I would like to thank Luis Baumela, a great
thesis advisor and an even better person, for awakening in me the itch for research
and, above all, for having enough patience to put up with my stubbornness. Luis,
if it were not for you, I would never have joined the University and I would be in
the private sector earning a fortune—yeah, thank you so much!
A thousand thanks to Javier de Lope, for endless technical (and not so technical)
discussions, and above all to José Miguel Buenaposada, who over all these years
has put up with me, helped me, irritated me, joked with me, and even found me a
job. I cannot forget the good times spent at lunch with the "girls" from Statistics
(Maribel, Arminda, Concha and Juan Antonio), who endured my interminable
rants about the housing bubble and our national politicians. A mention as well for
all the colleagues who have passed through laboratory L-3202 over these years:
"Javi's kids" (Javi, Juan, Bea and Yadira), Juan Bekios, the two "Pablos"
(Márquez and Herrero), Antonio and Rubén.
I would also like to thank Lourdes Agapito for allowing me to take part in the
project Automated facial expression analysis using computer vision, funded by the
Royal Society of the United Kingdom. Thanks to this project I had the privilege
of working with Lourdes and with Xavier Lladó, and above all of meeting that
singular character called Alessio del Bue. I have no words to thank Alessio for
being such a nice guy and for stoically putting up with all the times we have
sponged off him. Nor can I forget the help provided by Professor Thomas Vetter
and his group at the University of Basel (especially Brian Amberg and Pascal
Paysan); they took the trouble to build a three-dimensional model of my face,
including deformations and expressions. I would not like to close these
acknowledgements without mentioning that part of the work in this thesis was
carried out under project TIC2002-00591 of the Ministerio de Ciencia y Tecnología
and project TIN2008-06815-C02-02 of the Ministerio de Ciencia e Innovación.
And last, but not least, I thank Susana for the patience she has shown during all
these years (and they have been many) in which I have been tied up with this
thesis. This one is for you, Susana!
January 2012
Contents
Resumen xvii
Summary xix
Notations 1
1 Introduction 5
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Contributions of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 9
2 Literature Review 13
2.1 Image Registration vs. Tracking . . . . . . . . . . . . . . . . . . . . . 13
2.2 Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Model-based 3D Tracking . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Modelling assumptions . . . . . . . . . . . . . . . . . . . . . . 15
2.3.2 Rigid Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.3 Nonrigid Objects . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.4 Facial Motion Capture . . . . . . . . . . . . . . . . . . . . . . 18
3 Efficient Direct Image Registration 21
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Modelling Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Imaging Geometry . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 Brightness Constancy Constraint . . . . . . . . . . . . . . . . 23
3.2.3 Image Registration by Optimization . . . . . . . . . . . . . . . 23
3.2.4 Additive vs. Compositional . . . . . . . . . . . . . . . . . . . 25
3.3 Additive approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.1 Lucas-Kanade Algorithm . . . . . . . . . . . . . . . . . . . . . 27
3.3.2 Hager-Belhumeur Factorization Algorithm . . . . . . . . . . . 29
3.4 Compositional approaches . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Forward Compositional Algorithm . . . . . . . . . . . . . . . . 33
3.4.2 Inverse Compositional Algorithm . . . . . . . . . . . . . . . . 35
3.5 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Equivalence of Gradients 39
4.1 Image Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.1 Image Gradients in R^2 . . . . . . . . . . . . . . . . . . . . 40
4.1.2 Image Gradients in P^2 . . . . . . . . . . . . . . . . . . . . 42
4.1.3 Image Gradients in R^3 . . . . . . . . . . . . . . . . . . . . 43
4.2 The Gradient Equivalence Equation . . . . . . . . . . . . . . . . . . . 45
4.2.1 Relevance of the Gradient Equivalence Equation . . . . . . . . 46
4.2.2 General Approach to Gradient Replacement . . . . . . . . . . 46
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5 Additive Algorithms 51
5.1 Gradient Replacement Requirements . . . . . . . . . . . . . . . . . . 52
5.2 Systematic Factorization . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3 3D Rigid Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3.1 3D Textured Models . . . . . . . . . . . . . . . . . . . . . . . 55
5.3.2 Shape-induced Homography . . . . . . . . . . . . . . . . . . . 57
5.3.3 Change to the Reference Frame . . . . . . . . . . . . . . . . . 57
5.3.4 Optimization Outline . . . . . . . . . . . . . . . . . . . . . . . 61
5.3.5 Gradient Replacement . . . . . . . . . . . . . . . . . . . . . . 61
5.3.6 Systematic Factorization . . . . . . . . . . . . . . . . . . . . . 63
5.4 3D Nonrigid Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4.1 Nonrigid Morphable Models . . . . . . . . . . . . . . . . . . . 65
5.4.2 Nonrigid Shape-induced Homography . . . . . . . . . . . . . . 65
5.4.3 Change of Variables to the Reference Frame . . . . . . . . . . 66
5.4.4 Optimization Outline . . . . . . . . . . . . . . . . . . . . . . . 69
5.4.5 Gradient Replacement . . . . . . . . . . . . . . . . . . . . . . 69
5.4.6 Systematic Factorization . . . . . . . . . . . . . . . . . . . . . 71
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Compositional Algorithms 77
6.1 Unravelling the Inverse Compositional Algorithm . . . . . . . . . . . 77
6.1.1 Change of Variables in IC . . . . . . . . . . . . . . . . . . . . 79
6.1.2 The Efficient Forward Compositional Algorithm . . . . . . . . 79
6.1.3 Rationale of the Change of Variables in IC . . . . . . . . . . . 82
6.1.4 Differences between IC and EFC . . . . . . . . . . . . . . . . . 84
6.2 Requirements for Compositional Warps . . . . . . . . . . . . . . . . . 85
6.2.1 Requirement on Warp Composition . . . . . . . . . . . . . . . 85
6.2.2 Requirement on Gradient Equivalence . . . . . . . . . . . . . 85
6.3 Other Compositional Algorithms . . . . . . . . . . . . . . . . . . . . 86
6.3.1 Generalized Inverse Compositional Algorithm . . . . . . . . . 86
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7 Computational Complexity 91
7.1 Complexity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.1.1 Number of Operations . . . . . . . . . . . . . . . . . . . . . . 91
7.1.2 Complexity of Matrix Operations . . . . . . . . . . . . . . . . 92
7.1.3 Comparing Algorithm Complexities . . . . . . . . . . . . . . . 93
7.2 Algorithm Naming Conventions . . . . . . . . . . . . . . . . . . . . . 94
7.2.1 Additive Algorithms . . . . . . . . . . . . . . . . . . . . . . . 95
7.2.2 Compositional Algorithms . . . . . . . . . . . . . . . . . . . . 96
7.3 Complexity of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 96
7.3.1 Additive Algorithms . . . . . . . . . . . . . . . . . . . . . . . 97
7.3.2 Compositional Algorithms . . . . . . . . . . . . . . . . . . . . 103
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8 Experiments 107
8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.2 Features and Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.2.1 Numerical Ranges for Features . . . . . . . . . . . . . . . . . . 115
8.3 Generation of Synthetic Experiments . . . . . . . . . . . . . . . . . . 116
8.3.1 Synthetic Datasets and Images . . . . . . . . . . . . . . . . . 118
8.3.2 Generation of Result Plots . . . . . . . . . . . . . . . . . . . . 120
8.4 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.4.1 Convergence Criteria . . . . . . . . . . . . . . . . . . . . . . . 122
8.4.2 Visibility Management . . . . . . . . . . . . . . . . . . . . . . 122
8.4.3 Scale of Homographies . . . . . . . . . . . . . . . . . . . . . . 125
8.4.4 Minimization of Jacobian Operations . . . . . . . . . . . . . . 126
8.5 Additive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.5.1 Experimental Hypotheses . . . . . . . . . . . . . . . . . . . . 126
8.5.2 Experiments with Synthetic Rigid data . . . . . . . . . . . . . 127
8.5.3 Experiments with Synthetic Nonrigid data . . . . . . . . . . . 142
8.5.4 Experiments With Nonrigid Sequence . . . . . . . . . . . . . . 151
8.5.5 Experiments with real Rigid data . . . . . . . . . . . . . . . . 154
8.5.6 Experiment with real Nonrigid data . . . . . . . . . . . . . . . 158
8.6 Compositional Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 163
8.6.1 Experimental Hypotheses . . . . . . . . . . . . . . . . . . . 163
8.6.2 Experiments with Synthetic Rigid data . . . . . . . . . . . . . 163
8.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9 Conclusions and Future work 179
9.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 179
9.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A Gauss-Newton Optimization 201
B Plane-induced Homography 203
C Plane+Parallax-constrained Homography 205
C.1 Compositional Form . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
D Methodical Factorization 209
D.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
D.2 Lemmas that Re-organize Product of Matrices . . . . . . . . . . . . . 211
D.3 Lemmas that Re-organize Kronecker Products . . . . . . . . . . . . . 215
D.4 Lemmas that Re-organize Sums of Matrices . . . . . . . . . . . . . . 216
E Methodical Factorization of f3DTM 219
F Methodical Factorization of f3DMM (Partial case) 223
G Methodical Factorization of f3DMM (Full case) 225
H Detailed Complexity of Algorithms 235
H.1 Warp f3DTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
H.2 Warp f3DMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
H.3 Jacobian of Algorithm HB3DTM . . . . . . . . . . . . . . . . . . . . 237
H.4 Jacobian of Algorithm HB3DTMNF . . . . . . . . . . . . . . . . . . 239
H.5 Jacobian of Algorithm HB3DMMNF . . . . . . . . . . . . . . . . . 241
H.6 Jacobian of Algorithm HB3DMMSF . . . . . . . . . . . . . . . . . . 246
List of Figures
1.1 Example of 3D rigid tracking. . . . . . . . . . . . . . . . . . . . . 6
1.2 3D Nonrigid Tracking. . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Image registration. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Industrial applications of 3D tracking. . . . . . . . . . . . . . . 9
1.5 Motion capture in the film industry. . . . . . . . . . . . . . . . 10
1.6 Markerless facial motion capture. . . . . . . . . . . . . . . . . . 11
3.1 Imaging geometry. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Iterative gradient descent image registration. . . . . . . . . . . 24
3.3 Generic descent method for image registration. . . . . . . . . . 26
3.4 Lucas-Kanade image registration. . . . . . . . . . . . . . . . . . 28
3.5 Hager-Belhumeur image registration. . . . . . . . . . . . . . . . 32
3.6 Forward compositional image registration. . . . . . . . . . . . . 34
3.7 Inverse compositional image registration. . . . . . . . . . . . . 36
4.1 Depiction of Image Gradients. . . . . . . . . . . . . . . . . . . . 41
4.2 Image Gradient in P^2. . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Image gradient in R^3. . . . . . . . . . . . . . . . . . . . . . . . 45
4.4 Comparison between BCC and GEE. . . . . . . . . . . . . . . . 47
4.5 Gradients and Convergence. . . . . . . . . . . . . . . . . . . . . . 49
4.6 Open Subsets in Various Domains. . . . . . . . . . . . . . . . . . 49
5.1 3D Textured Model. . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 Shape-induced homographies. . . . . . . . . . . . . . . . . . . . . 58
5.3 Warp defined on the reference frame. . . . . . . . . . . . . . . . 59
5.4 Reference frame advantages. . . . . . . . . . . . . . . . . . . . . . 60
5.5 Nonrigid Morphable Models. . . . . . . . . . . . . . . . . . . . . 65
5.6 Nonrigid shape-induced homographies. . . . . . . . . . . . . . . 67
5.7 Deformable warp defined on the reference frame. . . . . . . . 68
6.1 Change of variables in IC. . . . . . . . . . . . . . . . . . . . . . . 80
6.2 Forward compositional image registration. . . . . . . . . . . . . 83
6.3 Generalized inverse compositional image registration. . . . . . 88
7.1 Complexity of Additive Algorithms. . . . . . . . . . . . . . . . . 102
7.2 Complexities of Compositional Algorithms . . . . . . . . . . . 105
8.1 Registration vs. Tracking. . . . . . . . . . . . . . . . . . . . . . . 109
8.2 Algorithm initialization . . . . . . . . . . . . . . . . . . . . . . . . 110
8.3 Accuracy and convergence. . . . . . . . . . . . . . . . . . . . . . 114
8.4 Ground Truth and Noise Variance. . . . . . . . . . . . . . . . . 117
8.5 Definition of Datasets. . . . . . . . . . . . . . . . . . . . . . . . . 118
8.6 Example of Synthetic Datasets. . . . . . . . . . . . . . . . . . . . 119
8.7 Experimental Evaluation with Synthetic Data . . . . . . . . . 121
8.8 Visibility management. . . . . . . . . . . . . . . . . . . . . . . . . 123
8.9 Efficient solving of WLS. . . . . . . . . . . . . . . . . . . . . . 125
8.10 The cube model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.11 The face model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.12 The tea box model. . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.13 Results from dataset DS1 for cube. . . . . . . . . . . . . . . . . . 130
8.14 Results from dataset DS2 for cube. . . . . . . . . . . . . . . . . . 131
8.15 Results from dataset DS3 for cube. . . . . . . . . . . . . . . . . . 132
8.16 Results from dataset DS4 for cube. . . . . . . . . . . . . . . . . . 133
8.17 Results from dataset DS5 for cube. . . . . . . . . . . . . . . . . . 134
8.18 Results from dataset DS6 for cube. . . . . . . . . . . . . . . . . . 135
8.19 tea box sequence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.20 Results for the tea box sequence. . . . . . . . . . . . . . . . . . . 137
8.21 Estimated parameters from teabox sequence. . . . . . . . . . . 138
8.22 Estimated parameters from face sequence. . . . . . . . . . . . . 140
8.23 Good texture vs. bad texture. . . . . . . . . . . . . . . . . . . . 141
8.24 The face-deform model. . . . . . . . . . . . . . . . . . . . . . . . . 142
8.25 Distribution of Synthetic Datasets. . . . . . . . . . . . . . . . . 143
8.26 Results from dataset DS1 for face-deform. . . . . . . . . . . . . 145
8.27 Results from dataset DS2 for face-deform. . . . . . . . . . . . . 146
8.28 Results from dataset DS3 for face-deform. . . . . . . . . . . . . 147
8.29 Results from dataset DS4 for face-deform. . . . . . . . . . . . . 148
8.30 Results from dataset DS5 for face-deform. . . . . . . . . . . . . 149
8.31 Results from dataset DS6 for face-deform. . . . . . . . . . . . . 150
8.32 face-deform sequence. . . . . . . . . . . . . . . . . . . . . . . . . . 151
8.33 Results from face-deform sequence. . . . . . . . . . . . . . . . . 152
8.34 Estimated parameters from face-deform sequence. . . . . . . . 153
8.35 The cube-real model. . . . . . . . . . . . . . . . . . . . . . . . . . 154
8.36 The cube-real sequence. . . . . . . . . . . . . . . . . . . . . . . . 156
8.37 Results from cube-real sequence. . . . . . . . . . . . . . . . . . . 157
8.38 Selected facial scans used to build the model. . . . . . . . . . . 158
8.39 Unfolded texture model. . . . . . . . . . . . . . . . . . . . . . . . 159
8.40 The face-real sequence. . . . . . . . . . . . . . . . . . . . . . . . 160
8.41 Anchor points in the model. . . . . . . . . . . . . . . . . . . . . . 161
8.42 Results for the face-real sequence. . . . . . . . . . . . . . . . . 162
8.43 The plane model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.44 Distribution of Synthetic Datasets. . . . . . . . . . . . . . . . . 165
8.45 Results from dataset DS1 for plane. . . . . . . . . . . . . . . . . 167
8.46 Results from dataset DS2 for plane. . . . . . . . . . . . . . . . . 168
8.47 Results from dataset DS3 for plane. . . . . . . . . . . . . . . . . 169
8.48 Results from dataset DS4 for plane. . . . . . . . . . . . . . . . . 170
8.49 Results from dataset DS5 for plane. . . . . . . . . . . . . . . . . 171
8.50 Results from dataset DS6 for plane. . . . . . . . . . . . . . . . . 172
8.51 Average Time per iteration. . . . . . . . . . . . . . . . . . . . . . 176
9.1 Spiderweb Plots for Image Registration Algorithms. . . . . . 182
9.2 Spherical Harmonics-based Illumination Model . . . . . . . . . 184
9.3 Tracking by simultaneously using texture and edge information . . . 185
9.4 Efficient tracking using multiple views . . . . . . . . . . . . . . 186
B.1 Plane-induced homography. . . . . . . . . . . . . . . . . . . . . . 203
C.1 Plane+Parallax-constrained homograpy. . . . . . . . . . . . . . 206
List of Tables
4.1 Characteristics of the warps . . . . . . . . . . . . . . . . . . . . . 50
6.1 Relationship between compositional algorithms and warps . . 89
6.2 Requirements for Optimization Algorithms . . . . . . . . . . . 90
7.1 Complexity of matrix operations. . . . . . . . . . . . . . . . . . 93
7.2 Additive testing algorithms. . . . . . . . . . . . . . . . . . . . . . 95
7.3 Compositional testing algorithms. . . . . . . . . . . . . . . . . . 96
7.4 Complexity of Algorithm LK3DTM. . . . . . . . . . . . . . . . . 97
7.5 Complexity of Algorithm HB3DTM. . . . . . . . . . . . . . . . 98
7.6 Complexity of Algorithm LK3DMM. . . . . . . . . . . . . . . . 98
7.7 Complexity of Algorithm HB3DMMNF. . . . . . . . . . . . . . 99
7.8 Complexity of Algorithm HB3DMM. . . . . . . . . . . . . . . . 100
7.9 Complexity of Algorithm HB3DMMSF. . . . . . . . . . . . . . 101
7.10 Complexities of Additive Algorithms. . . . . . . . . . . . . . . . 101
7.11 Complexity of Algorithm LKH8. . . . . . . . . . . . . . . . . . . 103
7.12 Complexity of Algorithm ICH8. . . . . . . . . . . . . . . . . . . 103
7.13 Complexity of Algorithm HBH8. . . . . . . . . . . . . . . . . . . 104
7.14 Complexity of Algorithm GICH8. . . . . . . . . . . . . . . . . . 104
7.15 Complexities of Compositional Algorithms. . . . . . . . . . . . 106
7.16 Comparison of Relative Complexities for Additive Algorithms . . . . 106
7.17 Comparison of Relative Complexities for Compositional Algorithms . . 106
8.1 Registration vs. tracking in efficient methods . . . . . . . . . . 111
8.2 Features and Measures. . . . . . . . . . . . . . . . . . . . . . . . . 115
8.3 Numerical Ranges for Features. . . . . . . . . . . . . . . . . . . . 115
8.4 Evaluated Additive Algorithms . . . . . . . . . . . . . . . . . . . 127
8.5 Ranges of parameters for cube experiments. . . . . . . . . . . . 129
8.6 Average reprojection error vs. noise for cube. . . . . . . . . . . 129
8.7 Ranges of parameters for face-deform experiments. . . . . . . 144
8.8 Average reprojection error vs. noise for face-deform. . . . . . 144
8.9 Evaluated Compositional Algorithms . . . . . . . . . . . . . . . 164
8.10 Ranges of motion parameters for each dataset. . . . . . . . . . 165
8.11 Average reprojection error vs. noise for plane. . . . . . . . . . 166
9.1 Classification of Motion Warps. . . . . . . . . . . . . . . . . . . . 181
D.1 Lemmas used to re-arrange matrix products. . . . . . . . . . . . . 214
D.2 Lemmas used to re-arrange Kronecker matrix products. . . . 216
List of Algorithms
1 Outline of the basic GN-based descent method for image registration . 26
2 Outline of the Lucas-Kanade algorithm. . . . . . . . . . . . . . 28
3 Outline of the Hager-Belhumeur algorithm. . . . . . . . . . . . 31
4 Outline of the Forward Compositional algorithm. . . . . . . . 34
5 Outline of the Inverse Compositional algorithm. . . . . . . . . 36
6 Iterative factorization of the Jacobian matrix. . . . . . . . . . 54
7 Outline of the HB3DTM algorithm. . . . . . . . . . . . . . . . . 64
8 Outline of the full-factorized HB3DMM algorithm. . . . . . . 75
9 Outline of the HB3DMMSF algorithm. . . . . . . . . . . . . . . 76
10 Outline of the Efficient Forward Compositional algorithm. . . 82
11 Outline of the Generalized Inverse Compositional algorithm. 88
12 Creating the synthetic datasets. . . . . . . . . . . . . . . . . . . 119
13 Outline of the GN algorithm. . . . . . . . . . . . . . . . . . . . . 202
Resumen
This thesis addresses the problem of efficiently tracking 3D objects in image
sequences. We tackle the 3D tracking problem by using direct image registration,
a technique that aligns two images using their intensity values. Image
registration is usually solved by iterative optimization methods, where the
function to be minimized depends on the error in the intensity values. In this
thesis we examine the most common image registration methods, with emphasis on
those that use efficient optimization algorithms.
We investigate two forms of efficient registration. The first comprises the
additive registration methods: the motion parameters are computed incrementally
by means of a linear approximation of the error function. Within this class of
algorithms, we focus on the factorization method of Hager and Belhumeur. We
introduce a necessary requirement that the factorization algorithm must satisfy
to achieve good convergence. In addition, we propose an automatic factorization
procedure that allows us to track both rigid and deformable 3D objects.
The second class comprises the so-called compositional registration methods,
where the error norm is rewritten using function composition. We study the most
common compositional methods, with emphasis on the fastest registration method,
the inverse compositional algorithm. We introduce a new compositional
registration method, the Efficient Forward Compositional algorithm, which allows
us to interpret the working mechanisms of the inverse compositional algorithm.
Thanks to this novel interpretation, we state two fundamental requirements for
efficient compositional algorithms.
Finally, we carry out a series of experiments with real and synthetic data to
verify the theoretical claims. In addition, we distinguish between the
registration and tracking problems for efficient algorithms: those algorithms
that satisfy their requirement(s) can be used for image registration, but not
for tracking.
Abstract
This thesis deals with the problem of efficiently tracking 3D objects in sequences of
images. We tackle the efficient 3D tracking problem by using direct image registra-
tion. This problem is posed as an iterative optimization procedure that minimizes
a brightness error norm. We review the most popular iterative methods for image
registration in the literature, turning our attention to those algorithms that use
efficient optimization techniques.
Two forms of efficient registration algorithms are investigated. The first type
comprises the additive registration algorithms: these algorithms incrementally com-
pute the motion parameters by linearly approximating the brightness error function.
We centre our attention on Hager and Belhumeur’s factorization-based algorithm for
image registration. We propose a fundamental requirement that factorization-based
algorithms must satisfy to guarantee good convergence, and introduce a systematic
procedure that automatically computes the factorization. Finally, we also derive
two warp functions that satisfy the requirement and allow us to register rigid
and nonrigid 3D targets.
The second type comprises the compositional registration algorithms, where the
brightness error function is written by using function composition. We study the
current approaches to compositional image alignment, and we emphasize the impor-
tance of the Inverse Compositional method, which is known to be the most efficient
image registration algorithm. We introduce a new algorithm, the Efficient Forward
Compositional image registration: this algorithm avoids the necessity of inverting
the warping function, and provides a new interpretation of the working mechanisms
of the inverse compositional alignment. By using this information, we propose two
fundamental requirements that guarantee the convergence of compositional image
registration methods.
Finally, we support our claims by using extensive experimental testing with
synthetic and real-world data. We propose a distinction between image registration
and tracking when using efficient algorithms. We show that, depending on whether
the fundamental requirements hold, some efficient algorithms are eligible for
image registration but not for tracking.
Notations
Specific Sets and Constants
X    Set of target points or target region.
Ω    Set of target points currently visible.
N    Number of points in the target region—i.e., N = |X|.
NΩ   Number of visible target points—i.e., NΩ = |Ω|.
P    Dimension of the parameter space.
C    Number of image channels.
K    Dimension of the deformation space.
F    Number of frames in the image sequence.
Vectors and Matrices
a             Lowercase bold letters denote vectors.
Am×n          Monospace uppercase letters denote m × n matrices.
vec(A)        Vectorization of matrix A: if A is an m × n matrix, vec(A) is an mn × 1 vector.
Ik ∈ Mk×k     k × k identity matrix.
I             3 × 3 identity matrix.
0k ∈ R^k      k × 1 vector full of zeroes.
0m×n ∈ Mm×n   m × n matrix full of zeroes.
Camera Model Notations
x ∈ R^2     Pixel location in the image.
x̂ ∈ P^2     Location in projective space.
X ∈ R^3     Point in Cartesian coordinates.
Xc ∈ R^3    Point expressed in the camera reference system.
K ∈ M3×3    3 × 3 camera intrinsics matrix.
P ∈ M3×4    3 × 4 camera projection matrix.
Imaging Notations
T(x) ∈ R^c     Brightness value of the template image at pixel x.
I(x, t) ∈ R^c  Brightness value of the current image for pixel x at instant t.
It(x)          Another notation for I(x, t).
T, It          Vector forms of functions T and It.
[ ]            Composite function of I ◦ p, that is, I[x] = I(p(x)).
Optimization Notations
µ ∈ R^P        Column vector of motion parameters.
µ0 ∈ R^P       Initial guess of the optimization.
µi ∈ R^P       Parameters at the i-th iteration of the optimization.
µ* ∈ R^P       Actual optimum of the optimization.
µt ∈ R^P       Parameters at image t.
µJ ∈ R^P       Parameters at which the Jacobian is computed for efficient algorithms.
δµ ∈ R^P       Incremental step at the current state of the optimization.
ℓ(δµ)          Linear model for the incremental step δµ.
L(δµ)          Local minimizer for the incremental step δµ.
r(µ) ∈ R^N     N × 1 vector-valued residual function at parameters µ.
∇x̂ f(x)        Derivatives of function f with respect to the variables x, instantiated at x̂.
J(µ) ∈ MN×P    Jacobian matrix of the brightness dissimilarity at µ (i.e., J(µ) = ∇µ̂ D(X; µ)).
H(µ) ∈ MP×P    Hessian matrix of the brightness dissimilarity at µ (i.e., H(µ) = ∇²µ̂ D(X; µ)).
Warp Function Notations
f(x; µ) : R^n × R^P → R^n   Motion model or warp.
p : R^n → R^2               Projection onto the Cartesian plane.
R ∈ M3×3                    3 × 3 rotation matrix.
ri ∈ R^3                    Columns of the rotation matrix R (i.e., R = (r1, r2, r3)).
t ∈ R^3                     Translation vector in Euclidean space.
D : R^2 × R^p → R           Dissimilarity function.
U : R^p × R^p → R^p         Parameter update function.
ψ : R^p × R^p → R^p         Jacobian update function for algorithm GIC.
Factorization Notations
⊗              Kronecker product.
⊙              Row-wise Kronecker product.
S(x)           Constant matrix in the factorization method, computed from the target structure and the camera calibration.
M(µ)           Variable matrix in the factorization method, computed from the motion parameters.
W ∈ Mp×p       Weighting matrix for Weighted Least-Squares.
π : R^n → R^n  Permutation of the set {1, . . . , n}.
Pπ(n) ∈ Mn×n   Permutation matrix of the set {1, . . . , n}.
π(n, q)        Permutation of the set {1, . . . , n} with ratio q.
3D Models Notations
F ⊂ R^2        Reference frame for algorithm HB.
S : F → R^3    Target shape function.
T : F → R^C    Target texture function.
u ∈ F          Target coordinates in the reference frame.
S ∈ M3×Nv      Target 3D shape.
s ∈ R^3        Shape coordinates in Euclidean space.
s0 ∈ R^3       Mean shape of the target generative model.
si ∈ R^3       i-th deformation basis of the target generative model.
n⊤ ∈ R^3       Normal vector to a given triangle; n⊤ is normalized by the triangle depth (i.e., if x belongs to the triangle, then n⊤x = 1).
Bs ∈ M3×K      Basis of deformations.
c ∈ R^K        Vector containing the K deformation coefficients.
HA ∈ M3×3      Affine warp between the image reference frame and F.
Ṙ∆             Derivative of the rotation matrix R with respect to the Euler angle ∆ ∈ {α, β, γ}.
λ ∈ R          Homogeneous scale factor.
v ∈ R^3        Change of variables, defined as v = K⁻¹ HA û.
Function Naming Conventions
fH8 : P^2 → P^2     8-dof 2D homography.
fH6P : P^2 → P^2    Plane-induced homography.
fH6S : P^2 → P^2    Shape-induced homography.
f3DTM : P^2 → P^2   3D Textured Model motion model.
fH6D : P^2 → P^2    Deformable shape-induced homography.
f3DMM : P^2 → P^2   3D Textured Morphable Model motion model.
ε : R^p → R         Reprojection error function.
Algorithms Naming Conventions
LK          Lucas-Kanade algorithm [Lucas and Kanade, 1981].¹
HB          Hager-Belhumeur factorization algorithm [Hager and Belhumeur, 1998].
IC          Inverse Compositional algorithm [Baker and Matthews, 2004].
FC          Forward Compositional algorithm [Baker and Matthews, 2004].
GIC         Generalized Inverse Compositional algorithm [Brooks and Arbel, 2010].
EFC         Efficient Forward Compositional algorithm.
LKH8        Lucas-Kanade algorithm for homographies.
LKH6        Lucas-Kanade algorithm for plane-induced homographies.
LK3DTM      Lucas-Kanade algorithm for 3D Textured Models (rigid).
LK3DMM      Lucas-Kanade algorithm for 3D Morphable Models (deformable).
HB3DTR      Full-factorized HB algorithm for 6-dof motion in R^3 [Sepp, 2006].
HB3DTM      Full-factorized HB algorithm for 3D Textured Models (rigid).
HB3DMM      Full-factorized HB algorithm for 3D Morphable Models (deformable).
HB3DMMSF    Semi-factorized HB algorithm for 3D Morphable Models.
HB3DMMNF    HB algorithm for 3D Morphable Models without the factorization stage.
ICH8        IC algorithm for homographies.
ICH6        IC algorithm for plane-induced homographies.
GICH8       GIC algorithm for homographies.
GICH6       GIC algorithm for plane-induced homographies.
IC3DRT      IC algorithm for 6-dof motion in R^3 [Muñoz et al., 2005].
FCH6PP      FC algorithm for plane+parallax homographies.

¹ We only show the most relevant citation for each algorithm.
Chapter 1
Introduction
This thesis deals with the problems of registration and tracking in sequences of
images. Both problems are classical topics in Computer Vision and Image Processing
that have been widely studied in the past. We summarize the subjects of this thesis
in the dissertation title:
Efficient Model-based 3D Tracking by using Direct Image Registration
What is 3D Tracking? Let the target be a part of the scene—e.g. the cube in
Figure 1.1. We define tracking as the process of repeatedly computing the target
state in a sequence of images. When we describe this state as the relative 3D
orientation and location of the target with respect the coordinate system of the
camera (or another arbitrary reference system), we refer to this process as 3D rigid
tracking (see Figure 1.1). If we also include state parameters that describe the
possible deformation of the object, we have 3D nonrigid or deformable tracking (see
Figure 1.2). We use 3D tracking to refer to both the rigid and the nonrigid cases.
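To make the state being estimated concrete, the following minimal sketch (ours, purely illustrative; the field names are not the thesis's notation) gathers the quantities that a rigid or nonrigid 3D tracker must update at every frame:

    # Illustrative sketch (not from the thesis): a minimal container for the
    # target state estimated at each frame of a 3D tracking sequence.
    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class TargetState:
        R: np.ndarray = field(default_factory=lambda: np.eye(3))    # 3x3 rotation: orientation w.r.t. the camera
        t: np.ndarray = field(default_factory=lambda: np.zeros(3))  # translation: location w.r.t. the camera
        c: np.ndarray = field(default_factory=lambda: np.zeros(0))  # deformation coefficients; empty => rigid tracking

    # Rigid 3D tracking estimates only (R, t); nonrigid tracking also updates c.
    state = TargetState(c=np.zeros(5))  # e.g. K = 5 modes of deformation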
What is Direct Image Registration? When the target is imaged by two cameras
from different points of view, the resulting images are different although they
represent the same portion of the scene (see Figure 1.3). Image Registration or
Image Alignment computes the geometric transformation that best aligns the coor-
dinate systems of both images such that their pixel-wise differences are minimal (cf.
Figure 1.3). We say that the image registration is a direct method when we register
the coordinate systems by just using the brightness differences of the images.
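As a toy illustration of the direct approach, aligning two images using only their brightness values, the following sketch (ours; a brute-force stand-in, not one of the algorithms studied in this thesis) registers a template against an image by exhaustively searching over a pure-translation warp for the shift that minimizes the squared brightness differences. The thesis's algorithms replace this search with iterative gradient-based optimization over much richer warps:

    import numpy as np

    def register_translation(template, image, x0=0, y0=0, max_shift=8):
        """Direct registration with a pure-translation warp: exhaustively search
        the integer shift around (x0, y0) that minimizes the squared brightness
        differences between the template and the image."""
        h, w = template.shape
        best_err, best = np.inf, (x0, y0)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                x, y = x0 + dx, y0 + dy
                if x < 0 or y < 0 or y + h > image.shape[0] or x + w > image.shape[1]:
                    continue  # warped template would fall outside the image
                err = np.sum((image[y:y + h, x:x + w] - template) ** 2)
                if err < best_err:
                    best_err, best = err, (x, y)
        return best

    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    template = image[20:36, 24:40].copy()  # 16x16 patch taken at (x=24, y=20)
    print(register_translation(template, image, x0=22, y0=18))  # -> (24, 20)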
What is Model-based? We say that a technique is model-based when we re-
strict the information from the real world by using certain assumptions: on the
target dynamics, on the target structure, on the camera sensing process, etc.—e.g.
in Figure 1.1 we model the target with a cube structure and rigid body dynamics.
5
Figure 1.1: Example of 3D rigid tracking (Left) Selected frames of a scene containing
a textured cube. We track the object and we overlay its state in blue. (Right) The relative
position of the camera—represented by a coloured pyramid—and the cube is computed
from the estimated 3D parameters.
Figure 1.2: 3D Nonrigid Tracking. Selected frames from a sequence of a cushion
under a bending motion. We track some landmarks on the cushion through the sequence,
and we plot the resulting triangular mesh for the selected frames. The motion of the
landmarks is both global—translation of the mesh—and local—changes on the relative
position of the mesh vertices due to the deformation. Source: Alessio del Bue.
And Finally, What does Efficient mean? We say that a method is efficient
if it substantially improves the computation time with respect to gold-standard
techniques. In a more practical way, efficient is equivalent to real-time—i.e. the
tracking procedure operates at 25 frames per second.

Figure 1.3: Image registration. (Top-row) Images of a portion of the scene from two
distinct points of view. We have outlined the target in blue (Top-left) and green (Top-
right). (Bottom) The left image is warped such that the coordinates of the target match
up in both images. Source: Graffiti sequence, from Oxford Visual Geometry Group.
1.1 Motivation
In less than thirty years, video tracking has gone from being confined to academic
and military environments to enjoying widespread recognition, mainly thanks to the media.
Thus, video tracking is now a staple in sci-fi shows and films where futuristic Head-
up Displays (hud) work in a show-and-tell fashion, a camera surveillance system
can locate an object or a person, or a robot can address people and even recognize
their mood.
However, tv is, sad to say, years ahead of reality. Current video tracking systems
are still at a primitive stage: they are inaccurate, sloppy, slow, and usually work
only under laboratory conditions. Nevertheless, video tracking is progressing by
leaps and bounds, and it will probably match some sci-fi standards soon.
We investigate the problem of efficiently tracking an object in a video sequence.
Nowadays there exist several efficient optimization algorithms for video tracking
or image registration. We study two of the fastest algorithms available: the Hager-
Belhumeur factorization algorithm and the Baker-Matthews inverse compositional
algorithm. Both algorithms, although very efficient for planar registration, present
diverse problems for 3D tracking. This thesis studies which assumptions can be made
with these algorithms whilst underlining their limitations through extensive testing.
Ultimately, the objective is to provide a detailed description of each algorithm, pointing
out pros and cons, leading to a kind of Quick Guide to Efficient Tracking Algorithms.
1.2 Applications
Typical applications of 3D tracking include target localization for military opera-
tions; security and surveillance tasks such as person counting, face identification,
people detection, determining people's activity, or detecting abandoned objects; and
human-computer interaction for computer security, aids for disabled people,
or even controlling video-games. Tracking is also used for augmenting video sequences
with additional information, such as advertisements, expanded information about
the scene, or objects added to or removed from the scene. We show some examples of
actual industrial applications in Figure 1.4.
A tracking process that is widely used in the film industry is Motion Capture: we
track the motion of the different parts of an actor's body using a suit equipped
with reflective markers; then, we transfer the estimated motion to a computer-
generated character (see Figure 1.5). Using this technique, we can animate a syn-
thetic 3D character in a movie, such as Gollum in the Lord of the Rings trilogy (2001),
or Jar-Jar Binks in the new Star Wars trilogy (1999). Other relevant movies
that employ these techniques are Polar Express (2004), King Kong (2005), Beowulf
(2007), A Christmas Carol (2009), and Avatar (2009). Furthermore, we can generate
a complete computer-generated movie populated with characters animated through
motion capture. Facial motion capture is of special interest for us: we animate a
computer-generated facial expression by facial expression tracking (see Figure 1.5).
We turn our attention to markerless facial motion capture, that is, the process
of recovering the face expression and orientation without using fiducial markers.
Markerless motion capture does not require special equipment—such as close-up
cameras—or a complicated set-up on the actor's face—such as special reflective
make-up or facial stickers. In this thesis we propose a technique that captures facial
expression motion by only using brightness information and prior knowledge of
the deformation of the target (see Figure 1.6).

Figure 1.4: Industrial applications of 3D tracking. (Top-left) Augmented reality
inserts virtual objects into the scene. (Top-middle) Augmented reality shows additional
information about tracked objects in the scene. Source: Hawk-Eye, Hawk-Eye Innovations
Ltd., copyright © 2008. (Top-right) Tracking a pedestrian for video surveillance. Source:
Martin Communications, copyright © 1998-2007. (Bottom-left) People-flow counting by
tracking. Source: EasyCount, by Keeneo, copyright © 2010. (Bottom-middle) Car track-
ing detects possible traffic infractions or estimates car speed. Source: Fibridge, copy-
right ©. (Bottom-right) Body tracking is used for interactive control of video-games.
Source: Kinect, Microsoft, copyright © 2010.
1.3 Contributions of the Thesis
We outline the remaining chapters of the thesis and their principal contributions as
follows:
Chapter 2: Literature Review We provide a detailed survey of the literature
on techniques for both image registration and tracking.
Chapter 3: Efficient Image Registration We review the state of the art on
efficient methods. We introduce the taxonomy for efficient registration algorithms:
Figure 1.5: Motion capture in the film industry. Facial and body motion capture
from Avatar™ (Top-row) and Polar Express™ (Bottom-row). (Left-column) The
body motion and head pose are computed using reflective fiducial markers—grey spheres
on the motion capture jumpsuit. For facial expression capture they use plenty of smaller
markers and even close-up cameras. (Right-column) They use the estimated motion to
animate characters in the movie. Source: Avatar, 20th Century Fox, copyright © 2009;
Polar Express, Warner Bros. Pictures, copyright © 2004.
an algorithm is classified as either additive or compositional.
Chapter 4: Equivalence of Gradients We introduce the gradient equiva-
lence equation constraint: we show that satisfying this assumption
has positive effects on the performance of the algorithms.
Chapter 5: Additive Algorithms We review which constraints determine the
convergence of additive registration algorithms, especially for the factorization approach.
We provide a methodical procedure to factorize an algorithm in general form; we
state a basic set of theorems and lemmas that enable us to systematize the factor-
ization. We introduce two tracking algorithms using factorization: one for rigid 3D
objects, and another for deformable 3D objects.
Figure 1.6: Markerless facial motion capture. (Top) Several frames where the
face modifies both its orientation—due to a rotation—and its shape structure—due to
changes in facial expression. (Bottom) The tracking state vector includes both pose and
deformation. Legend: Blue Actual projection of the target shape using the estimated
parameters; Pink Highlighted projections corresponding to profiles of the jaw, eyebrows,
lips and nasolabial wrinkles.
Chapter 6: Compositional Algorithms We review the basic inverse composi-
tional algorithm. We introduce an alternative efficient compositional algorithm that
is equivalent to the inverse compositional algorithm under certain assumptions. We
show that if the gradient equivalence equation holds, then both efficient compositional
methods converge.
Chapter 7: Computational Complexity We study the resources used by the
registration algorithms in terms of their computational complexity. We compare the
theoretical complexities of efficient and nonefficient algorithms.
Chapter 8: Experiments We devise a set of experimental tests that confirm
our assumptions about the registration algorithms; that is, we (1) verify the dependence
of convergence on the algorithm constraints, and (2) evaluate the theoretical
complexities against actual data.
Chapter 9: Conclusions and Future Work Finally, we draw conclusions
about where each technique is most suitably used, and we provide insight into
future work to improve the proposed methods.
Chapter 2
Literature Review
In this chapter we review the basic literature on tracking and image registration.
First we introduce the basic similarities and differences between image registration
and tracking. Then, we review the usual methods for both tracking and image
registration.
2.1 Image Registration vs. Tracking
The frontier between image registration and tracking is a bit fuzzy: tracking identi-
fies the location of an object in a sequence of images, whereas registration finds the
pixel-to-pixel correspondence between a pair of images. Note that in both cases we
compute a geometric and photometric transformation between images: pairwise in
the context of image registration and among multiple images for the tracking case.
Although we may use the terms registration and tracking interchangeably, we define the
following subtle semantic differences between them:
• Image registration finds the best alignment between two images of the same
scene. We use a geometric transformation to align the images of both
cameras. We consider that image registration emphasizes finding the best
alignment between two images in visual terms, rather than accurately recovering
the parameters of the transformation—this is usually the case in, e.g., medical
applications.
• Tracking finds the location of a target object in each frame of a sequence. We
assume that the difference of object position between two consecutive frames is
small. In tracking we are typically interested in recovering the parameters de-
scribing the state of the object rather than the coordinates of the location: we
can describe an object using richer information than just its position (e.g. 3D
orientation, modes of deformation, lighting changes, etc.). This is usually the
case in robotics [Benhimane and Malis, 2007; Cobzas et al., 2009; Nick Molton,
2004], or augmented reality [Pilet et al., 2008; Simon et al., 2000; Zhu et al.,
2006].
Also, image registration involves two images with an arbitrary baseline, whereas track-
ing usually operates on a sequence with a small inter-frame baseline. We assume
that tracking is a higher-level problem than image registration. Furthermore, we
propose a tracking-by-registration approach: we track an object through a sequence
by iteratively registering pairs of consecutive images [Baker and Matthews, 2004];
however, we can perform tracking without any registration at all (e.g. tracking-
by-detection [Viola and Jones, 2004], or tracking-by-classification [Vacchetti et al.,
2004]).
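A minimal sketch of this tracking-by-registration loop follows (ours; register is a placeholder for any direct registration routine, e.g. the translational search sketched in Chapter 1, whose last argument is the initial guess):

    def track(frames, template, mu0, register):
        """Tracking-by-registration: register each incoming frame against the
        template, seeding the optimization with the parameters estimated for
        the previous frame (small inter-frame baseline assumption)."""
        mu, trajectory = mu0, []
        for frame in frames:
            mu = register(template, frame, mu)  # refine the previous estimate
            trajectory.append(mu)
        return trajectory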
2.2 Image Registration
Image registration is a classic topic in computer vision and numerous approaches
have been proposed in the literature; two good surveys on the subject are [Brown,
1992] and [Zitova, 2003]. The process involves computing the pixel-to-pixel corre-
spondence between the two images: that is, for each pixel on one image we find
the corresponding pixel in the other image so that both pixels project from the
same actual point in the scene (cf. Figure 1.3). Applications include image mo-
stitching [Caspi and Irani, 2002], super-resolution [Capel, 2004; Irani and Peleg,
1991], region tracking [Baker and Matthews, 2004; Hager and Belhumeur, 1998; Lu-
cas and Kanade, 1981], recovering scene/camera motion [Bartoli et al., 2003; Irani
et al., 2002], or medical image analysis [Lester and Arridge, 1999].
Image registration methods commonly fall into one of the two following groups [Bar-
toli, 2008; Capel, 2004; Irani and Anandan, 1999]:
Direct methods A direct image registration method aligns two images by only
using the colour—or intensity in greyscale data—values of the pixels that
are common to both images (namely, the region of support). Direct meth-
ods minimize an error measure based on image brightness from the region of
support. Typical error measures include a L2
-norm of the brightness differ-
ence [Irani and Anandan, 1999; Lucas and Kanade, 1981], normalized cross-
correlation [Brooks and Arbel, 2010; Lewis, 1995], or mutual information [Dow-
son and Bowden, 2008; Viola and Wells, 1997].
Feature-based methods In feature-based methods, we align two images by com-
puting the geometric transformation between a set of salient features that
we detect in each image. The idea is to abstract distinct geometric image
features that are more reliable than the raw intensity values; typically these
features show invariance with respect to modifications of the camera point-of-
view, illumination conditions, scale, or orientation of the scene [Schmid et al.,
2000]. Corners or interest points [Bay et al., 2008; Harris and Stephens, 1988;
Lowe, 2004; Torr and Zisserman, 1999] are classical features in the literature,
although we can use other features such as edges [Bartoli et al., 2003], or
extremal image regions [Matas et al., 2002].
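The sketch below (ours) illustrates two of the brightness measures mentioned in the list above (the L2-norm of the brightness difference and normalized cross-correlation), omitting mutual information for brevity. Note that NCC, unlike the plain L2-norm, is insensitive to global gain and bias changes in brightness:

    import numpy as np

    def ssd(a, b):
        """L2-type measure: sum of squared brightness differences (lower is better)."""
        return float(np.sum((a.astype(float) - b.astype(float)) ** 2))

    def ncc(a, b):
        """Normalized cross-correlation (higher is better)."""
        a = a - a.mean()
        b = b - b.mean()
        return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    patch = np.random.default_rng(1).random((16, 16))
    print(ssd(patch, 2.0 * patch + 0.5))  # large: the L2-norm penalizes gain/bias changes
    print(ncc(patch, 2.0 * patch + 0.5))  # ~1.0: NCC does not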
Direct or feature-based methods? Choosing between direct or feature-based
methods is not an easy task: we have to know the strong points of each method
and for what applications it is more suitable. A good comparison between the two
types of methods is [Capel, 2004]. Feature-based methods typically show strong
invariance to a wide range of photometric and geometric transformations of the im-
age, and they are more robust to partial occlusions of the scene than their direct
counterparts [Capel, 2004; Torr and Zisserman, 1999]. On the other hand, direct
methods can align images with sub-pixel accuracy, estimate the dominant motion even
when multiple motions are present, and they can provide a dense motion field in the case of
3D estimation [Irani and Anandan, 1999]. Moreover, direct methods do not require
high-frequency textured surfaces (corners) to operate, but have optimal performance
with smooth graylevel transitions [Benhimane et al., 2007].
2.3 Model-based 3D Tracking
In this section we define what model-based tracking is, and we review the previous
literature on 3D tracking of rigid and nonrigid objects. A special case of interest
for nonrigid objects is the 3D tracking of human faces, or facial motion capture.
We can recover the 3D orientation and position of the target with respect
to the camera (or an arbitrary reference system), or the relative displacement and
orientation of the camera with respect to the target (or another arbitrary reference
system in the scene) [Sepp, 2008]. A good survey on the subject is [Lepetit and
Fua, 2005].
2.3.1 Modelling assumptions
In model-based techniques we use a priori knowledge about the scene, the target,
or the sensing device, as a basis for the tracking procedure. We classify these
assumptions on the real-world information as follows:
Target model
The target model specifies how to represent the information about the structure of
the scene in our algorithms. Template tracking or template matching simply repre-
sents the target as the pixel intensity values inside a region defined on one image:
we call this region—or the image itself—the reference image or template. One of
the first proposed technique for template matching was [Lucas and Kanade, 1981],
although it was initially devised for solving optical flow problems. The literature
proposes numerous extensions to this technique [Baker and Matthews, 2004; Benhi-
mane and Malis, 2007; Brooks and Arbel, 2010; Hager and Belhumeur, 1998; Jurie
and Dhome, 2002a].
We may also allow the target to deform its shape: this deformation induces
changes in the target projected appearance. We model these changes in target
texture by using generative models such as eigenimages [Black and Jepson, 1998;
Buenaposada et al., 2009], Active Appearance Models (aam) [Cootes et al., 2001],
active blobs [Sclaroff and Isidoro, 2003], or subspace representation [Ross et al.,
2004]. Instead of modelling brightness variations we may represent target shape
deformation by using a linear model representing the location of a set of feature
points [Blanz and Vetter, 2003; Bregler et al., 2000; Del Bue et al., 2004], or Finite
Element Meshes [Pilet et al., 2005; Zhu et al., 2006]. Alternative approaches model
non-rigid motion of the target by using anthropometric data [Decarlo and Metaxas,
2000], or by using a probability distribution of the intensity values of the target
region [Comaniciu et al., 2000; Zimmermann et al., 2009].
These techniques are suitable to track planar objects of the scene. If we add fur-
ther knowledge about the scene, we can track more complex objects: with a proper
model we are able to recover 3D information. Typically, we use a wireframe 3D
model of the target and tracking consists on finding the best alignment between the
sensed image and the 3D model [Cipolla and Drummond, 1999; Kollnig and Nagel,
1997; Marchand et al., 1999]. We can augment this model by adding further texture
priors either from the image stream [Cobzas et al., 2009; Muñoz et al., 2005; Sepp
and Hirzinger, 2003; Vacchetti et al., 2004; Xiao et al., 2004a; Zimmermann et al.,
2006], or from an external source (e.g. a 3D scanner or a texture mosaic) [Hong
and Chung, 2007; La Cascia et al., 2000; Masson et al., 2004, 2005; Pressigout and
Marchand, 2007; Romdhani and Vetter, 2003].
Motion model
The motion model describes the target kinematics (i.e. how the object modifies
its position in the image/scene). The motion model is tightly coupled to the tar-
get model: it is usually represented by a geometric transformation that maps the
coordinates of the target model into a different set of coordinates. For a planar
target, these geometric transformations are typically affine [Hager and Belhumeur,
1998], homographic [Baker and Matthews, 2004; Buenaposada and Baumela, 1999],
or spline-based warps [Bartoli and Zisserman, 2004; Brunet et al., 2009; Lester and
Arridge, 1999; Masson et al., 2005]. For actual 3D targets, the geometric warps
account for computing the rotation and translation of the object using a 6 degree-
of-freedom (dof) rigid body transformation [Cipolla and Drummond, 1999; La Cascia
et al., 2000; Marchand et al., 1999; Sepp and Hirzinger, 2003].
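For instance, a homographic motion model warps 2D points through a 3 × 3 matrix in homogeneous coordinates; a minimal sketch (ours, with an arbitrary example matrix) follows. An affine warp is the special case whose last row is (0, 0, 1):

    import numpy as np

    def warp_homography(H, pts):
        """Apply a homographic warp to an N x 2 array of 2D points: lift to
        homogeneous coordinates, multiply by H, and divide by the third
        coordinate to project back to the plane."""
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # N x 3
        out = pts_h @ H.T
        return out[:, :2] / out[:, 2:3]

    H = np.array([[1.1,  0.0,  5.0],
                  [0.1,  0.9, -3.0],
                  [1e-4, 0.0,  1.0]])  # arbitrary 8-dof homography
    print(warp_homography(H, np.array([[0.0, 0.0], [100.0, 50.0]])))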
Camera model
The camera model specifies how the images are sensed by the camera. The pin-
hole camera models the imaging device as a projector of the coordinates of the
scene [Hartley and Zisserman, 2004]. For tracking zoomed objects located far away,
we may use orthographic projection [Brand and R.Bhotika, 2001; Del Bue et al.,
2004; Tomasi and Kanade, 1992; Torresani et al., 2002]. The perspective projection
accounts for perspective distortion, and it is more suitable for close-up views [Muñoz
et al., 2005, 2009]. The camera model may also account for model deviations such
as lens distortion [Claus and Fitzgibbon, 2005; Tsai, 1987].
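A minimal sketch (ours) of the pinhole model with perspective projection; the intrinsic values below are arbitrary, chosen only for illustration:

    import numpy as np

    def project_pinhole(K, R, t, X):
        """Perspective (pinhole) projection: move the N x 3 points X into the
        camera frame, Xc = R X + t, then project by dividing by depth."""
        Xc = (R @ X.T).T + t           # N x 3 points in camera coordinates
        xh = (K @ Xc.T).T              # N x 3 homogeneous image points
        return xh[:, :2] / xh[:, 2:3]  # N x 2 pixel coordinates

    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])        # illustrative intrinsics
    R, t = np.eye(3), np.array([0.0, 0.0, 5.0])  # target 5 units in front of the camera
    X = np.array([[0.0, 0.0, 0.0], [0.1, -0.1, 0.2]])
    print(project_pinhole(K, R, t, X))           # first point -> (320, 240)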
Other model assumptions
We can also model prior photometric knowledge about the target/scene such as
illumination cues [La Cascia et al., 2000; Lagger et al., 2008; Romdhani and Vetter,
2003], or global colour [Bartoli, 2008].
2.3.2 Rigid Objects
We can follow two strategies to recover the 3D parameters of a rigid object:
2D Tracking The first group of methods involves a two-step process: first we
compute the 2D motion of the object as a displacement of the target projection
on the image; second, we recover the actual 3D parameters from the computed
2D displacements by using the scene geometry. A natural choice is to use
optical flow: [Irani et al., 1997] computes the dominant 2D parametric motion
between two frames to register the images; the residual displacement—the
image regions that cannot be registered—is used to recover the 3D motion.
When the object is a 3D plane, we can use a homographic transformation to
compute plane-to-plane correspondences between two images; then we recover
the actual 3D motion of the plane using the camera geometry [Buenaposada
and Baumela, 2002; Lourakis and Argyros, 2006; Simon et al., 2000]. We
can also compute the inter-frame displacements by using linear regressors or
predictors, and then we robustly adjust the projections to a target model—
using RANSAC—to compute the 3D parameters [Zimmermann et al., 2009]. An
alternative method is to compute pixel-to-pixel correspondences by using a
classifier [Lepetit and Fua, 2006], and then recover the target 3D pose using
POSIT [Dementhon and Davis, 1995] or equivalent methods [Lepetit et al.,
2009]; see the pose-recovery sketch after this list.
3D Tracking These methods directly compute the actual 3D motion of the object
from the image stream. They mainly use a 3D model of the target to compute
the motion parameters; the 3D model contains a priori knowledge of the target
that improves the estimation of motion parameters (e.g. to get rid of projec-
tive ambiguities). The simplest way to represent a 3D target is using a texture
model—a set of image patches sensed from one or several reference images—as
in [Cobzas et al., 2009; Devernay et al., 2006; Jurie and Dhome, 2002b; Masson
et al., 2004; Sepp and Hirzinger, 2003; Xu and Roy-Chowdhury, 2008]. The
main drawback of these methods is their lack of robustness against changes in
scene illumination and specular reflections. We can alternatively fit the projection
of a 3D wireframe model (e.g. a cad model) to the edges of the image [Drum-
mond and Cipolla, 2002]. However, these methods also have problems with
cluttered backgrounds [Lepetit and Fua, 2005]. To gain robustness, we can use
hybrid models of texture and contours such as [Marchand et al., 1999; Masson
et al., 2003; Vacchetti et al., 2004], or simply use an additional model to deal
with illumination [Romdhani and Vetter, 2003].
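The sketch below (ours, not the method of any of the cited works) illustrates the second step of the 2D tracking strategy: given 2D-3D correspondences, recover the 6-dof pose with a PnP solver (here OpenCV's solvePnP, standing in for POSIT and its relatives); it assumes the opencv-python package:

    import numpy as np
    import cv2

    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])  # assumed camera intrinsics
    model_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                          [1, 1, 0], [1, 0, 1]], dtype=np.float64)  # corners of a model cube

    # Synthesize image points from a known ground-truth pose, then recover it.
    R_true = cv2.Rodrigues(np.array([0.1, 0.2, 0.05]))[0]
    t_true = np.array([0.2, -0.1, 6.0])
    proj = (K @ (R_true @ model_pts.T + t_true[:, None])).T
    img_pts = proj[:, :2] / proj[:, 2:3]

    ok, rvec, tvec = cv2.solvePnP(model_pts, img_pts, K, None)
    print(ok, rvec.ravel(), tvec.ravel())  # matches the ground-truth pose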
2.3.3 Nonrigid Objects
Tracking methods for nonrigid objects fall into the same categories as those used for
rigid ones. Point-to-point correspondences of the deformable target can recover
the pose and/or deformation parameters using subspace methods [Del Bue, 2010;
Torresani et al., 2008], or fitting a deformable triangle mesh [Pilet et al., 2008;
Salzmann et al., 2007]. We can alternatively fit the 2D silhouette of the target to a
3D skeletal deformable model of the object [Bowden et al., 2000].
Direct estimation of the 3D parameters unifies the processes of matching pixel
correspondences and estimating the pose and deformation of the target. [Brand,
2001; Brand and Bhotika, 2001] constrains the optical flow by using a linear
generative model to represent the deformation of the object. [Gay-Bellile et al.,
2010] models the object 3D deformations, including self-occlusions, by using a set
of Radial Basis Functions (rbf).
2.3.4 Facial Motion Capture
Estimation of facial motion parameters is a challenging task; head 3D orientation
was typically estimated by using fiducial markers to overcome the inherent difficulty
of the problem [Bickel et al., 2007].
However, markerless methods have also been developed in recent years. Facial
motion capture involves recovering head 3D orientation and/or face deformation due
to changes in expression. We first review techniques for recovering head 3D pose,
then we review techniques for recovering both pose and expression.
Head pose estimation There are numerous techniques to compute head pose or
3D orientation. In the following, we review a number of them—a recent detailed
survey on the subject is [Murphy-Chutorian and Trivedi, 2009]. The main difficulty
of estimating head pose lies in the nonconvex structure of the human head. Classic
2D approaches such as [Black and Yacoob, 1997; Hager and Belhumeur, 1998] are
only suitable to track motions of the head parallel to the image plane: the rea-
son is that these methods only use information from a single reference image. To
fully recover the 3D rotation parameters of the head we need additional informa-
tion. [La Cascia et al., 2000] uses a texture map that was computed by cylindrical
projection of different point-of-view images of the head; [Baker et al., 2004a; Jang
and Kanade, 2008] also use an analogous cylindrical model. In a similar fashion, we
can use a 3D ellipsoid shape [An and Chung, 2008; Basu et al., 1996; Choi and Kim,
2008; Malciu and Prêteux, 2000]. Instead of using a cylinder or an ellipsoid, we can
have a detailed model of the head like a 3D Morphable Model (3dmm) [Blanz and
Vetter, 2003; Muñoz et al., 2009; Xu and Roy-Chowdhury, 2008], an aam coupled
together with a 3dmm [Faggian et al., 2006], or a triangular mesh model of the
face [Vacchetti et al., 2004]. The latter is robustly tracked in [Strom et al., 1999]
using an Extended Kalman Filter. We can also have a head model with reduced
complexity as in [B. Tordoff et al., 2002].
Face expression estimation A change of facial expression induces a deforma-
tion in the 3D structure of the face. The estimation of this deformation can be
used for face expression recognition, expression detection, or facial motion trans-
fer. Classic 2D approaches such as aams [Cootes et al., 2001; Matthews and Baker,
2004] are only suitable to recover expressions from a frontal face. 3D aams are the
three-dimensional extension of these 2D methods: they adjust a statistical model
of 3D shapes and texture—typically a PCA model—to the pixel intensities of the
image [Chen and Wang, 2008; Dornaika and Ahlberg, 2006]. Hybrid methods that
combine 2D and 3D aams show both real-time performance and actual 3D head
pose estimation: we can use the 3D aams to simultaneously constrain the 2D aams
motion and compute the 3D pose [Xiao et al., 2004b], or directly compute the fa-
cial motion from the 2D aams parameters [Zhu et al., 2006]. In contrast to pure
2D aams, 3D aams can recover actual 3D pose and expression from faces that are
not frontal to the camera. However, the out-of-plane rotations that can be recov-
ered by these methods are typically smaller than using a pure 3D model (e.g. a
3dmm). [Blanz and Vetter, 2003; Romdhani and Vetter, 2003] search the best con-
figuration for a 3dmm such that the differences between the rendered model and the
image are minimal; both methods also show great performance recovering strong fa-
cial deformations. Real-time alternatives using 3dmm include [Hiwada et al., 2003;
Muñoz et al., 2009]. [Pighin et al., 1999] estimates realistic facial expressions by
fitting a linear combination of 3D face models to match the images. Finally, [Decarlo
and Metaxas, 2000] derives an anthropometric physically-based face model that may
be adjusted to each individual face target; in addition, they solve a dynamic system for
the face pose and expression parameters by using optical flow constrained by the
edges of the face.
Chapter 3
Efficient Direct Image Registration
3.1 Introduction
This chapter reviews the problem of efficiently registering two images. We define
the Direct Image Alignment (dia) problem as the process of computing the trans-
formation between two frames using only image brightness information. We orga-
nize the chapter as follows: Section 3.2 introduces basic registration notions; Sec-
tion 3.3 reviews additive registration algorithms such as Lucas-Kanade or Hager-
Belhumeur; Section 3.4 reviews compositional registration algorithms such as Baker
and Matthews’ Forward Compositional and Inverse Compositional; finally, other
methods are reviewed in Section 3.5.
3.2 Modelling Assumptions
This section reviews those assumptions about the real world that we use to mathemat-
ically model the registration procedure. We introduce the notation for the imaging
process through a pinhole camera. We establish the Brightness Constancy Assump-
tion or Brightness Constancy Constraint (bcc) as the cornerstone of direct
image registration techniques. We also pose the registration problem as an itera-
tive optimization problem. Finally, we provide a classification of the existing direct
registration algorithms.
3.2.1 Imaging Geometry
Figure 3.1: Imaging geometry. An object of the scene is imaged through camera
centres C1 and C2 onto two distinct images I1 and I2 (related by a rotation R and
a translation t). The point X is projected to the points x1 = p(K [I | 0] x̃) and
x2 = p(K [R | −Rt] x̃) in the two images.

We represent points of the scene using Cartesian coordinates in R³ (e.g. X =
(X, Y, Z)⊤). We represent points on the image with homogeneous coordinates, so
that the pixel position x = (i, j)⊤ is represented using the notation for augmented
points as x̃ = (i, j, 1)⊤. The homogeneous point x̃ = (x1, x2, x3)⊤ is conversely
represented in Cartesian coordinates using the mapping p : P² → R², such that
p(x̃) = x = (x1/x3, x2/x3). The scene is imaged through a perfect pinhole camera
[Hartley and Zisserman, 2004]; by abuse of notation, we define the perspective
projection p : R³ → R², which maps scene coordinates onto image points, as

x = p(Xc) = ( k1⊤Xc / k3⊤Xc , k2⊤Xc / k3⊤Xc )⊤,

where K = (k1⊤, k2⊤, k3⊤)⊤ is the 3 × 3 matrix that contains the camera intrinsics
(cf. [Hartley and Zisserman, 2004]), and Xc = (Xc, Yc, Zc)⊤. We implicitly assume
that Xc represents a point in the camera reference system. If the points to project
are expressed in an arbitrary reference system of the scene we need an additional
mapping; hence, the perspective projection for a point X in the scene is

x̃ = K [R | −Rt] (X⊤, 1)⊤,
where R and t are the rotation and translation between the scene and the camera
coordinate systems (see Figure 3.1). Our input is a smooth sequence of images
(i.e. inter-frame differences are small), where It is the t-th frame of the sequence.
We denote by T the reference image or template. Images are discrete matrices of
brightness values, although we represent them as functions from R² to R^C, where
C is the number of image channels (i.e. C = 3 for colour and C = 1 for gray-scale
images): It(x) is the brightness value at pixel x. For non-discrete pixel coordinates
we use bilinear interpolation. If X is a set of pixels, we collect the brightness values
I(x), ∀x ∈ X, in a single column vector I(X), i.e., I(X) = (I(x1), . . . , I(xN))⊤,
{x1, . . . , xN} ∈ X.
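To make these conventions concrete, the following is a minimal NumPy sketch of
the projection and sampling operations above; the names project and sample_bilinear
are illustrative choices (not thesis notation), and image-border handling is omitted
for brevity.

import numpy as np

def project(K, R, t, X):
    # x~ = K [R | -Rt] (X; 1): scene points X (3xN) to pixel coordinates (2xN).
    Xc = R @ (X - t[:, None])      # scene -> camera reference system
    x_h = K @ Xc                   # homogeneous image points
    return x_h[:2] / x_h[2]        # p: homogeneous -> Cartesian

def sample_bilinear(I, x):
    # Brightness I(x) at non-integer pixel positions x = (i, j) (2xN),
    # assuming rows index j and columns index i, away from the border.
    i0 = np.floor(x[0]).astype(int); j0 = np.floor(x[1]).astype(int)
    a = x[0] - i0; b = x[1] - j0   # fractional offsets
    return ((1 - a) * (1 - b) * I[j0, i0] + a * (1 - b) * I[j0, i0 + 1] +
            (1 - a) * b * I[j0 + 1, i0] + a * b * I[j0 + 1, i0 + 1])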
3.2.2 Brightness Constancy Constraint
The bcc relates brightness information between two frames of a sequence [Hager
and Belhumeur, 1998; Irani and Anandan, 1999]. The reference image T is one
arbitrary image of the sequence. We define the target region X as a set of pixel
coordinates X = {x1, . . . , xN} defined on T (see Figure 3.2). We define the template
as the image values of the target region, that is, T (X). Let us assume we know
the transformation of the target region between T and another arbitrary image of
the sequence, It. The motion model f defines this transformation as Xt = f(X; µt),
where the set of coordinates Xt is the target region on It and µt are the motion
parameters. The bcc states that the brightness values of the template T and the
input image It warped by f with parameters µt should be equal,
T (X) = It(f(X; µt)). (3.1)
The direct conclusion from Equation 3.1 is that the brightness of the target does not
depend on its motion, i.e., the relative position and orientation of the camera with
respect to the target does not affect the brightness of the latter. However, we may aug-
et al., 2009; Matthews and Baker, 2004], and changes in illumination conditions due
to ambient [Bartoli, 2008; Basri and Jacobs, 2003] or specular lighting [Blanz and
Vetter, 2003].
3.2.3 Image Registration by Optimization
Direct image registration is usually posed as an optimization problem. We minimize
an error function based on the brightness pixel-wise difference that is parameterized
by motion variables:
µ∗ = arg minµ { D(X; µ)² }, (3.2)

where

D(X; µ) = T(X) − It(f(X; µ)) (3.3)

is a dissimilarity measure based on the bcc (Equation 3.1).
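As a concrete reference, a minimal sketch of Equations 3.2 and 3.3 follows; it
reuses sample_bilinear (and the numpy import) from the sketch in Section 3.2.1, and
the warp f is assumed to be supplied by the caller.

def dissimilarity(T_vals, I_t, f, X, mu):
    # D(X; mu) = T(X) - I_t(f(X; mu)) (Equation 3.3), stacked as an N-vector.
    # T_vals holds the template brightness T(X), sampled once from T.
    residual = T_vals - sample_bilinear(I_t, f(X, mu))
    return residual, float(residual @ residual)  # residual vector and its SSD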
Descent Methods
Recovering these parameters is typically a non-linear problem, as it depends on
image brightness, which is usually non-linearly related to the motion parameters.
The usual approach is iterative gradient-based descent (GD): from a starting point
µ0 in the search space, the method iteratively computes a series of partial solutions
µ1, µ2, . . . , µk that, under certain conditions, converge to the local minimizer
µ∗ [Madsen et al., 2004] (see Figure 3.2). We typically use Gauss-Newton (GN)
methods for efficient registration because they provide good convergence without
computing second derivatives (see Appendix A). Hence, the basic GN-based algo-
rithm for image registration operates as we outline in Algorithm 1 and depict in
Figure 3.3. We describe the four stages of the algorithm in the following:
Figure 3.2: Iterative gradient descent image registration. Top-left Template
image for the registration. We highlight the target region as a green quadrangle. Top-
right Image that we register against the template. We generate the image by rotating the
image around its centre and translating it in the X-axis. We highlight the corresponding
target region in yellow. We also display the initial guess for the optimization as a green
quadrangle. Notice that it exactly corresponds to the position of the target region at the
template. Bottom-left Contour plot of the image brightness dissimilarity. The axis show
the values of the search space: image rotation and translation. We show the successive
iterations in the search space: we reach the solution in four steps—µ0 to µ4. Bottom-
right We show the target region that corresponds to the parameters of each iteration.
The colour of each quadrangle matches the colour of the parameters that generated it as
seen in the Bottom-left figure.
Dissimilarity measure The dissimilarity measure is a function on the image bright-
ness error between two images. The usual measure for image registration is
the Sum of Squared Differences (ssd), that is, the L²-norm of the difference
of pixel brightness (Equation 3.3) [Brooks and Arbel, 2010; Hager and Bel-
humeur, 1998; Irani and Anandan, 1999; Lucas and Kanade, 1981]. However,
we can use other measures such as normalized cross-correlation [Brooks and
Arbel, 2010; Lewis, 1995], or mutual information [Brooks and Arbel, 2010;
Dowson and Bowden, 2008; Viola and Wells, 1997].
Linearize the dissimilarity The next stage linearizes the brightness function about
the current search parameters µ; this linearization enables us to transform
the problem into a system of linear equations on the search variables. We
typically approximate the function using Taylor series expansion; depending
on how many terms—derivatives—we compute, we have optimisation methods
like Gradient Descent [Amberg and Vetter, 2009], Newton-Raphson [Lucas and
Kanade, 1981; Shi and Tomasi, 1994], Gauss-Newton [Baker and Matthews,
2004; Brooks and Arbel, 2010; Hager and Belhumeur, 1998] or even higher-
order methods [Benhimane and Malis, 2007; Keller and Averbuch, 2004, 2008;
Megret et al., 2008]. This is theoretically a good approximation when the dis-
similarity is small [Irani and Anandan, 1999], although the estimation can be
improved by using coarse-to-fine iterative methods [Irani and Anandan, 1999],
or by selecting appropriate pixels [Benhimane et al., 2007]. Although Taylor
series expansion is the usual approach to compute the coefficients of the sys-
tem, other approaches such as linear regression [Cootes et al., 2001; Jurie and
Dhome, 2002a] or numeric differentiation [Gleicher, 1997] may be used.
Compute the descent direction The descent direction is a vector δµ in the
search space such that D(µ+δµ) < D(µ). In a GN-based algorithm, we solve
the linear system of equations of the previous stage using least-squares [Baker
and Matthews, 2004; Madsen et al., 2004]. Note that we do not perform the
line search stage—i.e., we implicitly assume that the step size α = 1, cf.
Appendix A.
Update the search parameters Once we have determined the search direction
δµ, we compute the next point in the series by using the update function
U : R^P → R^P: µ1 = U(µ0, δµ). We compute the dissimilarity value at µ1 to
check convergence: if the dissimilarity is below a given threshold, then µ1 is the
minimizer µ∗, i.e., µ∗ = µ1; otherwise, we repeat the whole process (i.e.
µ1 becomes the current parameters µ) until we find a suitable minimizer.
3.2.4 Additive vs. Compositional
We turn our attention to the step 4 of Algorithm 1: how to compute the new es-
timation of the optimization parameters. In a GN optimization scheme, the new
Algorithm 1 Outline of the basic GN-based descent method for image
registration
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2: Compute the dissimilarity function D(µi).
3: Compute the search direction: linearize the dissimilarity and compute the
descent direction δµi.
4: Update the optimization parameters: µi+1 = U(µi, δµi).
5: end while
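A minimal NumPy sketch of Algorithm 1 is shown below; the residual, Jacobian,
and update routines are placeholders that each concrete algorithm (LK, HB, FC, IC)
instantiates differently.

def gn_registration(residual, jacobian, update, mu0, tol=1e-6, max_iters=50):
    # Generic GN loop: the caller supplies its residual function r(mu),
    # Jacobian J(mu), and update rule U(mu, dmu).
    mu = mu0
    for _ in range(max_iters):
        r = residual(mu)                          # step 2: dissimilarity
        J = jacobian(mu)                          # step 3: linearization
        dmu = -np.linalg.solve(J.T @ J, J.T @ r)  # GN descent direction
        mu = update(mu, dmu)                      # step 4: additive/compositional
        if np.linalg.norm(dmu) < tol:             # convergence test
            break
    return mu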
Figure 3.3: Generic descent method for image registration. We initialize the
current parameter estimation at frame It+1 (µ = µ0) using the local minimizer at the
previous frame It (µ0 = µ∗t). We compute the Dissimilarity Measure between the Image
and the Template using µ (Equation 3.3). We linearize the dissimilarity measure to
compute the descent direction of the search parameters (δµ). We update the search
parameters using the search direction and obtain an approximation to the minimum
(µ1). We check whether µ1 is a local minimizer by using the brightness dissimilarity: if
D is small enough, then µ1 is the local minimizer (µ∗ = µ1); otherwise, we repeat the
process using µ1 as the current parameter estimation (µ = µ1).
parameters are typically computed by adding the former optimization parameters
to the search direction vector: µt+1 = µt + δµt (cf. Appendix A); this summation
is a direct consequence of the definition of the Taylor series [Madsen et al., 2004]. We
call additive approaches those methods that update the parameters by using addi-
tion [Hager and Belhumeur, 1998; Irani and Anandan, 1999; Lucas and Kanade,
1981]. Nonetheless, Baker and Matthews [Baker and Matthews, 2004] subsequently
proposed a GN-based method that updated the parameters using composition—
i.e., µt+1 = µt ◦ δµt. We call these methods compositional approaches [Baker and
Matthews, 2004; Cobzas et al., 2009; Mu˜noz et al., 2005; Romdhani and Vetter,
2003; Xu and Roy-Chowdhury, 2008].
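To make the distinction concrete, here is a minimal sketch of the two update rules
for a warp whose parameters form a 3 × 3 homography matrix; the matrix
parameterization and function names are hypothetical choices for illustration.

import numpy as np

def additive_update(H, dH):
    # mu_{t+1} = mu_t + delta_mu: increment the parameters directly.
    return H + dH

def compositional_update(H, dH):
    # mu_{t+1} = mu_t o delta_mu: compose the current warp with the
    # incremental warp I + dH (warp composition = matrix product).
    return H @ (np.eye(3) + dH)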
3.3 Additive approaches
In this section we review some works that use additive update. We introduce the
Lucas-Kanade algorithm, the fundamental work on direct image registration. We
show the basic algorithm as well as the common problems regarding the method. We
also introduce the Hager-Belhumeur approach to image registration and we point
out its highlights.
3.3.1 Lucas-Kanade Algorithm
The Lucas-Kanade (LK) algorithm [Lucas and Kanade, 1981] solves the registration
problem using a GN optimization scheme. The algorithm defines the residuals r of
Equation 3.3 as
r(µ) ≡ T(x) − I(f(x; µ)). (3.4)
The corresponding linear model for these residuals is
r(µ + δµ) ≃ ℓ(δµ) ≡ r(µ) + r′(µ)δµ = r(µ) + J(µ)δµ, (3.5)

where

r(µ) ≡ T(x) − I(f(x; µ)), and J(µ) ≡ ∂I(f(x; ˆµ))/∂ˆµ |ˆµ=µ. (3.6)

Hence, our optimization process now amounts to minimising

δµ∗ = arg minδµ { ℓ(δµ)⊤ℓ(δµ) } = arg minδµ { L(δµ) }. (3.7)

We compute the local minimizer of L(δµ) as follows:

0 = L′(δµ) = ∇δµ ( r(µ)⊤r(µ) + 2δµ⊤J(µ)⊤r(µ) + δµ⊤J(µ)⊤J(µ)δµ )
  = J(µ)⊤r(µ) + J(µ)⊤J(µ)δµ. (3.8)

Again, we obtain an approximation to the local minimum at

δµ = −( J(µ)⊤J(µ) )⁻¹ J(µ)⊤ r(µ), (3.9)
which we iteratively refine until we find a suitable solution. We summarize the
optimization process in Algorithm 2 and Figure 3.4.
Algorithm 2 Outline of the Lucas-Kanade algorithm.
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2: Compute the residual function r(µi) from Equation 3.4.
3: Linearize the dissimilarity: J = ∇µr(µi).
4: Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤ r(µi).
5: Update the optimization parameters: µi+1 = µi + δµi.
6: end while
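A sketch of one LK iteration follows, assuming grad_I returns the image gradients
(N×2) at given pixel positions and df_dmu the warp Jacobian ∂f/∂µ (N×2×P); it
reuses sample_bilinear from Section 3.2.1.

import numpy as np

def lucas_kanade_step(I_t, grad_I, T_vals, X, f, df_dmu, mu):
    Xw = f(X, mu)                                  # warp the target region
    r = T_vals - sample_bilinear(I_t, Xw)          # residuals (Equation 3.4)
    # Chain rule: the residual Jacobian is -grad I(f(x; mu)) * df/dmu.
    J = -np.einsum('nc,ncp->np', grad_I(Xw), df_dmu(X, mu))
    dmu = -np.linalg.solve(J.T @ J, J.T @ r)       # GN step (Equation 3.9)
    return mu + dmu                                # additive update

Note that both the image gradients and the Jacobian are recomputed at every
iteration and every frame, which is precisely the cost that the algorithms below
avoid.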
Figure 3.4: Lucas-Kanade image registration. We initialize the current parameter
estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It
(µ0 ≡ µ∗t). We compute the dissimilarity residuals between the Image and the Template
using µ (Equation 3.4). We linearize the residuals at the current parameters µ, and
we compute the descent direction of the search parameters (δµ). We additively update
the search parameters using the search direction and obtain an approximation to the
minimum, i.e. µ1 = µ0 + δµ. We check whether µ1 is a local minimizer by using the
brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ∗ ≡ µ1);
otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).
Known Issues
The LK algorithm is one instance of a well-known technique for object tracking [Baker
and Matthews, 2004]. The most remarkable feature of this algorithm is its robust-
ness: given a suitable bcc, the LK algorithm typically ensures good convergence.
However, the algorithm has a series of weaknesses that degrade the overall perfor-
mance of the tracking:
Computational Cost The LK algorithm computes the Jacobian at each iteration
of the optimization loop. Furthermore, the minimization cycle is repeated
between each two consecutive frames of the video sequence. The consequence
is that the Jacobian is computed F × L times, where F is the number of frames
and L is the number of iterations in the optimization loop. The computational
burden of these operations is really high if the Jacobian is large: we have to
compute the derivatives at each point of the target region, and each point
contributes to a row in the Jacobian. As an example, Table 7.15—page 106—
compares the computational complexity of LK algorithm with respect to other
efficient methods.
Local Minima The GN optimization scheme, which is the basis for the LK al-
gorithm, is prone to getting trapped in local minima. The very essence of the
minimization implies that the algorithm converges to the minimum closest
to the starting point. Hence, we must choose the initial guess of the optimiza-
tion very carefully to ensure convergence to the true optimum. The best way
to guarantee that the starting point for tracking and the optimum are close
enough is to impose that the differences between consecutive images are small.
Conversely, image pairs with a large baseline will cause problems for LK, as falling
into local minima is more likely, which leads to incorrect alignment. To alleviate
this problem, which is common to all direct approaches, a pyramidal implementation
of the optimization may be used [Bouguet, 2000].
3.3.2 Hager-Belhumeur Factorization Algorithm
We now review an efficient algorithm for determining the motion parameters of the
target. The algorithm is similar to LK, but uses a priori information about the tar-
get motion and structure to save computation time. The Hager-Belhumeur (HB) or
factorization algorithm was first proposed by G. Hager and P. Belhumeur in [Hager
and Belhumeur, 1998]. The authors noticed the high computational cost of lin-
earizing the brightness error function in the LK algorithm: the dissimilarity depends
on each different frame of the sequence, It. The method focuses on how to efficiently
compute the Jacobian matrix of step 3 of the LK algorithm (see Algorithm 2). The
computation of the Jacobian in the HB algorithm has two separate stages:
1. Gradient replacement
The key idea is to use the derivatives at the template T instead of computing
the derivatives at frame It when estimating J. Hager and Belhumeur dealt with
this issue in a very neat way: they noticed that, if the bcc (Equation 3.1) related
image and template brightness values, it could possibly also relate image and
template derivatives (cf. [Hager and Belhumeur, 1998]). The derivatives
of both sides of Equation 3.1 with respect to the target region coordinates are

∇xT(x) = ∇xIt(f(x; µt)) = ∇xIt(x) ∇xf(x; µ), x ∈ X. (3.10)
On the other hand, we compute the Jacobian as

J = ∇µt It(f(x; µt)) = ∇xIt(x) ∇µt f(x; µ). (3.11)
We isolate the term ∇xIt(x) in Equations 3.10 and 3.11, and we equate the
remaining terms as follows:

J = ∇xT(x) ∇xf(x; µ)⁻¹ ∇µt f(x; µ). (3.12)

Notice that in Equation 3.12 the Jacobian depends on the template derivatives,
∇xT(x), which are constant. Using template derivatives speeds up the whole
process up to 10-fold (cf. Table 7.16, page 106).
2. Factorization
Equation 3.12 reveals the internal structure of the Jacobian: it comprises
the product of three matrices: a matrix ∇xT(x) that depends on template
brightness values, and two matrices, ∇xf(x; µ)⁻¹ and ∇µt f(x; µ), whose values
depend on both the target shape coordinates and the motion parameters µt.
The factorization stage re-arranges the Jacobian's internal structure such that
we speed up the computation of this matrix product.
A word about factorization In the literature, matrix factorization or ma-
trix decomposition refers to the process that expresses the values of a
matrix as the product of matrices of special types. One major example
is to factorize a matrix A into the product of a lower triangular ma-
trix L and an upper triangular matrix U, A = LU. This factorization
is called lu decomposition and it allows us to solve the linear system
Ax = b more efficiently: solving Ux = L⁻¹b requires fewer additions and
multiplications than the original system [Golub and Van Loan, 1996].
Other famous examples of matrix factorization are the spectral decomposi-
tion, Cholesky factorization, Singular Value Decomposition (svd) and
qr factorization (see [Golub and Van Loan, 1996] for more information).
The key concept behind using factorization in this problem is as follows:
Given a matrix product whose operands contain both constant and
variable terms, we want to re-arrange the product such that one
operand contains only constant values and the other one only con-
tains variable terms.
We rewrite this idea as an equation:

J = ∇xT(x) ∇xf(x; µ)⁻¹ ∇µt f(x; µ) = S(x)M(µ), (3.13)

where S(x) contains only target coordinate values and M(µ) contains only
motion parameters. The process of decomposing the matrix J into the product
S(x)M(µ) is generally ad hoc: we must gain insight into the analytic structure
of the matrices ∇xf(x; µ)⁻¹ and ∇µt f(x; µ) to re-arrange their entries into
S(x)M(µ) [Hager and Belhumeur, 1998]. This process is not obvious at all,
and it has been a frequent source of criticism of the HB algorithm [Baker
and Matthews, 2004]. However, we shall introduce procedures for systematic
factorization in Chapter 5.
We outline the basic HB optimization in Algorithm 3; notice that the only
difference from the LK algorithm lies in the Jacobian computation. We depict the
differences more clearly in Figure 3.5: in the dissimilarity linearization stage we use
the derivatives of the template instead of those of the frame.
Algorithm 3 Outline of the Hager-Belhumeur algorithm.
Off-line:
1: Compute S(x).
On-line: Let µi = µ0 be the initial guess.
2: while no convergence do
3: Compute the residual function r(µi) from Equation 3.4.
4: Compute the matrix M(µi).
5: Compute the Jacobian: J(µi) = S(x)M(µi).
6: Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤ r(µi).
7: Update the optimization parameters: µi+1 = µi + δµi.
8: end while
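The per-iteration work of HB can be sketched as follows, assuming the ad hoc
factorization J = S(x)M(µ) of Equation 3.13 has been carried out for the chosen
warp: S is precomputed off-line from template derivatives and target coordinates,
and M is a cheap function of the current parameters. The sketch reuses
sample_bilinear from Section 3.2.1.

import numpy as np

def hager_belhumeur_step(I_t, T_vals, S, M, X, f, mu):
    r = T_vals - sample_bilinear(I_t, f(X, mu))  # residuals (Equation 3.4)
    J = S @ M(mu)                                # Equation 3.13: no image
                                                 # derivatives at run time
    dmu = -np.linalg.solve(J.T @ J, J.T @ r)     # GN step (Equation 3.9)
    return mu + dmu                              # additive update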
3.4 Compositional approaches
From Section 3.2.4 we recall the definition of a compositional method: a GN-like
optimization method that updates the search parameters using function composition.
We review two compositional algorithms: the Forward Compositional (FC) and the
Inverse Compositional (IC) [Baker and Matthews, 2004].
A word about composition Function composition is usually defined as the ap-
plication of one function to the results of another. Let f : X → Y and
g : Y → Z be two functions. We define the composite func-
tion g ◦ f : X → Z as (g ◦ f)(x) = g(f(x)). In the literature on image
registration the problem is posed as follows: let f : R² → R² be the tar-
get motion model parameterized by µ. We compose the target motion as
z = f(f(x; µ1); µ2) = f(x; µ1 ◦ µ2) ≡ f(x; µ3), that is, the coordinates z
are the result of mapping x onto y = f(x; µ1) and y onto z = f(y; µ2). We
represent the composite parameters as µ3 = µ1 ◦ µ2 such that z = f(x; µ3).

Figure 3.5: Hager-Belhumeur image registration. We initialize the current param-
eter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame
It (µ0 ≡ µ∗t). We additionally create the matrix S(x), whose entries depend on the target
values. We compute the dissimilarity residuals between the Image and the Template using
µ (Equation 3.4). Instead of linearizing the residuals, we compute the Jacobian matrix
at µ using Equation 3.12, and we solve for the descent direction using Equation 3.9. We
additively update the search parameters using the search direction and obtain an ap-
proximation to the minimum, i.e. µ1 = µ0 + δµ. We check whether µ1 is a local minimizer
by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer
(µ∗ ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter
estimation (µ ≡ µ1).
3.4.1 Forward Compositional Algorithm
The FC algorithm was first proposed in [Shum and Szeliski, 2000], although the
terminology was introduced in [Baker and Matthews, 2001]: FC is an optimization
algorithm, equivalent to the LK approach, that relies on a compositional update
step. Compositional algorithms for image registration use a brightness dissimilarity
function slightly different from Equation 3.3; we pose the image registration problem
as the following optimization:
µ∗ = arg minµ { D(X; µ)² }, (3.14)

with

D(X; µ) = T(X) − It+1(f(f(X; µ); µt)), (3.15)
where µt comprises the optimal parameters at image It. Note that our search
variables µ are those parameters that should be composed with the current estima-
tion to yield the minimum. The residuals corresponding to Equation 3.15 are

r(µ) ≡ T(x) − It+1(f(f(x; µ); µt)). (3.16)
As in the LK algorithm, we compute the linear model of the residuals, but now at
the point µ = 0 of the search space:

r(0 + δµ) ≃ ℓ(δµ) ≡ r(0) + r′(0)δµ = r(0) + J(0)δµ, (3.17)

where

r(0) ≡ T(x) − It+1(f(f(x; 0); µt)), and J(0) ≡ ∂It+1(f(f(x; ˆµ); µt))/∂ˆµ |ˆµ=0. (3.18)
Notice that, in this case, µt acts as a constant in the derivative. Again, the local
minimizer is

δµ = −( J(0)⊤J(0) )⁻¹ J(0)⊤ r(0). (3.19)

We iterate the above procedure until convergence. The next point in the iterative
series is not computed as µt+1 = µt + δµ, but as µt+1 = µt ◦ δµ, to be coherent with
Equation 3.16. Also notice that the Jacobian J(0) (Equation 3.18) is not constant,
as it depends both on the image It+1 and on the parameters µt. Figure 3.6 shows a
graphical depiction of the algorithm, which is outlined in Algorithm 4.
Algorithm 4 Outline of the Forward Compositional algorithm.
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2: Compute the residual function r(µi) from Equation 3.16.
3: Linearize the dissimilarity: J = ∇ˆµr(0), using Equation 3.18.
4: Compute the search direction: δµi = −(J(0)⊤J(0))⁻¹ J(0)⊤ r(0).
5: Update the optimization parameters: µi+1 = µi ◦ δµi.
6: end while
Figure 3.6: Forward compositional image registration. We initialize the current
parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous
frame It (µ0 ≡ µ∗t). We compute the dissimilarity residuals between the Image and
the Template using µ (Equation 3.15). We linearize the residuals at µ = 0 and we
compute the descent direction δµ using Equation 3.19. We update the parameters using
function composition, i.e. µ1 = µ0 ◦ δµ. We check whether µ1 is a local minimizer by using
the brightness dissimilarity: if D (Equation 3.15) is small enough, then µ1 is the local
minimizer (µ∗ ≡ µ1); otherwise, we repeat the process using µ1 as the current
parameter estimation (µ ≡ µ1).
3.4.2 Inverse Compositional Algorithm
The IC algorithm reinterprets the FC optimization scheme by changing the roles
of the template and the image. The key feature of IC is that its GN Jacobian is
constant: we compute the Jacobian using only template brightness values, so it
does not change between iterations or frames. Using a constant Jacobian speeds up
the whole computation, as the linearization stage is the most time-critical. The IC
algorithm receives its name because we reverse the roles of the template and the
current frame (i.e. we compute the Jacobian on the template). We rewrite the
residual function from FC (Equation 3.16) as follows:

r(µ) ≡ T(f(x; µ)) − It+1(f(x; µt)), (3.20)
yielding the residuals for IC. Notice that the template brightness values now depend
on the search parameters µ. We linearize Equation 3.20 around the point µ = 0
in the search space:

r(0 + δµ) ≃ ℓ(δµ) ≡ r(0) + r′(0)δµ = r(0) + J(0)δµ, (3.21)

where

r(0) ≡ T(f(x; 0)) − It+1(f(x; µt)), and J(0) ≡ ∂T(f(x; ˆµ))/∂ˆµ |ˆµ=0. (3.22)
We compute the local minimizer of Equation 3.7 by differentiating it with respect
to δµ and equating it to zero:

0 = L′(δµ) = ∇δµ ( r(0)⊤r(0) + 2δµ⊤J(0)⊤r(0) + δµ⊤J(0)⊤J(0)δµ )
  = J(0)⊤r(0) + J(0)⊤J(0)δµ. (3.23)

Again, we obtain an approximation to the local minimum at

δµ = −( J(0)⊤J(0) )⁻¹ J(0)⊤ r(0), (3.24)
which we iteratively refine until we find a suitable solution. We summarize the
optimization process in Algorithm 5 and Figure 3.7.
Note that the Jacobian matrix J(0) is constant, as it is computed on the template
image (which is fixed) at the point µ = 0 (cf. Equation 3.22). Notice that the
crucial point of the derivation of the algorithm lies in the change of variables in
Equation 3.20. Solving for the search direction only consists of computing the
IC residuals and the least-squares approximation (Equation 3.24). The
Dissimilarity Linearization stage from Algorithm 1 is no longer required, which
results in a substantial performance boost.
Algorithm 5 Outline of the Inverse Compositional algorithm.
Off-line: Compute J(0) = ∇µr(0) using Equation 3.22.
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2: Compute the residual function r(µi) from Equation 3.20.
3: Compute the search direction: δµi = −(J(0)⊤J(0))⁻¹ J(0)⊤ r(0).
4: Update the optimization parameters: µi+1 = µi ◦ δµi⁻¹.
5: end while
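A sketch of the full IC loop follows; J0 is the constant template Jacobian of
Equation 3.22, computed once off-line, and compose/invert are caller-supplied
routines implementing the parameter composition µ ◦ δµ and its inverse for the
chosen warp. The sketch reuses sample_bilinear from Section 3.2.1.

import numpy as np

def inverse_compositional(I_next, T_vals, J0, X, f, compose, invert, mu,
                          max_iters=20):
    H_inv = np.linalg.inv(J0.T @ J0)  # also constant: precompute once
    for _ in range(max_iters):
        r = T_vals - sample_bilinear(I_next, f(X, mu))  # residuals at mu = 0
        dmu = -H_inv @ (J0.T @ r)                       # Equation 3.24
        mu = compose(mu, invert(dmu))                   # mu_{i+1} = mu_i o dmu^{-1}
    return mu

Per iteration only the residuals and one matrix-vector product are computed,
which is the source of the efficiency discussed below.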
Figure 3.7: Inverse compositional image registration. We initialize the current
parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous
frame It (µ0 ≡ µ∗t). At this point we compute the Jacobian J(0) using Equation 3.22.
We compute the dissimilarity residuals between the Image and the Template using µ
(Equation 3.15). Using J(0) we compute the descent direction δµ (Equation 3.24). We
update the parameters using inverse function composition, i.e. µ1 = µ0 ◦ δµ⁻¹. We
check whether µ1 is a local minimizer by using the brightness dissimilarity: if D (Equation 3.15)
is small enough, then µ1 is the local minimizer (µ∗ ≡ µ1); otherwise, we repeat the
process using µ1 as the current parameter estimation (µ ≡ µ1).
Relevance of IC
The IC algorithm is known to be the most efficient optimization technique for direct
image registration [Baker and Matthews, 2004]. The algorithm was initially pro-
posed for template tracking, although it was later improved to use aams [Matthews
and Baker, 2004], register 3D Morphable Models [Romdhani and Vetter, 2003; Xu
and Roy-Chowdhury, 2008], account for photometric changes [Bartoli, 2008] and
allow for appearance variation [Gonzalez-Mora et al., 2009].
Some efficient algorithms using a constant residual Jacobian with additive in-
crements have been proposed in the literature, but none shows reliable performance:
in [Cootes et al., 2001] an iterative regression-based gradient scheme is proposed to
align aams to frontal images of faces. The regression matrix (similar to our Jaco-
bian matrix) is numerically computed off-line and remains constant during the
Gauss-Newton optimisation. The method shows good performance because the so-
lution does not depart far from the initial guess. The method is revisited in [Donner
et al., 2006] using Canonical Correlation Analysis instead of numerical differentia-
tion to achieve better convergence rate and range. In [La Cascia et al., 2000] the
authors propose a Gauss-Newton scheme with constant Jacobian matrix for 6-dof
3D tracking of heads. The method needs regularisation constraints to improve the
convergence of the optimisation.
Recently, [Brooks and Arbel, 2010] augmented the scope of the IC framework
with the Generalized Inverse Compositional (GIC) image registration: they propose
an additive update to the parameters that is equivalent to the compositional update
from IC; therefore, they can adapt the IC to other optimization methods than GN,
such as Broyden-Fletcher-Goldfarb-Shanno (bfgs) [Press et al., 1992].
3.5 Other Methods
Iterative gradient-based optimization algorithms (see Figure 3.4) can improve their
efficiency in two different ways: (1) by speeding up the linearization of the dissim-
ilarity function, and (2) by reducing the number of iterations of the process. The
algorithms that we have presented—i.e. HB and IC—belong to the first type. The
second type of methods achieve efficiency by using a more involved linearization
that converges faster to the minimum. [Averbuch and Keller, 2002] approximates
the error function on both the template and the current image and averages the
least-squares solutions of both. They show it converges in fewer iterations than
LK, although the time per iteration is higher. Malis et al. [Benhimane and Malis,
2007] propose a similar method called Efficient Second-Order Minimization (esm)
which differs from the latter in using an efficient linearization on the template by
means of Lie algebra properties. Recently, both methods have been revisited and
reformulated in a common Bi-directional Framework in [Megret et al., 2008]. [Keller
and Averbuch, 2008] derives a high-order approximation to the error function that
leads to a faster algorithm with a wider convergence basin. Unfortunately, with
the exception of esm, none of these algorithms is appropriate for real-time image
registration.
3.6 Summary
We have introduced the basic concepts of direct image registration. We pose the reg-
istration problem as the result of a gradient-descent optimization of a dissimilarity
function based on brightness differences. We classify direct image registration algorithms
as either additive or compositional: in the former group we highlight the LK and the
HB algorithms, whereas the FC and IC algorithms belong to the latter.
Chapter 4
Equivalence of Gradients
In this chapter we introduce the concept of Equivalence of Gradients, that is, the
process of replacing the gradient of a brightness function with an equivalent alterna-
tive. In Chapter 3 we showed that some efficient algorithms for direct image
registration use a gradient replacement technique as the basis for their speed improve-
ment: (1) the HB algorithm transforms the template derivatives using the target warp to
yield the image derivatives; and (2) the IC algorithm replaces the image derivatives with
the template derivatives without any modification, but changes the parameter
update rule so that the GN-like optimization converges. We introduce a new constraint,
the Gradient Equivalence Equation, and we show that this constraint is a necessary
requirement for the high computational efficiency of both the HB and IC algorithms.
We organize the chapter as follows: Section 4.1 introduces the basic concepts
of image gradients in R², and their extension to higher-dimensional spaces such as P²
and R³; Section 4.2 introduces the Gradient Equivalence Equation, which shall be
subsequently used to impose some requirements on the registration algorithms.
4.1 Image Gradients
We introduce the concept of gradient of a scalar function below. We consider images
as functions in two dimensions that assign a brightness value to an image pixel
position.
The Concept of Gradient The gradient of a scalar function f : Rⁿ → R at a
point x ∈ Rⁿ is a vector ∇f(x) ∈ Rⁿ that points towards the direction of greatest
rate of increase of f(x). The length of the gradient vector |∇f(x)| is the greatest
rate of change of the function.

Image Gradients Grayscale images are discrete scalar functions I : R² → R
ranging from 0 (black) to 255 (white); see Figure 4.1. We turn our attention to
grayscale images, but we may deal with colour-channelled images (e.g. rgb images)
by simply considering them as one grayscale image per colour plane. Grayscale
images are discrete functions: we represent an image as a matrix whose elements
I(i, j) are the brightness function values. We continuously approximate the discrete
function by using interpolation (see Figure 4.1).
We introduce the image gradients in the most common domains in Computer
Vision: R², P², and R³. Image gradients are naturally defined in R², since the
images are functions defined in that domain. In some Computer Vision applications
the domain of x, D, is not constrained to R², but to P² [Buenaposada and Baumela,
2002; Cobzas et al., 2009], or to R³ [Sepp, 2006; Xu and Roy-Chowdhury, 2008]. In
the following, the target coordinates are expressed in a domain D ∈ {R³, P²}, so
we need a projection function to map the target coordinates onto the image. We
generically define the projection mapping as p : D → R².
The corresponding projectors are the homogeneous-to-Cartesian mapping, p :
P² → R², and the perspective projection, p : R³ → R². Image gradients in domains
other than R² are computed by using the chain rule with the projector p : Rⁿ → R²:

∇ˆx(I ◦ p)(x) = ( ∂I(ˆX)/∂ˆX |ˆX=p(x) ) ( ∂p(ˆY)/∂ˆY |ˆY=x )
             = ∇ˆp(x)I(p(x)) ∇ˆxp(x), x ∈ D ⊂ Rⁿ. (4.1)

Equation 4.1 represents image gradients in the domain D as the image gradient in
R² lifted up onto the higher-dimension space D by means of the Jacobian matrix
∇ˆxp(x).
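For the homogeneous-to-Cartesian projector p : P² → R², the lifting Jacobian of
Equation 4.1 has a simple closed form; a minimal sketch (the function names are
illustrative):

import numpy as np

def homog_to_cart_jacobian(x_h):
    # dp/dx~ for p(x~) = (x1/x3, x2/x3): a 2x3 matrix evaluated at x~.
    x1, x2, x3 = x_h
    return np.array([[1.0 / x3, 0.0, -x1 / x3**2],
                     [0.0, 1.0 / x3, -x2 / x3**2]])

def lifted_gradient(grad_I_2d, x_h):
    # Equation 4.1: the R^2 gradient (a 1x2 row vector) lifted onto P^2.
    return grad_I_2d @ homog_to_cart_jacobian(x_h)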
Notation We use the operator [·] to denote the composite function I ◦ p, that is,
I(p(x)) = I[x].
4.1.1 Image Gradients in R²
If the target and its kinematics are expressed in R², there is no need to use a
projector, as both the target and the image share a common reference frame. The
gradient of a grayscale image at a point x = (i, j)⊤ is the vector

∇ˆxI(x) = (∇iI(x), ∇jI(x)) = ( ∂I(x)/∂i , ∂I(x)/∂j ), (4.2)

which flows from the darker areas of the image to the brighter ones (see Figure 4.1).
Moreover, the direction of the gradient vector at a point x ∈ R² is orthogonal to the
level set of the brightness function at that point (see Figure 4.1).
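A finite-difference sketch of Equation 4.2 using NumPy:

import numpy as np

def image_gradient(I):
    # Central finite differences; np.gradient returns the derivative along
    # rows (j) first, then columns (i), so we reorder to (dI/di, dI/dj).
    dI_dj, dI_di = np.gradient(I.astype(float))
    return dI_di, dI_dj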
Figure 4.1: Depiction of Image Gradients. (Top-left) An image is a rectangular
array where each element is a brightness value. (Top-right) Continuous representation
of the image brightness values; we compute the values from the discrete array by interpo-
lation. (Bottom-left) Image gradients are vectors from each image array element in the
direction of maximum increase of brightness (compare to the top-right image). (Bottom-
right) Gradient vectors are orthogonal to the brightness function contour curves. Legend:
blue Gradient vectors. different colours Contour curves.
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration
Efficient Model-based 3D Tracking by Using Direct Image Registration

2.3.1 Modelling assumptions . . . . . . . . . . . . . . . . . . . . 15
2.3.2 Rigid Objects . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.3 Nonrigid Objects . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.4 Facial Motion Capture . . . . . . . . . . . . . . . . . . . . 18
3 Efficient Direct Image Registration 21
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Modelling Assumptions . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Imaging Geometry . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 Brightness Constancy Constraint . . . . . . . . . . . . . . . 23
3.2.3 Image Registration by Optimization . . . . . . . . . . . . . . 23
3.2.4 Additive vs. Compositional . . . . . . . . . . . . . . . . . . 25
3.3 Additive approaches . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.1 Lucas-Kanade Algorithm . . . . . . . . . . . . . . . . . . . . 27
3.3.2 Hager-Belhumeur Factorization Algorithm . . . . . . . . . . . 29
3.4 Compositional approaches . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Forward Compositional Algorithm . . . . . . . . . . . . . . . 33
3.4.2 Inverse Compositional Algorithm . . . . . . . . . . . . . . . 35
3.5 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Equivalence of Gradients 39
4.1 Image Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.1 Image Gradients in R2 . . . . . . . . . . . . . . . . . . . . 40
4.1.2 Image Gradients in P2 . . . . . . . . . . . . . . . . . . . . 42
4.1.3 Image Gradients in R3 . . . . . . . . . . . . . . . . . . . . 43
4.2 The Gradient Equivalence Equation . . . . . . . . . . . . . . . . . 45
4.2.1 Relevance of the Gradient Equivalence Equation . . . . . . . . 46
4.2.2 General Approach to Gradient Replacement . . . . . . . . . . . 46
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5 Additive Algorithms 51
5.1 Gradient Replacement Requirements . . . . . . . . . . . . . . . . . 52
5.2 Systematic Factorization . . . . . . . . . . . . . . . . . . . . . 52
5.3 3D Rigid Motion . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3.1 3D Textured Models . . . . . . . . . . . . . . . . . . . . . . 55
5.3.2 Shape-induced Homography . . . . . . . . . . . . . . . . . . . 57
5.3.3 Change to the Reference Frame . . . . . . . . . . . . . . . . 57
5.3.4 Optimization Outline . . . . . . . . . . . . . . . . . . . . . 61
5.3.5 Gradient Replacement . . . . . . . . . . . . . . . . . . . . . 61
5.3.6 Systematic Factorization . . . . . . . . . . . . . . . . . . . 63
5.4 3D Nonrigid Motion . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4.1 Nonrigid Morphable Models . . . . . . . . . . . . . . . . . . 65
5.4.2 Nonrigid Shape-induced Homography . . . . . . . . . . . . . . 65
5.4.3 Change of Variables to the Reference Frame . . . . . . . . . . 66
5.4.4 Optimization Outline . . . . . . . . . . . . . . . . . . . . . 69
5.4.5 Gradient Replacement . . . . . . . . . . . . . . . . . . . . . 69
5.4.6 Systematic Factorization . . . . . . . . . . . . . . . . . . . 71
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Compositional Algorithms 77
6.1 Unravelling the Inverse Compositional Algorithm . . . . . . . . . . 77
6.1.1 Change of Variables in IC . . . . . . . . . . . . . . . . . . 79
6.1.2 The Efficient Forward Compositional Algorithm . . . . . . . . 79
6.1.3 Rationale of the Change of Variables in IC . . . . . . . . . . 82
6.1.4 Differences between IC and EFC . . . . . . . . . . . . . . . . 84
6.2 Requirements for Compositional Warps . . . . . . . . . . . . . . . 85
6.2.1 Requirement on Warp Composition . . . . . . . . . . . . . . . 85
6.2.2 Requirement on Gradient Equivalence . . . . . . . . . . . . . 85
6.3 Other Compositional Algorithms . . . . . . . . . . . . . . . . . . 86
6.3.1 Generalized Inverse Compositional Algorithm . . . . . . . . . 86
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7 Computational Complexity 91
7.1 Complexity Measures . . . . . . . . . . . . . . . . . . . . . . . . 91
7.1.1 Number of Operations . . . . . . . . . . . . . . . . . . . . . 91
7.1.2 Complexity of Matrix Operations . . . . . . . . . . . . . . . 92
7.1.3 Comparing Algorithm Complexities . . . . . . . . . . . . . . . 93
7.2 Algorithm Naming Conventions . . . . . . . . . . . . . . . . . . . 94
7.2.1 Additive Algorithms . . . . . . . . . . . . . . . . . . . . . 95
7.2.2 Compositional Algorithms . . . . . . . . . . . . . . . . . . . 96
7.3 Complexity of Algorithms . . . . . . . . . . . . . . . . . . . . . 96
7.3.1 Additive Algorithms . . . . . . . . . . . . . . . . . . . . . 97
7.3.2 Compositional Algorithms . . . . . . . . . . . . . . . . . . . 103
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8 Experiments 107
8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.2 Features and Measures . . . . . . . . . . . . . . . . . . . . . . . 113
8.2.1 Numerical Ranges for Features . . . . . . . . . . . . . . . . 115
8.3 Generation of Synthetic Experiments . . . . . . . . . . . . . . . . 116
8.3.1 Synthetic Datasets and Images . . . . . . . . . . . . . . . . 118
8.3.2 Generation of Result Plots . . . . . . . . . . . . . . . . . . 120
8.4 Implementation Details . . . . . . . . . . . . . . . . . . . . . . 122
8.4.1 Convergence Criteria . . . . . . . . . . . . . . . . . . . . . 122
8.4.2 Visibility Management . . . . . . . . . . . . . . . . . . . . 122
8.4.3 Scale of Homographies . . . . . . . . . . . . . . . . . . . . 125
8.4.4 Minimization of Jacobian Operations . . . . . . . . . . . . . 126
8.5 Additive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 126
8.5.1 Experimental Hypotheses . . . . . . . . . . . . . . . . . . . 126
8.5.2 Experiments with Synthetic Rigid data . . . . . . . . . . . . 127
8.5.3 Experiments with Synthetic Nonrigid data . . . . . . . . . . . 142
8.5.4 Experiments with Nonrigid Sequence . . . . . . . . . . . . . . 151
8.5.5 Experiments with real Rigid data . . . . . . . . . . . . . . . 154
8.5.6 Experiment with real Nonrigid data . . . . . . . . . . . . . . 158
8.6 Compositional Algorithms . . . . . . . . . . . . . . . . . . . . . 163
8.6.1 Experimental Hypotheses . . . . . . . . . . . . . . . . . . . 163
8.6.2 Experiments with Synthetic Rigid data . . . . . . . . . . . . 163
8.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9 Conclusions and Future work 179
9.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . 179
9.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A Gauss-Newton Optimization 201
B Plane-induced Homography 203
C Plane+Parallax-constrained Homography 205
C.1 Compositional Form . . . . . . . . . . . . . . . . . . . . . . . . 207
D Methodical Factorization 209
D.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 209
D.2 Lemmas that Re-organize Products of Matrices . . . . . . . . . . . 211
D.3 Lemmas that Re-organize Kronecker Products . . . . . . . . . . . . 215
D.4 Lemmas that Re-organize Sums of Matrices . . . . . . . . . . . . . 216
E Methodical Factorization of f3DTM 219
F Methodical Factorization of f3DMM (Partial case) 223
G Methodical Factorization of f3DMM (Full case) 225
H Detailed Complexity of Algorithms 235
H.1 Warp f3DTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
H.2 Warp f3DMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
H.3 Jacobian of Algorithm HB3DTM . . . . . . . . . . . . . . . . . . . 237
H.4 Jacobian of Algorithm HB3DTMNF . . . . . . . . . . . . . . . . . . 239
H.5 Jacobian of Algorithm HB3DMMNF . . . . . . . . . . . . . . . . . . 241
H.6 Jacobian of Algorithm HB3DMMSF . . . . . . . . . . . . . . . . . . 246
List of Figures

1.1 Example of 3D rigid tracking. . . . . . . . . . . . . . . . . . . . 6
1.2 3D Nonrigid Tracking. . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Image registration. . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Industrial applications of 3D tracking. . . . . . . . . . . . . . . 9
1.5 Motion capture in the film industry. . . . . . . . . . . . . . . . 10
1.6 Markerless facial motion capture. . . . . . . . . . . . . . . . . . 11
3.1 Imaging geometry. . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Iterative gradient descent image registration. . . . . . . . . . . 24
3.3 Generic descent method for image registration. . . . . . . . . . . 26
3.4 Lucas-Kanade image registration. . . . . . . . . . . . . . . . . . 28
3.5 Hager-Belhumeur image registration. . . . . . . . . . . . . . . . . 32
3.6 Forward compositional image registration. . . . . . . . . . . . . . 34
3.7 Inverse compositional image registration. . . . . . . . . . . . . . 36
4.1 Depiction of Image Gradients. . . . . . . . . . . . . . . . . . . . 41
4.2 Image Gradient in P2. . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Image gradient in R3. . . . . . . . . . . . . . . . . . . . . . . . 45
4.4 Comparison between BCC and GEE. . . . . . . . . . . . . . . . . . . 47
4.5 Gradients and Convergence. . . . . . . . . . . . . . . . . . . . . 49
4.6 Open Subsets in Various Domains. . . . . . . . . . . . . . . . . . 49
5.1 3D Textured Model. . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 Shape-induced homographies. . . . . . . . . . . . . . . . . . . . . 58
5.3 Warp defined on the reference frame. . . . . . . . . . . . . . . . 59
5.4 Reference frame advantages. . . . . . . . . . . . . . . . . . . . . 60
5.5 Nonrigid Morphable Models. . . . . . . . . . . . . . . . . . . . . 65
5.6 Nonrigid shape-induced homographies. . . . . . . . . . . . . . . . 67
5.7 Deformable warp defined on the reference frame. . . . . . . . . . . 68
6.1 Change of variables in IC. . . . . . . . . . . . . . . . . . . . . 80
6.2 Forward compositional image registration. . . . . . . . . . . . . . 83
6.3 Generalized inverse compositional image registration. . . . . . . . 88
7.1 Complexity of Additive Algorithms. . . . . . . . . . . . . . . . . 102
7.2 Complexities of Compositional Algorithms. . . . . . . . . . . . . . 105
8.1 Registration vs. Tracking. . . . . . . . . . . . . . . . . . . . . 109
8.2 Algorithm initialization. . . . . . . . . . . . . . . . . . . . . . 110
8.3 Accuracy and convergence. . . . . . . . . . . . . . . . . . . . . . 114
8.4 Ground Truth and Noise Variance. . . . . . . . . . . . . . . . . . 117
8.5 Definition of Datasets. . . . . . . . . . . . . . . . . . . . . . . 118
8.6 Example of Synthetic Datasets. . . . . . . . . . . . . . . . . . . 119
8.7 Experimental Evaluation with Synthetic Data. . . . . . . . . . . . 121
8.8 Visibility management. . . . . . . . . . . . . . . . . . . . . . . 123
8.9 Efficiently solving WLS. . . . . . . . . . . . . . . . . . . . . . 125
8.10 The cube model. . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.11 The face model. . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.12 The tea box model. . . . . . . . . . . . . . . . . . . . . . . . . 129
8.13 Results from dataset DS1 for cube. . . . . . . . . . . . . . . . . 130
8.14 Results from dataset DS2 for cube. . . . . . . . . . . . . . . . . 131
8.15 Results from dataset DS3 for cube. . . . . . . . . . . . . . . . . 132
8.16 Results from dataset DS4 for cube. . . . . . . . . . . . . . . . . 133
8.17 Results from dataset DS5 for cube. . . . . . . . . . . . . . . . . 134
8.18 Results from dataset DS6 for cube. . . . . . . . . . . . . . . . . 135
8.19 tea box sequence. . . . . . . . . . . . . . . . . . . . . . . . . 136
8.20 Results for the tea box sequence. . . . . . . . . . . . . . . . . 137
8.21 Estimated parameters from teabox sequence. . . . . . . . . . . . . 138
8.22 Estimated parameters from face sequence. . . . . . . . . . . . . . 140
8.23 Good texture vs. bad texture. . . . . . . . . . . . . . . . . . . 141
8.24 The face-deform model. . . . . . . . . . . . . . . . . . . . . . . 142
8.25 Distribution of Synthetic Datasets. . . . . . . . . . . . . . . . 143
8.26 Results from dataset DS1 for face-deform. . . . . . . . . . . . . 145
8.27 Results from dataset DS2 for face-deform. . . . . . . . . . . . . 146
8.28 Results from dataset DS3 for face-deform. . . . . . . . . . . . . 147
8.29 Results from dataset DS4 for face-deform. . . . . . . . . . . . . 148
8.30 Results from dataset DS5 for face-deform. . . . . . . . . . . . . 149
8.31 Results from dataset DS6 for face-deform. . . . . . . . . . . . . 150
8.32 face-deform sequence. . . . . . . . . . . . . . . . . . . . . . . 151
8.33 Results from face-deform sequence. . . . . . . . . . . . . . . . . 152
8.34 Estimated parameters from face-deform sequence. . . . . . . . . . 153
8.35 The cube-real model. . . . . . . . . . . . . . . . . . . . . . . . 154
8.36 The cube-real sequence. . . . . . . . . . . . . . . . . . . . . . 156
8.37 Results from cube-real sequence. . . . . . . . . . . . . . . . . . 157
8.38 Selected facial scans used to build the model. . . . . . . . . . . 158
8.39 Unfolded texture model. . . . . . . . . . . . . . . . . . . . . . 159
8.40 The face-real sequence. . . . . . . . . . . . . . . . . . . . . . 160
8.41 Anchor points in the model. . . . . . . . . . . . . . . . . . . . 161
8.42 Results for the face-real sequence. . . . . . . . . . . . . . . . 162
8.43 The plane model. . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.44 Distribution of Synthetic Datasets. . . . . . . . . . . . . . . . 165
8.45 Results from dataset DS1 for plane. . . . . . . . . . . . . . . . 167
8.46 Results from dataset DS2 for plane. . . . . . . . . . . . . . . . 168
8.47 Results from dataset DS3 for plane. . . . . . . . . . . . . . . . 169
8.48 Results from dataset DS4 for plane. . . . . . . . . . . . . . . . 170
8.49 Results from dataset DS5 for plane. . . . . . . . . . . . . . . . 171
8.50 Results from dataset DS6 for plane. . . . . . . . . . . . . . . . 172
8.51 Average Time per iteration. . . . . . . . . . . . . . . . . . . . 176
9.1 Spiderweb Plots for Image Registration Algorithms. . . . . . . . . 182
9.2 Spherical Harmonics-based Illumination Model. . . . . . . . . . . . 184
9.3 Tracking by simultaneously using texture and edges information. . . 185
9.4 Efficient tracking using multiple views. . . . . . . . . . . . . . 186
B.1 Plane-induced homography. . . . . . . . . . . . . . . . . . . . . . 203
C.1 Plane+Parallax-constrained homography. . . . . . . . . . . . . . . 206
List of Tables

4.1 Characteristics of the warps. . . . . . . . . . . . . . . . . . . . 50
6.1 Relationship between compositional algorithms and warps. . . . . . 89
6.2 Requirements for Optimization Algorithms. . . . . . . . . . . . . . 90
7.1 Complexity of matrix operations. . . . . . . . . . . . . . . . . . 93
7.2 Additive testing algorithms. . . . . . . . . . . . . . . . . . . . 95
7.3 Compositional testing algorithms. . . . . . . . . . . . . . . . . . 96
7.4 Complexity of Algorithm LK3DTM. . . . . . . . . . . . . . . . . . . 97
7.5 Complexity of Algorithm HB3DTM. . . . . . . . . . . . . . . . . . . 98
7.6 Complexity of Algorithm LK3DMM. . . . . . . . . . . . . . . . . . . 98
7.7 Complexity of Algorithm HB3DMMNF. . . . . . . . . . . . . . . . . . 99
7.8 Complexity of Algorithm HB3DMM. . . . . . . . . . . . . . . . . . . 100
7.9 Complexity of Algorithm HB3DMMSF. . . . . . . . . . . . . . . . . . 101
7.10 Complexities of Additive Algorithms. . . . . . . . . . . . . . . . 101
7.11 Complexity of Algorithm LKH8. . . . . . . . . . . . . . . . . . . 103
7.12 Complexity of Algorithm ICH8. . . . . . . . . . . . . . . . . . . 103
7.13 Complexity of Algorithm HBH8. . . . . . . . . . . . . . . . . . . 104
7.14 Complexity of Algorithm GICH8. . . . . . . . . . . . . . . . . . . 104
7.15 Complexities of Compositional Algorithms. . . . . . . . . . . . . 106
7.16 Comparison of Relative Complexities for Additive Algorithms. . . . 106
7.17 Comparison of Relative Complexities for Compositional Algorithms. 106
8.1 Registration vs. tracking in efficient methods. . . . . . . . . . . 111
8.2 Features and Measures. . . . . . . . . . . . . . . . . . . . . . . 115
8.3 Numerical Ranges for Features. . . . . . . . . . . . . . . . . . . 115
8.4 Evaluated Additive Algorithms. . . . . . . . . . . . . . . . . . . 127
8.5 Ranges of parameters for cube experiments. . . . . . . . . . . . . 129
8.6 Average reprojection error vs. noise for cube. . . . . . . . . . . 129
8.7 Ranges of parameters for face-deform experiments. . . . . . . . . . 144
8.8 Average reprojection error vs. noise for face-deform. . . . . . . . 144
8.9 Evaluated Compositional Algorithms. . . . . . . . . . . . . . . . . 164
8.10 Ranges of motion parameters for each dataset. . . . . . . . . . . 165
8.11 Average reprojection error vs. noise for plane. . . . . . . . . . 166
9.1 Classification of Motion Warps. . . . . . . . . . . . . . . . . . . 181
D.1 Lemmas used to re-arrange matrix products. . . . . . . . . . . . . 214
D.2 Lemmas used to re-arrange Kronecker matrix products. . . . . . . . 216
List of Algorithms

1 Outline of the basic GN-based descent method for image registration. 26
2 Outline of the Lucas-Kanade algorithm. . . . . . . . . . . . . . . . 28
3 Outline of the Hager-Belhumeur algorithm. . . . . . . . . . . . . . 31
4 Outline of the Forward Compositional algorithm. . . . . . . . . . . 34
5 Outline of the Inverse Compositional algorithm. . . . . . . . . . . 36
6 Iterative factorization of the Jacobian matrix. . . . . . . . . . . 54
7 Outline of the HB3DTM algorithm. . . . . . . . . . . . . . . . . . . 64
8 Outline of the full-factorized HB3DMM algorithm. . . . . . . . . . . 75
9 Outline of the HB3DMMSF algorithm. . . . . . . . . . . . . . . . . . 76
10 Outline of the Efficient Forward Compositional algorithm. . . . . . 82
11 Outline of the Generalized Inverse Compositional algorithm. . . . . 88
12 Creating the synthetic datasets. . . . . . . . . . . . . . . . . . 119
13 Outline of the GN algorithm. . . . . . . . . . . . . . . . . . . . 202
Resumen

This thesis addresses the problem of efficiently tracking 3D objects in image sequences. We tackle the 3D tracking problem by using direct image registration, a technique that aligns two images using their intensity values. Image registration is usually solved by iterative optimization methods, where the function to be minimized depends on the intensity error. In this thesis we examine the most common image registration methods, with emphasis on those that use efficient optimization algorithms.

We investigate two forms of efficient registration. The first comprises the additive registration methods: the motion parameters are computed incrementally by means of a linear approximation of the error function. Within this class of algorithms, we focus on the Hager-Belhumeur factorization method. We introduce a necessary requirement that the factorization algorithm must satisfy to achieve good convergence. Moreover, we propose an automatic factorization procedure that allows us to track both rigid and deformable 3D objects.

The second type comprises the so-called compositional registration methods, where the error norm is rewritten using function composition. We study the most usual compositional methods, with emphasis on the fastest registration method, the inverse compositional algorithm. We introduce a new compositional registration method, the Efficient Forward Compositional algorithm, which allows us to interpret the working mechanisms of the inverse compositional algorithm. Thanks to this novel interpretation, we state two fundamental requirements for efficient compositional algorithms.

Finally, we carry out a series of experiments with real and synthetic data to verify the theoretical claims. Moreover, we distinguish between the registration and tracking problems for efficient algorithms: those algorithms that satisfy their requirement(s) may be used for image registration, but not for tracking.
Abstract

This thesis deals with the problem of efficiently tracking 3D objects in sequences of images. We tackle the efficient 3D tracking problem by using direct image registration. This problem is posed as an iterative optimization procedure that minimizes a brightness error norm. We review the most popular iterative methods for image registration in the literature, turning our attention to those algorithms that use efficient optimization techniques.

Two forms of efficient registration algorithms are investigated. The first type comprises the additive registration algorithms: these algorithms incrementally compute the motion parameters by linearly approximating the brightness error function. We centre our attention on Hager and Belhumeur's factorization-based algorithm for image registration. We propose a fundamental requirement that factorization-based algorithms must satisfy to guarantee good convergence, and introduce a systematic procedure that automatically computes the factorization. Finally, we also derive two warp functions, for registering rigid and nonrigid 3D targets, that satisfy the requirement.

The second type comprises the compositional registration algorithms, where the brightness error function is written using function composition. We study the current approaches to compositional image alignment, and we emphasize the importance of the Inverse Compositional method, which is known to be the most efficient image registration algorithm. We introduce a new algorithm, the Efficient Forward Compositional image registration: this algorithm avoids the need to invert the warping function, and provides a new interpretation of the working mechanisms of inverse compositional alignment. By using this information, we propose two fundamental requirements that guarantee the convergence of compositional image registration methods.

Finally, we support our claims through extensive experimental testing with synthetic and real-world data. We propose a distinction between image registration and tracking when using efficient algorithms. We show that, depending on whether the fundamental requirements hold, some efficient algorithms are eligible for image registration but not for tracking.
Notations

Specific Sets and Constants

X : Set of target points, or target region.
Ω : Set of target points currently visible.
N : Number of points in the target region, i.e., N = |X|.
NΩ : Number of visible target points, i.e., NΩ = |Ω|.
P : Dimension of the parameter space.
C : Number of image channels.
K : Dimension of the deformation space.
F : Number of frames in the image sequence.

Vectors and Matrices

a : Lowercase bold letters denote vectors.
A ∈ Mm×n : Monospace uppercase letters denote m × n matrices.
vec(A) : Vectorization of matrix A: if A is an m × n matrix, vec(A) is an mn × 1 vector.
Ik ∈ Mk×k : The k × k identity matrix.
I : The 3 × 3 identity matrix.
0k ∈ Rk : The k × 1 vector of zeroes.
0m×n ∈ Mm×n : The m × n matrix of zeroes.

Camera Model Notations

x ∈ R2 : Pixel location in the image.
x̂ ∈ P2 : Location in projective space.
X ∈ R3 : Point in Cartesian coordinates.
Xc ∈ R3 : Point expressed in the camera reference system.
K ∈ M3×3 : The 3 × 3 camera intrinsics matrix.
P ∈ M3×4 : The 3 × 4 camera projection matrix.
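As a concrete reading of the camera model notations, a minimal sketch follows (ours, not from the thesis; plain NumPy, with the function name chosen for illustration). It projects a point X through the pinhole model P = K [R | t]: the point is first expressed in the camera frame as Xc = R X + t, then mapped to a homogeneous pixel x̂ = K Xc in P2, and finally dehomogenized to x in R2.

    import numpy as np

    def project(K, R, t, X):
        # Express the point in the camera reference system: Xc = R X + t.
        Xc = R @ X + t
        # Homogeneous image point x_hat in P^2, via the intrinsics K.
        x_hat = K @ Xc
        # Projection p : P^2 -> R^2 (divide by the homogeneous scale).
        return x_hat[:2] / x_hat[2]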
Imaging Notations

T(x) ∈ RC : Brightness value of the template image at pixel x.
I(x, t) ∈ RC : Brightness value of the current image at pixel x at instant t.
It(x) : Alternative notation for I(x, t).
T, It : Vector forms of the functions T and It.
I[·] : Composition of I and p, that is, I[x] = I(p(x)).

Optimization Notations

µ ∈ RP : Column vector of motion parameters.
µ0 ∈ RP : Initial guess of the optimization.
µi ∈ RP : Parameters at the i-th iteration of the optimization.
µ* ∈ RP : Actual optimum of the optimization.
µt ∈ RP : Parameters at image t.
µJ ∈ RP : Parameters at which the Jacobian is computed in efficient algorithms.
δµ ∈ RP : Incremental step at the current state of the optimization.
ℓ(δµ) : Linear model for the incremental step δµ.
L(δµ) : Local minimizer for the incremental step δµ.
r(µ) ∈ RN : N × 1 vector-valued residual function at parameters µ.
∇x̂ f(x) : Derivatives of the function f with respect to the variables x, instantiated at x̂.
J(µ) ∈ MN×P : Jacobian matrix of the brightness dissimilarity at µ, i.e., J(µ) = ∇µ̂ D(X; µ).
H(µ) ∈ MP×P : Hessian matrix of the brightness dissimilarity at µ, i.e., H(µ) = ∇²µ̂ D(X; µ).

Warp Function Notations

f(x; µ) : Rn × RP → Rn : Motion model, or warp.
p : Rn → R2 : Projection onto the Cartesian plane.
R ∈ M3×3 : 3 × 3 rotation matrix.
ri ∈ R3 : Columns of the rotation matrix R, i.e., R = (r1, r2, r3).
t ∈ R3 : Translation vector in Euclidean space.
D : R2 × RP → R : Dissimilarity function.
U : RP × RP → RP : Parameter update function.
ψ : RP × RP → RP : Jacobian update function for algorithm GIC.
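The optimization notations fit together in a single Gauss-Newton step: the update solves H(µ) δµ = −J(µ)⊤ r(µ), with the Hessian approximated to first order as H ≈ J⊤J. A minimal sketch, assuming only NumPy (the function name is ours, for illustration):

    import numpy as np

    def gauss_newton_step(r, J):
        # One Gauss-Newton iteration for min 0.5 * ||r(mu)||^2:
        # solve (J^T J) dmu = -J^T r, i.e., H(mu) dmu = -gradient.
        H = J.T @ J
        g = J.T @ r
        return np.linalg.solve(H, -g)

    # Usage: mu_next = mu + gauss_newton_step(r_of(mu), J_of(mu))

This is exactly the shape of the iteration that the additive algorithms discussed later repeat until convergence; the efficient variants differ in where (and how often) J is evaluated.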
Factorization Notations

⊗ : Kronecker product.
⊙ : Row-wise Kronecker product.
S(x) : Constant matrix in the factorization method, computed from the target structure and camera calibration.
M(µ) : Variable matrix in the factorization methods, computed from the motion parameters.
W ∈ MP×P : Weighting matrix for Weighted Least-Squares.
π : Rn → Rn : Permutation of the set {1, . . . , n}.
Pπ(n) ∈ Mn×n : Permutation matrix of the set {1, . . . , n}.
π(n, q) : Permutation of the set {1, . . . , n} with ratio q.

3D Models Notations

F ⊂ R2 : Reference frame for algorithm HB.
S : F → R3 : Target shape function.
T : F → RC : Target texture function.
u ∈ F : Target coordinates in the reference frame.
S ∈ M3×Nv : Target 3D shape.
s ∈ R3 : Shape coordinates in Euclidean space.
s0 ∈ R3 : Mean shape of the target generative model.
si ∈ R3 : i-th basis of deformation of the target generative model.
n ∈ R3 : Normal vector to a given triangle; n is normalized by the triangle depth (i.e., if x belongs to the triangle, then n⊤x = 1).
Bs ∈ M3×K : Basis of deformations.
c ∈ RK : Vector containing K deformation coefficients.
HA ∈ M3×3 : Affine warp between the image reference frame and F.
Ṙ∆ : Derivatives of the rotation matrix R with respect to the Euler angle ∆ = {α, β, γ}.
λ ∈ R : Homogeneous scale factor.
v ∈ R3 : Change of variables defined as v = K⁻¹ HA û.

Function Naming Conventions

fH82D : P2 → P2 : 8-dof homography.
fH6P : P2 → P2 : Plane-induced homography.
fH6S : P2 → P2 : Shape-induced homography.
f3DTM : P2 → P2 : 3D Textured Model motion model.
fH6D : P2 → P2 : Deformable shape-induced homography.
f3DMM : P2 → P2 : 3D Textured Morphable Model motion model.
ε : RP → R : Reprojection error function.
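A useful anchor for the Kronecker-product notation is the standard identity vec(A X B) = (B⊤ ⊗ A) vec(X), which is the usual device for separating a constant matrix such as S(x) from a parameter-dependent matrix such as M(µ). A quick NumPy check of the identity (our own illustration, using the column-major vec convention):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 4))
    X = rng.normal(size=(4, 2))
    B = rng.normal(size=(2, 5))

    # vec() with the column-major ('F') convention of the linear-algebra
    # literature, so that vec(A X B) = (B^T kron A) vec(X) holds.
    vec = lambda M: M.reshape(-1, order="F")

    lhs = vec(A @ X @ B)
    rhs = np.kron(B.T, A) @ vec(X)
    assert np.allclose(lhs, rhs)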
Algorithm Naming Conventions

LK : Lucas-Kanade algorithm [Lucas and Kanade, 1981]¹.
HB : Hager-Belhumeur factorization algorithm [Hager and Belhumeur, 1998].
IC : Inverse Compositional algorithm [Baker and Matthews, 2004].
FC : Forward Compositional algorithm [Baker and Matthews, 2004].
GIC : Generalized Inverse Compositional algorithm [Brooks and Arbel, 2010].
EFC : Efficient Forward Compositional algorithm.
LKH8 : Lucas-Kanade algorithm for homographies.
LKH6 : Lucas-Kanade algorithm for plane-induced homographies.
LK3DTM : Lucas-Kanade algorithm for 3D Textured Models (rigid).
LK3DMM : Lucas-Kanade algorithm for 3D Morphable Models (deformable).
HB3DTR : Full-factorized HB algorithm for 6-dof motion in R3 [Sepp, 2006].
HB3DTM : Full-factorized HB algorithm for 3D Textured Models (rigid).
HB3DMM : Full-factorized HB algorithm for 3D Morphable Models (deformable).
HB3DMMSF : Semi-factorized HB algorithm for 3D Morphable Models.
HB3DMMNF : HB algorithm for 3D Morphable Models without the factorization stage.
ICH8 : IC algorithm for homographies.
ICH6 : IC algorithm for plane-induced homographies.
GICH8 : GIC algorithm for homographies.
GICH6 : GIC algorithm for plane-induced homographies.
IC3DRT : IC algorithm for 6-dof motion in R3 [Muñoz et al., 2005].
FCH6PP : FC algorithm for plane+parallax homographies.

¹ We only show the most relevant citation for each algorithm.
Chapter 1

Introduction

This thesis deals with the problems of registration and tracking in sequences of images. Both problems are classical topics in Computer Vision and Image Processing that have been widely studied in the past. We summarize the subjects of this thesis in the dissertation title:

Efficient Model-based 3D Tracking by using Direct Image Registration

What is 3D Tracking? Let the target be a part of the scene, e.g. the cube in Figure 1.1. We define tracking as the process of repeatedly computing the target state in a sequence of images. When we describe this state as the relative 3D orientation and location of the target with respect to the coordinate system of the camera (or another arbitrary reference system), we refer to this process as 3D rigid tracking (see Figure 1.1). If we also include state parameters that describe the possible deformation of the object, we have 3D nonrigid or deformable tracking (see Figure 1.2). We use 3D tracking to refer to both the rigid and the nonrigid case.

What is Direct Image Registration? When the target is imaged by two cameras with different points of view, the resulting images are different although they represent the same portion of the scene (see Figure 1.3). Image registration, or image alignment, computes the geometric transformation that best aligns the coordinate systems of both images such that their pixel-wise differences are minimal (cf. Figure 1.3). We say that the image registration is a direct method when we register the coordinate systems by using only the brightness differences of the images.

What is Model-based? We say that a technique is model-based when we restrict the information from the real world by using certain assumptions: on the target dynamics, on the target structure, on the camera sensing process, etc. In Figure 1.1, for example, we model the target with a cube structure and rigid-body dynamics.
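To make the notion of a direct method concrete, here is a minimal sketch (ours, not from the thesis) of the brightness-error score that such a method minimizes over the motion parameters. The warp f(x; µ) is left as a hypothetical callable, and sampling is nearest-neighbour for brevity; a real implementation would interpolate sub-pixel values.

    import numpy as np

    def ssd_error(template, image, warp, mu):
        # Template pixel coordinates x = (col, row) for every pixel of T.
        h, w = template.shape
        cols, rows = np.meshgrid(np.arange(w), np.arange(h))
        xs = np.stack([cols, rows], axis=-1).reshape(-1, 2)
        # Warped locations f(x; mu) in the current image (hypothetical warp).
        ys = np.rint(warp(xs, mu)).astype(int)
        # Keep only the points that fall inside the current image.
        ok = (ys[:, 0] >= 0) & (ys[:, 0] < image.shape[1]) \
           & (ys[:, 1] >= 0) & (ys[:, 1] < image.shape[0])
        # Brightness residuals r = I(f(x; mu)) - T(x), scored pixel-wise.
        r = image[ys[ok, 1], ys[ok, 0]].astype(float) \
          - template.reshape(-1)[ok].astype(float)
        return float(np.sum(r ** 2))

Only intensities enter the score: no feature detection or explicit correspondences are needed, which is what distinguishes direct methods from feature-based ones.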
Figure 1.1: Example of 3D rigid tracking. (Left) Selected frames of a scene containing a textured cube. We track the object and overlay its state in blue. (Right) The relative position of the camera, represented by a coloured pyramid, and the cube is computed from the estimated 3D parameters.

Figure 1.2: 3D Nonrigid Tracking. Selected frames from a sequence of a cushion under a bending motion. We track some landmarks on the cushion through the sequence, and we plot the resulting triangular mesh for the selected frames. The motion of the landmarks is both global (translation of the mesh) and local (changes in the relative position of the mesh vertices due to the deformation). Source: Alessio del Bue.

Figure 1.3: Image registration. (Top row) Images of a portion of the scene under two distinct points of view. We have outlined the target in blue (top-left) and green (top-right). (Bottom) The left image is warped such that the coordinates of the target match up in both images. Source: Graffiti sequence, from the Oxford Visual Geometry Group.

And finally, what does Efficient mean? We say that a method is efficient if it substantially improves the computation time with respect to gold-standard techniques. In more practical terms, efficient is equivalent to real-time, i.e. the tracking procedure operates at 25 frames per second.

1.1 Motivation

In less than thirty years, video tracking has gone from being confined to academic or military environments to widespread public recognition, mainly thanks to the media.
  • 30. Thus, video tracking is now a staple in sci-fi shows and films where futuristic Head- up Displays (hud) work in a show-and-tell fashion, a camera surveillance system can locate an object or a person, or a robot can address people and even recognize their mood. However, tv is, sadly to say, years ahead of reality. Actual video tracking systems are still in a primitive stage: they are inaccurate, sloppy, slow, and usually work in laboratory conditions only. Anyway, video tracking progression increases by leaps and bounds and it will probably match some sci-fi standards soon. We investigate the problem of efficiently tracking an object in a video sequence. Nowadays there exists several efficient optimization algorithms for video tracking or image registration. We study two of the fastest algorithms available: the Hager- Belhumeur factorization algorithm and the Baker-Matthews inverse compositional algorithm. Both algorithms, although very efficient for planar registration, present diverse problems for 3D tracking. This thesis studies which assumptions can be done with these algorithms whilst underlining their limitations through extensive testing. Eventually, the objective is to provide a detail description of each algorithm, pointing out pros and cons, leading to a kind of Quick Guide to Efficient Tracking Algorithms. 1.2 Applications Typical applications for 3D tracking include target localization for military oper- ations; security and surveillance tasks such as person counting, face identification, people detection, determining people activity or detecting left objects; it also in- cludes human-computer interaction for computer security, aids for disabled people or even controlling video-games. Tracking is used for augmenting video sequences with additional information such as advertisements, expanding information about the scene, or adding or removing objects of the scene. We show some examples of actual industrial applications in Figure 1.4. A tracking process that is widely used in film industry is Motion Capture: we track the motion of the different parts of the an actor’s body using a suit equipped with reflective markers; then, we transfer the estimated motion to a computer- generated character (see Figure 1.5). Using this technique, we can animate a syn- thetic 3D character in a movie as Gollum in the Lord of the Rings trilogy (2001), or Jar-Jar Binks in the new Star Wars trilogy (1999). Another relevant movies that employ these techniques are Polar Express (2004), King Kong (2005), Beowulf (2007), A Christmas Carol (2009), and Avatar (2009). Furthermore, we can generate a complete computer-generated movie populated with characters animated through motion capture. Facial motion capture is of special interest for us: we animate a computer-generated facial expression by facial expression tracking (see Figure 1.5). We turn our attention to markerless facial motion capture, that is, the process of recovering the face expression and orientation without using fiducial markers. Markerless motion capture does not require special equipment—such as close-up 8
  • 31. Figure 1.4: Industrial applications of 3D tracking. (Top-left) Augmented reality inserts virtual objects into the scene. (Top-middle) Augmented reality shows additional information about tracked objects in the scene. Source:Hawk-eye, Hawk-Eye Innovations Ltd., copyright c 2008. Top-right Tracking a pedestrian for video surveillance. Source: Martin Communications, copyright c 1998-2007. Bottom-left People flow counter by tracking. Source: EasyCount, by Keeneo, copyright c 2010. Bottom-middle Car track- ing detects possible traffic infractions or estimates car speed. Source: Fibridge, copy- right c . Bottom-right Body tracking is used for interactive controlling of video-games. Source: Kinect, Microsoft, copyright c 2010. cameras—or a complicated set-up on the actor’s face—such as special reflective make-up or facial stickers. In this thesis we propose a technique that captures facial expressions motion by only using brightness information and a prior knowledge on the deformation of the target (see Figure 1.6). 1.3 Contributions of the Thesis We outline the remaining chapters of the thesis and their principal contributions as follows: Chapters2: Literature Review We provide a detailed survey of the literature on techniques for both image registration and tracking. Chapters3: Efficient Image Registration We review the state-of-the-art on efficient methods. We introduce the taxonomy for efficient registration algorithms: 9
Figure 1.5: Motion capture in the film industry. Facial and body motion capture from Avatar™ (top row) and Polar Express™ (bottom row). (Left column) The body motion and head pose are computed using reflective fiducial markers—the grey spheres of the motion-capture jumpsuit. For facial expression capture they use plenty of smaller markers and even close-up cameras. (Right column) They use the estimated motion to animate characters in the movie. Source: Avatar, 20th Century Fox, copyright © 2009; Polar Express, Warner Bros. Pictures, copyright © 2004.

We introduce the taxonomy for efficient registration algorithms: an algorithm is classified as either additive or compositional.

Chapter 4: Equivalence of Gradients We introduce the gradient equivalence equation constraint: we show that satisfying this assumption has positive effects on the performance of the algorithms.

Chapter 5: Additive Algorithms We review which constraints determine the convergence of additive registration algorithms, especially the factorization approach. We provide a methodical procedure to factorize an algorithm in general form; we state a basic set of theorems and lemmas that enable us to systematize the factorization. We introduce two tracking algorithms using factorization: one for rigid 3D objects, and another for deformable 3D objects.
Figure 1.6: Markerless facial motion capture. (Top) Several frames where the face modifies both its orientation—due to a rotation—and its shape structure—due to changes in facial expression. (Bottom) The tracking state vector includes both pose and deformation. Legend: (blue) actual projection of the target shape using the estimated parameters; (pink) highlighted projections corresponding to the profiles of the jaw, eyebrows, lips and nasolabial wrinkles.

Chapter 6: Compositional Algorithms We review the basic inverse compositional algorithm. We introduce an alternative efficient compositional algorithm that is equivalent to the inverse compositional algorithm under certain assumptions. We show that if the gradient equivalence equation holds then both efficient compositional methods shall converge.

Chapter 7: Computational Complexity We study the resources used by the registration algorithms in terms of their computational complexity. We compare the theoretical complexities of efficient and non-efficient algorithms.

Chapter 8: Experiments We devise a set of experimental tests that shall confirm our assumptions about the registration algorithms, that is, (1) verify the dependence of convergence on the algorithm constraints, and (2) evaluate the theoretical complexities with actual data.

Chapter 9: Conclusions and Future Work Finally, we draw conclusions about where each technique is most suitable to be used, and we provide insight into future work to improve the proposed methods.
Chapter 2

Literature Review

In this chapter we review the basic literature on tracking and image registration. First we introduce the basic similarities and differences between image registration and tracking. Then, we review the usual methods for both tracking and image registration.

2.1 Image Registration vs. Tracking

The frontier between image registration and tracking is a bit fuzzy: tracking identifies the location of an object in a sequence of images, whereas registration finds the pixel-to-pixel correspondence between a pair of images. Note that in both cases we compute a geometric and photometric transformation between images: pairwise in the context of image registration, and among multiple images in the tracking case. Although we may use the terms registration and tracking indistinctly, we define the following subtle semantic differences between them:

• Image registration finds the best alignment between two images of the same scene. We use a geometric transformation to align the images of both cameras. We consider that image registration emphasizes finding the best alignment between two images in visual terms, not accurately recovering the parameters of the transformation—this is usually the case in e.g. medical applications.

• Tracking finds the location of a target object in each frame of a sequence. We assume that the difference in object position between two consecutive frames is small. In tracking we are typically interested in recovering the parameters describing the state of the object rather than the coordinates of its location: we can describe an object using richer information than just its position (e.g. 3D orientation, modes of deformation, lighting changes, etc.). This is usually the case in robotics [Benhimane and Malis, 2007; Cobzas et al., 2009; Nick Molton, 2004], or augmented reality [Pilet et al., 2008; Simon et al., 2000; Zhu et al., 2006].
Also, image registration involves two images with an arbitrary baseline, whereas tracking usually operates on a sequence with a small inter-frame baseline. We assume that tracking is a higher-level problem than image registration. Furthermore, we propose a tracking-by-registration approach: we track an object through a sequence by iteratively registering pairs of consecutive images [Baker and Matthews, 2004]; however, we can perform tracking without any registration at all (e.g. tracking-by-detection [Viola and Jones, 2004], or tracking-by-classification [Vacchetti et al., 2004]).

2.2 Image Registration

Image registration is a classic topic in computer vision and numerous approaches have been proposed in the literature; two good surveys on the subject are [Brown, 1992] and [Zitova, 2003]. The process involves computing the pixel-to-pixel correspondence between the two images: that is, for each pixel in one image we find the corresponding pixel in the other image such that both pixels are projections of the same point in the scene (cf. Figure 1.1). Applications include image mosaicing [Capel, 2004; Irani and Anandan, 1999; Shum and Szeliski, 2000], video stitching [Caspi and Irani, 2002], super-resolution [Capel, 2004; Irani and Peleg, 1991], region tracking [Baker and Matthews, 2004; Hager and Belhumeur, 1998; Lucas and Kanade, 1981], recovering scene/camera motion [Bartoli et al., 2003; Irani et al., 2002], or medical image analysis [Lester and Arridge, 1999].

Image registration methods commonly fall into one of the following two groups [Bartoli, 2008; Capel, 2004; Irani and Anandan, 1999]:

Direct methods A direct image registration method aligns two images by using only the colour—or intensity in greyscale data—values of the pixels that are common to both images (namely, the region of support). Direct methods minimize an error measure based on image brightness over the region of support. Typical error measures include an L2-norm of the brightness difference [Irani and Anandan, 1999; Lucas and Kanade, 1981], normalized cross-correlation [Brooks and Arbel, 2010; Lewis, 1995], or mutual information [Dowson and Bowden, 2008; Viola and Wells, 1997].

Feature-based methods In feature-based methods, we align two images by computing the geometric transformation between a set of salient features that we detect in each image. The idea is to abstract distinct geometric image features that are more reliable than the raw intensity values; typically these features show invariance with respect to modifications of the camera point-of-view, illumination conditions, scale, or orientation of the scene [Schmid et al., 2000]. Corners or interest points [Bay et al., 2008; Harris and Stephens, 1988; Lowe, 2004; Torr and Zisserman, 1999] are classical features in the literature, although we can use other features such as edges [Bartoli et al., 2003], or extremal image regions [Matas et al., 2002].
Direct or feature-based methods? Choosing between direct and feature-based methods is not an easy task: we have to know the strong points of each method and the applications for which each is more suitable. A good comparison between the two types of methods is [Capel, 2004]. Feature-based methods typically show strong invariance to a wide range of photometric and geometric transformations of the image, and they are more robust to partial occlusions of the scene than their direct counterparts [Capel, 2004; Torr and Zisserman, 1999]. On the other hand, direct methods can align images with sub-pixel accuracy, estimate the dominant motion even when multiple motions are present, and provide a dense motion field in the case of 3D estimation [Irani and Anandan, 1999]. Moreover, direct methods do not require high-frequency textured surfaces (corners) to operate, but have optimal performance with smooth graylevel transitions [Benhimane et al., 2007].

2.3 Model-based 3D Tracking

In this section we define what model-based tracking is, and we review the previous literature on 3D tracking of rigid and nonrigid objects. A special case of interest for nonrigid objects is the 3D tracking of human faces, or facial motion capture. We can recover the 3D orientation and position of the target with respect to the camera (or an arbitrary reference system), or the relative displacement and orientation of the camera with respect to the target (or another arbitrary reference system in the scene) [Sepp, 2008]. A good survey on the subject is [Lepetit and Fua, 2005].

2.3.1 Modelling assumptions

In model-based techniques we use a priori knowledge about the scene, the target, or the sensing device as a basis for the tracking procedure. We classify these assumptions about real-world information as follows:

Target model

The target model specifies how to represent the information about the structure of the scene in our algorithms. Template tracking or template matching simply represents the target as the pixel intensity values inside a region defined on one image: we call this region—or the image itself—the reference image or template. One of the first techniques proposed for template matching was [Lucas and Kanade, 1981], although it was initially devised for solving optical flow problems. The literature proposes numerous extensions to this technique [Baker and Matthews, 2004; Benhimane and Malis, 2007; Brooks and Arbel, 2010; Hager and Belhumeur, 1998; Jurie and Dhome, 2002a].

We may also allow the target to deform its shape: this deformation induces changes in the projected appearance of the target. We model these changes in target texture by using generative models such as eigenimages
[Black and Jepson, 1998; Buenaposada et al., 2009], Active Appearance Models (aam) [Cootes et al., 2001], active blobs [Sclaroff and Isidoro, 2003], or subspace representations [Ross et al., 2004]. Instead of modelling brightness variations, we may represent target shape deformation by using a linear model representing the locations of a set of feature points [Blanz and Vetter, 2003; Bregler et al., 2000; Del Bue et al., 2004], or Finite Element Meshes [Pilet et al., 2005; Zhu et al., 2006]. Alternative approaches model the non-rigid motion of the target by using anthropometric data [Decarlo and Metaxas, 2000], or a probability distribution of the intensity values of the target region [Comaniciu et al., 2000; Zimmermann et al., 2009].

These techniques are suitable for tracking planar objects of the scene. If we add further knowledge about the scene, we can track more complex objects: with a proper model we are able to recover 3D information. Typically, we use a wireframe 3D model of the target, and tracking consists in finding the best alignment between the sensed image and the 3D model [Cipolla and Drummond, 1999; Kollnig and Nagel, 1997; Marchand et al., 1999]. We can augment this model by adding further texture priors, either from the image stream [Cobzas et al., 2009; Muñoz et al., 2005; Sepp and Hirzinger, 2003; Vacchetti et al., 2004; Xiao et al., 2004a; Zimmermann et al., 2006], or from an external source (e.g. a 3D scanner or a texture mosaic) [Hong and Chung, 2007; La Cascia et al., 2000; Masson et al., 2004, 2005; Pressigout and Marchand, 2007; Romdhani and Vetter, 2003].

Motion model

The motion model describes the target kinematics (i.e. how the object modifies its position in the image/scene). The motion model is tightly coupled to the target model: it is usually represented by a geometric transformation that maps the coordinates of the target model into a different set of coordinates. For a planar target, these geometric transformations are typically affine [Hager and Belhumeur, 1998], homographic [Baker and Matthews, 2004; Buenaposada and Baumela, 1999], or spline-based warps [Bartoli and Zisserman, 2004; Brunet et al., 2009; Lester and Arridge, 1999; Masson et al., 2005]. For actual 3D targets, the geometric warps compute the rotation and translation of the object using a 6 degrees-of-freedom (dof) rigid-body transformation [Cipolla and Drummond, 1999; La Cascia et al., 2000; Marchand et al., 1999; Sepp and Hirzinger, 2003].

Camera model

The camera model specifies how the images are sensed by the camera. The pin-hole camera models the imaging device as a projector of the coordinates of the scene [Hartley and Zisserman, 2004]. For tracking zoomed objects located far away, we may use orthographic projection [Brand and Bhotika, 2001; Del Bue et al., 2004; Tomasi and Kanade, 1992; Torresani et al., 2002]. The perspective projection accounts for perspective distortion, and it is more suitable for close-up views [Muñoz et al., 2005, 2009]. The camera model may also account for model deviations such as lens distortion [Claus and Fitzgibbon, 2005; Tsai, 1987].
Other model assumptions

We can also model prior photometric knowledge about the target/scene, such as illumination cues [La Cascia et al., 2000; Lagger et al., 2008; Romdhani and Vetter, 2003], or global colour [Bartoli, 2008].

2.3.2 Rigid Objects

We can follow two strategies to recover the 3D parameters of a rigid object:

2D Tracking The first group of methods involves a two-step process: first, we compute the 2D motion of the object as a displacement of the target projection on the image; second, we recover the actual 3D parameters from the computed 2D displacements by using the scene geometry. A natural choice is to use optical flow: [Irani et al., 1997] computes the dominant 2D parametric motion between two frames to register the images; the residual displacement—the image regions that cannot be registered—is used to recover the 3D motion. When the object is a 3D plane, we can use a homographic transformation to compute plane-to-plane correspondences between two images; then we recover the actual 3D motion of the plane using the camera geometry [Buenaposada and Baumela, 2002; Lourakis and Argyros, 2006; Simon et al., 2000]. We can also compute the inter-frame displacements by using linear regressors or predictors, and then robustly adjust the projections to a target model—using RANSAC—to compute the 3D parameters [Zimmermann et al., 2009]. An alternative method is to compute pixel-to-pixel correspondences by using a classifier [Lepetit and Fua, 2006], and then recover the target 3D pose using POSIT [Dementhon and Davis, 1995] or equivalent methods [Lepetit et al., 2009].

3D Tracking These methods directly compute the actual 3D motion of the object from the image stream. They mainly use a 3D model of the target to compute the motion parameters; the 3D model contains a priori knowledge of the target that improves the estimation of the motion parameters (e.g. to get rid of projective ambiguities). The simplest way to represent a 3D target is using a texture model—a set of image patches sensed from one or several reference images—as in [Cobzas et al., 2009; Devernay et al., 2006; Jurie and Dhome, 2002b; Masson et al., 2004; Sepp and Hirzinger, 2003; Xu and Roy-Chowdhury, 2008]. The main drawback of these methods is the lack of robustness against changes in scene illumination or specular reflections. We can alternatively fit the projection of a 3D wireframe model (e.g. a cad model) to the edges of the image [Drummond and Cipolla, 2002]. However, these methods also have problems with cluttered backgrounds [Lepetit and Fua, 2005]. To gain robustness, we can use hybrid models of texture and contours such as [Marchand et al., 1999; Masson et al., 2003; Vacchetti et al., 2004], or simply use an additional model to deal with illumination [Romdhani and Vetter, 2003].
2.3.3 Nonrigid Objects

Tracking methods for nonrigid objects fall into the same categories that we used for rigid ones. Point-to-point correspondences of the deformable target can recover the pose and/or deformation parameters using subspace methods [Del Bue, 2010; Torresani et al., 2008], or by fitting a deformable triangle mesh [Pilet et al., 2008; Salzmann et al., 2007]. We can alternatively fit the 2D silhouette of the target to a 3D skeletal deformable model of the object [Bowden et al., 2000]. Direct estimation of the 3D parameters unifies the processes of matching pixel correspondences and estimating the pose and deformation of the target. [Brand, 2001; Brand and Bhotika, 2001] constrain the optical flow by using a linear generative model to represent the deformation of the object. [Gay-Bellile et al., 2010] models the object 3D deformations, including self-occlusions, by using a set of Radial Basis Functions (rbf).

2.3.4 Facial Motion Capture

Estimation of facial motion parameters is a challenging task; head 3D orientation was typically estimated by using fiducial markers to overcome the inherent difficulty of the problem [Bickel et al., 2007]. However, markerless methods have also been developed in recent years. Facial motion capture involves recovering the head 3D orientation and/or the face deformation due to changes in expression. We first review techniques for recovering the head 3D pose; then we review techniques for recovering both pose and expression.

Head pose estimation There are numerous techniques to compute head pose or 3D orientation. In the following, we review a number of them—a recent detailed survey on the subject is [Murphy-Chutorian and Trivedi, 2009]. The main difficulty of estimating head pose lies in the nonconvex structure of the human head. Classic 2D approaches such as [Black and Yacoob, 1997; Hager and Belhumeur, 1998] are only suitable for tracking motions of the head parallel to the image plane: the reason is that these methods only use information from a single reference image. To fully recover the 3D rotation parameters of the head we need additional information. [La Cascia et al., 2000] uses a texture map computed by cylindrical projection of images of the head from different points of view; [Baker et al., 2004a; Jang and Kanade, 2008] also use an analogous cylindrical model. In a similar fashion, we can use a 3D ellipsoid shape [An and Chung, 2008; Basu et al., 1996; Choi and Kim, 2008; Malciu and Prêteux, 2000]. Instead of using a cylinder or an ellipsoid, we can have a detailed model of the head, like a 3D Morphable Model (3dmm) [Blanz and Vetter, 2003; Muñoz et al., 2009; Xu and Roy-Chowdhury, 2008], an aam coupled with a 3dmm [Faggian et al., 2006], or a triangular mesh model of the face [Vacchetti et al., 2004]. The latter is robustly tracked in [Strom et al., 1999] using an Extended Kalman Filter. We can also have a head model with reduced complexity, as in [Tordoff et al., 2002].
Face expression estimation A change of facial expression induces a deformation in the 3D structure of the face. The estimation of this deformation can be used for face expression recognition, expression detection, or facial motion transfer. Classic 2D approaches such as aams [Cootes et al., 2001; Matthews and Baker, 2004] are only suitable for recovering expressions from a frontal face. 3D aams are the three-dimensional extension of these 2D methods: they adjust a statistical model of 3D shapes and texture—typically a PCA model—to the pixel intensities of the image [Chen and Wang, 2008; Dornaika and Ahlberg, 2006]. Hybrid methods that combine 2D and 3D aams show both real-time performance and actual 3D head pose estimation: we can use the 3D aams to simultaneously constrain the 2D aams motion and compute the 3D pose [Xiao et al., 2004b], or directly compute the facial motion from the 2D aams parameters [Zhu et al., 2006]. In contrast to pure 2D aams, 3D aams can recover actual 3D pose and expression from faces that are not frontal to the camera. However, the out-of-plane rotations that can be recovered by these methods are typically smaller than with a pure 3D model (e.g. a 3dmm). [Blanz and Vetter, 2003; Romdhani and Vetter, 2003] search for the best configuration of a 3dmm such that the differences between the rendered model and the image are minimal; both methods also show great performance recovering strong facial deformations. Real-time alternatives using a 3dmm include [Hiwada et al., 2003; Muñoz et al., 2009]. [Pighin et al., 1999] fits a linear combination of 3D face models to the images to estimate realistic facial expressions. Finally, [Decarlo and Metaxas, 2000] derives an anthropometric, physically-based face model that may be adjusted to each individual face target; in addition, they solve a dynamic system for the face pose and expression parameters by using optical flow constrained by the edges of the face.
Chapter 3

Efficient Direct Image Registration

3.1 Introduction

This chapter reviews the problem of efficiently registering two images. We define the Direct Image Alignment (dia) problem as the process that computes the transformation between two frames using only image brightness information. We organize the chapter as follows: Section 3.2 introduces basic registration notions; Section 3.3 reviews additive registration algorithms such as Lucas-Kanade or Hager-Belhumeur; Section 3.4 reviews compositional registration algorithms such as Baker and Matthews' Forward Compositional and Inverse Compositional; finally, other methods are reviewed in Section 3.5.

3.2 Modelling Assumptions

This section reviews those assumptions about the real world that we use to mathematically model the registration procedure. We introduce the notation for the imaging process through a pinhole camera. We ascertain the Brightness Constancy Assumption or Brightness Constancy Constraint (bcc) as the cornerstone of direct image registration techniques. We also pose the registration problem as an iterative optimization problem. Finally, we provide a classification of the existing direct registration algorithms.

3.2.1 Imaging Geometry

We represent points of the scene using Cartesian coordinates in R³ (e.g. X = (X, Y, Z)⊤). We represent points on the image with homogeneous coordinates, so that the pixel position x = (i, j)⊤ is represented using the notation for augmented points as x̃ = (i, j, 1)⊤. The homogeneous point x̃ = (x₁, x₂, x₃)⊤ is conversely represented in Cartesian coordinates using the mapping p : P² → R², such that p(x̃) = x = (x₁/x₃, x₂/x₃)⊤. The scene is imaged through a perfect pin-hole camera [Hartley and Zisserman, 2004].
Figure 3.1: Imaging geometry. An object of the scene is imaged through the camera centres C₁ and C₂ onto two distinct images I₁ and I₂ (related by a rotation R and a translation t). The point X is projected to the points x₁ = p(K[I | 0]X̃) and x₂ = p(K[R | −Rt]X̃) in the two images.

By abuse of notation, we define the perspective projection p : R³ → R² that maps scene coordinates onto image points,

$$\mathbf{x} = p(\mathbf{X}_c) = \left(\frac{\mathbf{k}_1^\top \mathbf{X}_c}{\mathbf{k}_3^\top \mathbf{X}_c},\ \frac{\mathbf{k}_2^\top \mathbf{X}_c}{\mathbf{k}_3^\top \mathbf{X}_c}\right)^{\!\top},$$

where K = (k₁⊤, k₂⊤, k₃⊤)⊤ is the 3 × 3 matrix that contains the camera intrinsics (cf. [Hartley and Zisserman, 2004]), and Xc = (Xc, Yc, Zc)⊤. We implicitly assume that Xc represents a point in the camera reference system. If the points to project are expressed in an arbitrary reference system of the scene, we need an additional mapping; hence, the perspective projection of a point X in the scene is

$$\tilde{\mathbf{x}} = \mathbf{K}\,[\mathbf{R} \mid -\mathbf{R}\mathbf{t}]\begin{pmatrix}\mathbf{X}\\ 1\end{pmatrix},$$

where R and t are the rotation and translation between the scene and the camera coordinate systems (see Figure 3.1).

Our input is a smooth sequence of images—i.e., inter-frame differences are small—where It is the t-th frame of the sequence. We denote by T the reference image or template. Images are discrete matrices of brightness values, although we represent them as functions from R² to R^C, where C is the number of image channels (i.e. C = 3 for colour images, and C = 1 for gray-scale images): It(x) is the brightness value at pixel x. For non-discrete pixel coordinates, we use bilinear interpolation. If X is a set of pixels, we collect the brightness values I(x), ∀x ∈ X, in a single column vector I(X)—i.e., I(X) = (I(x₁), …, I(x_N))⊤, where X = {x₁, …, x_N}.
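To make the imaging model concrete, the following minimal sketch (our own illustration, assuming NumPy; the function and variable names are not part of the thesis) projects a scene point expressed in an arbitrary scene reference system onto the image plane:

```python
import numpy as np

def project(X, K, R, t):
    """Project a 3D scene point X onto the image.

    Implements x~ = K [R | -R t] (X, 1)^T, i.e. Xc = R (X - t),
    followed by the homogeneous-to-Cartesian mapping p.
    """
    Xc = R @ (X - t)            # point in the camera reference system
    x_h = K @ Xc                # homogeneous image coordinates
    return x_h[:2] / x_h[2]     # Cartesian pixel coordinates

# Example: an 800-pixel focal length and principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                   # camera aligned with the scene axes
t = np.array([0.0, 0.0, -5.0])  # camera 5 units behind the origin
print(project(np.array([0.1, 0.2, 0.0]), K, R, t))  # -> [336. 272.]
```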
3.2.2 Brightness Constancy Constraint

The bcc relates brightness information between two frames of a sequence [Hager and Belhumeur, 1998; Irani and Anandan, 1999]. The reference image T is one arbitrary image of the sequence. We define the target region X as a set of pixel coordinates X = {x₁, …, x_N} defined on T (see Figure 3.2). We define the template as the image values of the target region, that is, T(X). Let us assume we know the transformation of the target region between T and another arbitrary image of the sequence, It. The motion model f defines this transformation as Xt = f(X; µt), where the set of coordinates Xt is the target region on It and µt are the motion parameters. The bcc states that the brightness values of the template T and the input image It warped by f with parameters µt should be equal,

$$\mathcal{T}(X) = I_t(f(X; \mu_t)). \tag{3.1}$$

The direct conclusion from Equation 3.1 is that the brightness of the target does not depend on its motion—i.e., the relative position and orientation of the camera with respect to the target does not affect the brightness of the latter. However, we may augment the bcc to include appearance changes [Black and Jepson, 1998; Buenaposada et al., 2009; Matthews and Baker, 2004], and changes in illumination conditions due to ambient [Bartoli, 2008; Basri and Jacobs, 2003] or specular lighting [Blanz and Vetter, 2003].

3.2.3 Image Registration by Optimization

Direct image registration is usually posed as an optimization problem. We minimize an error function, parameterized by the motion variables, based on the pixel-wise brightness difference:

$$\mu^{*} = \arg\min_{\mu}\,\{D(X; \mu)^{2}\}, \tag{3.2}$$

where

$$D(X; \mu) = \mathcal{T}(X) - I_t(f(X; \mu)) \tag{3.3}$$

is a dissimilarity measure based on the bcc (Equation 3.1).

Descent Methods

Recovering these parameters is typically a non-linear problem, as it depends on image brightness—which is usually non-linearly related to the motion parameters. The usual approach is iterative gradient-based descent (GD): from a starting point µ0 in the search space, the method iteratively computes a series of partial solutions µ1, µ2, …, µk that, under certain conditions, converge to the local minimizer µ* [Madsen et al., 2004] (see Figure 3.2). We typically use Gauss-Newton (GN) methods for efficient registration because they provide good convergence without computing second derivatives (see Appendix A). Hence, the basic GN-based algorithm for image registration operates as we outline in Algorithm 1 and depict in Figure 3.3.
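As an aside, the bcc-based dissimilarity of Equations 3.2 and 3.3 is straightforward to evaluate in code. The sketch below is a minimal illustration under our own assumptions (NumPy, grayscale images stored as 2D arrays, points inside the image bounds, and a caller-supplied motion model f); none of these names come from the thesis:

```python
import numpy as np

def bilinear(I, pts):
    """Sample image I (H x W array) at non-integer positions pts (N x 2, rows (i, j))."""
    i, j = pts[:, 0], pts[:, 1]
    i0, j0 = np.floor(i).astype(int), np.floor(j).astype(int)
    a, b = i - i0, j - j0
    return ((1 - a) * (1 - b) * I[i0, j0] + a * (1 - b) * I[i0 + 1, j0] +
            (1 - a) * b * I[i0, j0 + 1] + a * b * I[i0 + 1, j0 + 1])

def dissimilarity(T, I, X, f, mu):
    """Residual D(X; mu) = T(X) - I(f(X; mu)) (Equation 3.3) and its squared L2 norm."""
    r = T[X[:, 0], X[:, 1]] - bilinear(I, f(X, mu))
    return r, float(r @ r)

# Example with the simplest motion model, a pure 2D translation f(x; mu) = x + mu.
f = lambda X, mu: X + mu
```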
Figure 3.2: Iterative gradient descent image registration. (Top-left) Template image for the registration; we highlight the target region as a green quadrangle. (Top-right) Image that we register against the template; we generated this image by rotating the original around its centre and translating it along the X-axis. We highlight the corresponding target region in yellow. We also display the initial guess for the optimization as a green quadrangle; notice that it exactly corresponds to the position of the target region in the template. (Bottom-left) Contour plot of the image brightness dissimilarity. The axes show the values of the search space: image rotation and translation. We show the successive iterations in the search space: we reach the solution in four steps—µ0 to µ4. (Bottom-right) We show the target region that corresponds to the parameters of each iteration. The colour of each quadrangle matches the colour of the parameters that generated it, as seen in the bottom-left figure.

We describe the four stages of the algorithm in the following:
Dissimilarity measure The dissimilarity measure is a function of the image brightness error between two images. The usual measure for image registration is the Sum of Squared Differences (ssd), that is, the L2-norm of the difference of pixel brightness (Equation 3.3) [Brooks and Arbel, 2010; Hager and Belhumeur, 1998; Irani and Anandan, 1999; Lucas and Kanade, 1981]. However, we can use other measures such as normalized cross-correlation [Brooks and Arbel, 2010; Lewis, 1995], or mutual information [Brooks and Arbel, 2010; Dowson and Bowden, 2008; Viola and Wells, 1997].

Linearize the dissimilarity The next stage linearizes the brightness function about the current search parameters µ; this linearization enables us to transform the problem into a system of linear equations in the search variables. We typically approximate the function using a Taylor series expansion; depending on how many terms—derivatives—we compute, we have optimization methods like Gradient Descent [Amberg and Vetter, 2009], Newton-Raphson [Lucas and Kanade, 1981; Shi and Tomasi, 1994], Gauss-Newton [Baker and Matthews, 2004; Brooks and Arbel, 2010; Hager and Belhumeur, 1998], or even higher-order methods [Benhimane and Malis, 2007; Keller and Averbuch, 2004, 2008; Megret et al., 2008]. This is theoretically a good approximation when the dissimilarity is small [Irani and Anandan, 1999], although the estimation can be improved by using coarse-to-fine iterative methods [Irani and Anandan, 1999], or by selecting appropriate pixels [Benhimane et al., 2007]. Although Taylor series expansion is the usual approach to compute the coefficients of the system, other approaches such as linear regression [Cootes et al., 2001; Jurie and Dhome, 2002a] or numeric differentiation [Gleicher, 1997] may be used.

Compute the descent direction The descent direction is a vector δµ in the search space such that D(µ + δµ) < D(µ). In a GN-based algorithm, we solve the linear system of equations of the previous stage using least-squares [Baker and Matthews, 2004; Madsen et al., 2004]. Note that we do not perform the line search stage—i.e., we implicitly assume that the step size is α = 1 (cf. Appendix A).

Update the search parameters Once we have determined the search direction δµ, we compute the next point in the series by using the update function U : R^P → R^P, µ1 = U(µ0, δµ). We compute the dissimilarity value at µ1 to check convergence: if the dissimilarity is below a given threshold, then µ1 is the minimizer µ*—i.e., µ* = µ1; otherwise, we repeat the whole process (i.e. µ1 becomes the current parameters µ) until we find a suitable minimizer.

3.2.4 Additive vs. Compositional

We turn our attention to step 4 of Algorithm 1: how to compute the new estimation of the optimization parameters.
Algorithm 1 Outline of the basic GN-based descent method for image registration
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the dissimilarity function D(µi).
3:   Compute the search direction: linearize the dissimilarity and compute the descent direction δµi.
4:   Update the optimization parameters: µi+1 = U(µi, δµi).
5: end while

Figure 3.3: Generic descent method for image registration. We initialize the current parameter estimation at frame It+1 (µ = µ0) using the local minimizer at the previous frame It (µ0 = µ*t). We compute the dissimilarity measure between the image and the template using µ (Equation 3.3). We linearize the dissimilarity measure to compute the descent direction of the search parameters (δµ). We update the search parameters using the search direction and obtain an approximation to the minimum (µ1). We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ* = µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ = µ1).
In a GN optimization scheme, the new parameters are typically computed by adding the search direction vector to the former optimization parameters: µt+1 = µt + δµt (cf. Appendix A); this summation is a direct consequence of the definition of the Taylor series [Madsen et al., 2004]. We call additive approaches those methods that update the parameters by using addition [Hager and Belhumeur, 1998; Irani and Anandan, 1999; Lucas and Kanade, 1981]. Nonetheless, Baker and Matthews [Baker and Matthews, 2004] subsequently proposed a GN-based method that updates the parameters using composition—i.e., µt+1 = µt ◦ δµt. We call these methods compositional approaches [Baker and Matthews, 2004; Cobzas et al., 2009; Muñoz et al., 2005; Romdhani and Vetter, 2003; Xu and Roy-Chowdhury, 2008].

3.3 Additive approaches

In this section we review some works that use the additive update. We introduce the Lucas-Kanade algorithm, the fundamental work on direct image registration. We show the basic algorithm as well as the common problems regarding the method. We also introduce the Hager-Belhumeur approach to image registration and point out its highlights.

3.3.1 Lucas-Kanade Algorithm

The Lucas-Kanade (LK) algorithm [Lucas and Kanade, 1981] solves the registration problem using a GN optimization scheme. The algorithm defines the residuals r of Equation 3.3 as

$$r(\mu) \equiv T(x) - I(f(x; \mu)). \tag{3.4}$$

The corresponding linear model for these residuals is

$$r(\mu + \delta\mu) \simeq \ell(\delta\mu) \equiv r(\mu) + r'(\mu)\,\delta\mu = r(\mu) + J(\mu)\,\delta\mu, \tag{3.5}$$

where r(µ) is given by Equation 3.4 and

$$J(\mu) \equiv \left.\frac{\partial I(f(x; \hat{\mu}))}{\partial \hat{\mu}}\right|_{\hat{\mu} = \mu}. \tag{3.6}$$

Hence, our optimization process now amounts to minimizing

$$\delta\mu^{*} = \arg\min_{\delta\mu}\,\{\ell(\delta\mu)^{\top}\ell(\delta\mu)\} = \arg\min_{\delta\mu}\,\{L(\delta\mu)\}. \tag{3.7}$$

We compute the local minimizer of L(δµ) as follows:

$$0 = L'(\delta\mu) = \nabla_{\delta\mu}\left(r(\mu)^{\top}r(\mu) + 2\,\delta\mu^{\top}J(\mu)^{\top}r(\mu) + \delta\mu^{\top}J(\mu)^{\top}J(\mu)\,\delta\mu\right) = J(\mu)^{\top}r(\mu) + J(\mu)^{\top}J(\mu)\,\delta\mu. \tag{3.8}$$

Again, we obtain an approximation to the local minimum at

$$\delta\mu = -\left(J(\mu)^{\top}J(\mu)\right)^{-1}J(\mu)^{\top}r(\mu), \tag{3.9}$$

which we iteratively refine until we find a suitable solution. We summarize the optimization process in Algorithm 2 and Figure 3.4.
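The LK iteration is compact enough to sketch directly. The following is our own minimal illustration (assuming NumPy; `residuals` and `jacobian` are hypothetical caller-supplied callables that evaluate Equations 3.4 and 3.6 for a concrete warp and image pair) of the GN loop with the additive update:

```python
import numpy as np

def gauss_newton_step(r, J):
    """Descent direction of Equation 3.9: -(J^T J)^{-1} J^T r."""
    return -np.linalg.solve(J.T @ J, J.T @ r)

def lucas_kanade(residuals, jacobian, mu0, tol=1e-6, max_iters=50):
    """Generic additive LK loop; residuals(mu) returns the N-vector r(mu)
    and jacobian(mu) the N x P matrix J(mu). Note that J is recomputed at
    every iteration, which is exactly the cost the HB and IC algorithms avoid."""
    mu = np.asarray(mu0, dtype=float)
    for _ in range(max_iters):
        r = residuals(mu)
        delta = gauss_newton_step(r, jacobian(mu))
        mu = mu + delta                          # additive update
        if np.linalg.norm(delta) < tol:          # convergence test on the step
            break
    return mu
```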
Algorithm 2 Outline of the Lucas-Kanade algorithm.
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the residuals r(µi) from Equation 3.4.
3:   Linearize the dissimilarity: J(µi) = ∇µ r(µi).
4:   Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤ r(µi).
5:   Update the optimization parameters: µi+1 = µi + δµi.
6: end while

Figure 3.4: Lucas-Kanade image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*t). We compute the dissimilarity residuals between the image and the template using µ (Equation 3.4). We linearize the residuals at the current parameters µ, and we compute the descent direction of the search parameters (δµ). We additively update the search parameters using the search direction and obtain an approximation to the minimum—i.e. µ1 = µ0 + δµ. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).
Known Issues

The LK algorithm is one instance of a well-known technique for object tracking [Baker and Matthews, 2004]. The most remarkable feature of this algorithm is its robustness: given a suitable bcc, the LK algorithm typically ensures good convergence. However, the algorithm has a series of weaknesses that degrade the overall performance of the tracking:

Computational Cost The LK algorithm computes the Jacobian at each iteration of the optimization loop. Furthermore, the minimization cycle is repeated for each pair of consecutive frames of the video sequence. The consequence is that the Jacobian is computed F × L times, where F is the number of frames and L is the number of iterations in the optimization loop. The computational burden of these operations is very high if the Jacobian is large: we have to compute the derivatives at each point of the target region, and each point contributes a row to the Jacobian. As an example, Table 7.15—page 106—compares the computational complexity of the LK algorithm with respect to other efficient methods.

Local Minima The GN optimization scheme, which is the basis of the LK algorithm, is prone to getting trapped in local minima. The very essence of the minimization implies that the algorithm converges to the minimum closest to the starting point. Thus, we must choose the initial guess of the optimization very carefully to ensure convergence to the true optimum. The best way to guarantee that the starting point for tracking and the optimum are close enough is to impose that the differences between consecutive images are small. On the contrary, images with a large baseline will cause problems for LK, as falling into local minima is more likely, which leads to incorrect alignment. To solve this problem, common to all direct approaches, a pyramidal implementation of the optimization may be used [Bouguet, 2000].

3.3.2 Hager-Belhumeur Factorization Algorithm

We now review an efficient algorithm for determining the motion parameters of the target. The algorithm is similar to LK, but it uses a priori information about the target motion and structure to save computation time. The Hager-Belhumeur (HB) or factorization algorithm was first proposed by G. Hager and P. Belhumeur in [Hager and Belhumeur, 1998]. The authors noticed the high computational cost of linearizing the brightness error function in the LK algorithm: the dissimilarity depends on each different frame of the sequence, It. The method focuses on how to efficiently compute the Jacobian matrix of step 3 of the LK algorithm (see Algorithm 2). The computation of the Jacobian in the HB algorithm has two separate stages:

1. Gradient replacement

The key idea is to use the derivatives of the template T instead of computing the derivatives of the frame It when estimating J.
Hager and Belhumeur dealt with this issue in a very neat way: they noticed that, if the bcc (Equation 3.1) relates image and template brightness values, it could possibly also relate image and template derivatives—cf. [Hager and Belhumeur, 1998]. Differentiating both sides of Equation 3.1 with respect to the target region coordinates yields

$$\nabla_{x}\mathcal{T}(x) = \nabla_{x}I_t(f(x; \mu_t)) = \nabla_{x}I_t(x)\,\nabla_{x}f(x; \mu), \quad x \in X. \tag{3.10}$$

On the other hand, we compute the Jacobian as

$$J = \nabla_{\mu_t}I_t(f(x; \mu_t)) = \nabla_{x}I_t(x)\,\nabla_{\mu_t}f(x; \mu). \tag{3.11}$$

We isolate the term ∇xIt(x) in Equations 3.10 and 3.11, and equate the remaining terms as follows:

$$J = \nabla_{x}\mathcal{T}(x)\,\nabla_{x}f(x; \mu)^{-1}\,\nabla_{\mu_t}f(x; \mu). \tag{3.12}$$

Notice that in Equation 3.12 the Jacobian depends on the template derivatives, ∇xT(x), which are constant. Using template derivatives speeds up the whole process up to 10-fold (cf. Table 7.16—page 106).

2. Factorization

Equation 3.12 reveals the internal structure of the Jacobian: it comprises the product of three matrices: a matrix ∇xT(x) that depends on template brightness values, and two matrices, ∇xf(x; µ)⁻¹ and ∇µt f(x; µ), whose values depend on both the target shape coordinates and the motion parameters µt. The factorization stage rearranges the Jacobian's internal structure so as to speed up the computation of this matrix product.

A word about factorization In the literature, matrix factorization or matrix decomposition refers to the process of expressing a matrix as the product of matrices of special types. One major example is to factorize a matrix A into the product of a lower triangular matrix L and an upper triangular matrix U, A = LU. This factorization is called lu decomposition, and it allows us to solve the linear system Ax = b more efficiently: solving Ux = L⁻¹b requires fewer additions and multiplications than the original system [Golub and Van Loan, 1996]. Other famous examples of matrix factorization are the spectral decomposition, Cholesky factorization, Singular Value Decomposition (svd), and qr factorization (see [Golub and Van Loan, 1996] for more information).

The key concept behind using factorization in this problem is stated as follows: given a matrix product whose operands contain both constant and variable terms, we want to rearrange the product such that one operand contains only constant values and the other one contains only variable terms.
We rewrite this idea in equation form as follows:

$$J = \nabla_{x}\mathcal{T}(x)\,\nabla_{x}f(x; \mu)^{-1}\,\nabla_{\mu_t}f(x; \mu) = S(x)\,M(\mu), \tag{3.13}$$

where S(x) contains only target coordinate values and M(µ) contains only motion parameters. The process of decomposing the matrix J into the product S(x)M(µ) is generally ad hoc: we must gain insight into the analytic structure of the matrices ∇xf(x; µ)⁻¹ and ∇µt f(x; µ) to rearrange their entries into S(x)M(µ) [Hager and Belhumeur, 1998]. This process is not obvious at all, and it has been a frequent source of criticism of the HB algorithm [Baker and Matthews, 2004]. However, we shall introduce procedures for systematic factorization in Chapter 5.

We outline the basic HB optimization in Algorithm 3; notice that the only difference with respect to the LK algorithm lies in the Jacobian computation. We depict the differences more clearly in Figure 3.5: in the dissimilarity linearization stage we use the derivatives of the template instead of those of the frame.

Algorithm 3 Outline of the Hager-Belhumeur algorithm.
Off-line:
1: Compute S(x)
On-line: Let µi = µ0 be the initial guess.
2: while no convergence do
3:   Compute the residuals r(µi) from Equation 3.4.
4:   Compute the matrix M(µi).
5:   Compute the Jacobian: J(µi) = S(x)M(µi).
6:   Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤ r(µi).
7:   Update the optimization parameters: µi+1 = µi + δµi.
8: end while
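To see where the savings come from, consider the following skeleton of the on-line HB loop. This is our own sketch under stated assumptions (NumPy; the constant factor S, the parameter-dependent factor M(µ), and the residuals are model-specific and supplied by the caller; the class and names are hypothetical):

```python
import numpy as np

class HagerBelhumeurTracker:
    """Skeleton of the HB on-line loop for a factorized Jacobian
    J(mu) = S @ M(mu) (Equation 3.13)."""

    def __init__(self, S, M_of_mu, residuals):
        self.S = S                  # N x k constant matrix, computed off-line
        self.M_of_mu = M_of_mu      # callable returning the k x P matrix M(mu)
        self.residuals = residuals  # callable r(mu), Equation 3.4

    def step(self, mu):
        J = self.S @ self.M_of_mu(mu)             # cheap: no image derivatives
        r = self.residuals(mu)
        delta = -np.linalg.solve(J.T @ J, J.T @ r)
        return mu + delta                         # additive update, as in LK
```

The point of the design is that S (one row per template pixel) is computed once off-line, so each on-line iteration only pays for the small product S·M(µ) instead of re-deriving image gradients at every frame.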
Figure 3.5: Hager-Belhumeur image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*t). We additionally create the matrix S(x), whose entries depend on the target values. We compute the dissimilarity residuals between the image and the template using µ (Equation 3.4). Instead of linearizing the residuals, we compute the Jacobian matrix at µ using Equation 3.12, and we solve for the descent direction using Equation 3.9. We additively update the search parameters using the search direction and obtain an approximation to the minimum—i.e. µ1 = µ0 + δµ. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).
3.4 Compositional approaches

From Section 3.2.4 we recall the definition of a compositional method: a GN-like optimization method that updates the search parameters using function composition. We review two compositional algorithms: the Forward Compositional (FC) and the Inverse Compositional (IC) [Baker and Matthews, 2004].

A word about composition Function composition is usually defined as the application of the result of one function to another. Let f : X → Y and g : Y → Z be two functions. We define the composite function g ◦ f : X → Z as (g ◦ f)(x) = g(f(x)). In the literature on image registration, the problem is posed as follows: let f : R² → R² be the target motion model parameterized by µ. We compose the target motion as z = f(f(x; µ1); µ2) = f(x; µ1 ◦ µ2) ≡ f(x; µ3); that is, the coordinates z are the result of mapping x onto y = f(x; µ1), and y onto z = f(y; µ2). We represent the composite parameters as µ3 = µ1 ◦ µ2, such that z = f(x; µ3).

3.4.1 Forward Compositional Algorithm

The FC algorithm was first proposed in [Shum and Szeliski, 2000], although the terminology was introduced in [Baker and Matthews, 2001]: FC is an optimization algorithm, equivalent to the LK approach, that relies on a compositional update step. Compositional algorithms for image registration use a brightness dissimilarity function slightly different from Equation 3.3; we pose the image registration problem as the following optimization:

$$\mu^{*} = \arg\min_{\mu}\,\{D(X; \mu)^{2}\}, \tag{3.14}$$

with

$$D(X; \mu) = \mathcal{T}(X) - I_{t+1}(f(f(X; \mu); \mu_t)), \tag{3.15}$$

where µt comprises the optimal parameters at image It. Note that our search variables µ are those parameters that should be composed with the current estimation to yield the minimum. The residuals corresponding to Equation 3.15 are

$$r(\mu) \equiv T(x) - I_{t+1}(f(f(x; \mu); \mu_t)). \tag{3.16}$$

As in the LK algorithm, we compute the linear model of the residuals, but now at the point µ = 0 of the search space:

$$r(0 + \delta\mu) \simeq \ell(\delta\mu) \equiv r(0) + r'(0)\,\delta\mu = r(0) + J(0)\,\delta\mu, \tag{3.17}$$

where

$$r(0) \equiv T(x) - I_{t+1}(f(f(x; 0); \mu_t)), \quad\text{and}\quad J(0) \equiv \left.\frac{\partial I_{t+1}(f(f(x; \hat{\mu}); \mu_t))}{\partial \hat{\mu}}\right|_{\hat{\mu} = 0}. \tag{3.18}$$

Notice that, in this case, µt acts as a constant in the derivative. Again, the local minimizer is

$$\delta\mu = -\left(J(0)^{\top}J(0)\right)^{-1}J(0)^{\top}r(0). \tag{3.19}$$

We iterate the above procedure until convergence. The next point in the iterative series is not computed as µt+1 = µt + δµ, but as µt+1 = µt ◦ δµ, to be coherent with Equation 3.16. Also notice that the Jacobian J(0) (Equation 3.18) is not constant, as it depends both on the image It+1 and on the parameters µt. Figure 3.6 shows a graphical depiction of the algorithm, which we outline in Algorithm 4.
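For warps that form a group, the composition µ1 ◦ µ2 has a closed form; the homography is the standard example, where composing warps reduces to a matrix product. The sketch below (our own illustration, assuming NumPy and using the 3 × 3 matrix itself as the warp parameterization; for other warps the composition may only exist as a first-order approximation) makes this concrete:

```python
import numpy as np

def warp(x, H):
    """Homographic warp f(x; H): x is N x 2 (Cartesian), H is a 3 x 3 matrix."""
    xh = np.column_stack([x, np.ones(len(x))]) @ H.T   # to homogeneous, then warp
    return xh[:, :2] / xh[:, 2:3]                      # back to Cartesian

def compose(H1, H2):
    """Parameters of the composite warp: warping by H1 and then by H2,
    z = f(f(x; H1); H2), equals a single warp by H2 @ H1."""
    return H2 @ H1

# Sanity check: composing a small rotation with a translation.
c, s = np.cos(0.1), np.sin(0.1)
H1 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
H2 = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
x = np.array([[2.0, 3.0]])
assert np.allclose(warp(warp(x, H1), H2), warp(x, compose(H1, H2)))
```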
Algorithm 4 Outline of the Forward Compositional algorithm.
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the residuals r(µi) from Equation 3.16.
3:   Linearize the dissimilarity: compute J(0) using Equation 3.18.
4:   Compute the search direction: δµi = −(J(0)⊤J(0))⁻¹ J(0)⊤ r(0).
5:   Update the optimization parameters: µi+1 = µi ◦ δµi.
6: end while

Figure 3.6: Forward compositional image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*t). We compute the dissimilarity residuals between the image and the template using µ (Equation 3.15). We linearize the residuals at µ = 0, and we compute the descent direction δµ using Equation 3.19. We update the parameters using function composition—i.e. µ1 = µ0 ◦ δµ. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D (Equation 3.15) is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).
3.4.2 Inverse Compositional Algorithm

The IC algorithm reinterprets the FC optimization scheme by changing the roles of the template and the image. The key feature of IC is that its GN Jacobian is constant: we compute the Jacobian using only template brightness values, and therefore it is constant. Using a constant Jacobian speeds up the whole computation, as the linearization stage is the most time-critical. The IC algorithm receives its name because we reverse the roles of the template and the current frame (i.e. we compute the Jacobian on the template). We rewrite the residuals function of FC (Equation 3.16) as follows:

$$r(\mu) \equiv T(f(x; \mu)) - I_{t+1}(f(x; \mu_t)), \tag{3.20}$$

yielding the residuals for IC. Notice that the template brightness values now depend on the search parameters µ. We linearize Equation 3.20 around the point µ = 0 of the search space:

$$r(0 + \delta\mu) \simeq \ell(\delta\mu) \equiv r(0) + r'(0)\,\delta\mu = r(0) + J(0)\,\delta\mu, \tag{3.21}$$

where

$$r(0) \equiv T(f(x; 0)) - I_{t+1}(f(x; \mu_t)), \quad\text{and}\quad J(0) \equiv \left.\frac{\partial T(f(x; \hat{\mu}))}{\partial \hat{\mu}}\right|_{\hat{\mu} = 0}. \tag{3.22}$$

We compute the local minimizer of Equation 3.7 by differentiating it with respect to δµ and equating to zero:

$$0 = L'(\delta\mu) = \nabla_{\delta\mu}\left(r(0)^{\top}r(0) + 2\,\delta\mu^{\top}J(0)^{\top}r(0) + \delta\mu^{\top}J(0)^{\top}J(0)\,\delta\mu\right) = J(0)^{\top}r(0) + J(0)^{\top}J(0)\,\delta\mu. \tag{3.23}$$

Again, we obtain an approximation to the local minimum at

$$\delta\mu = -\left(J(0)^{\top}J(0)\right)^{-1}J(0)^{\top}r(0), \tag{3.24}$$

which we iteratively refine until we find a suitable solution. We summarize the optimization process in Algorithm 5 and Figure 3.7. Note that the Jacobian matrix J(0) is constant, as it is computed on the template image—which is fixed—at the point µ = 0 (cf. Equation 3.22). Notice that the crucial point of the derivation of the algorithm lies in the change of variables in Equation 3.20. Solving for the search direction only consists of computing the IC residuals and the least-squares approximation (Equation 3.24). The dissimilarity linearization stage of Algorithm 1 is no longer required, which results in a boost in the performance of the algorithm.
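Because J(0) never changes, its pseudo-inverse can be computed once off-line, leaving only a residual evaluation and a small matrix-vector product per iteration. A minimal sketch of this loop follows (our own illustration, assuming NumPy; `residuals` evaluates Equation 3.20 and `compose_inverse` implements the µ ◦ δµ⁻¹ update for a concrete warp; both are hypothetical caller-supplied callables):

```python
import numpy as np

def inverse_compositional(residuals, J0, compose_inverse, mu0,
                          tol=1e-6, max_iters=50):
    """IC loop sketch: J0 is the constant template Jacobian of Equation 3.22,
    precomputed off-line, so no linearization happens inside the loop."""
    H = np.linalg.inv(J0.T @ J0) @ J0.T   # constant pseudo-inverse, off-line
    mu = mu0
    for _ in range(max_iters):
        delta = -H @ residuals(mu)        # Equation 3.24, no re-linearization
        mu = compose_inverse(mu, delta)   # inverse compositional update
        if np.linalg.norm(delta) < tol:
            break
    return mu
```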
Algorithm 5 Outline of the Inverse Compositional algorithm.
Off-line: Compute J(0) = ∇µ r(0) using Equation 3.22.
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the residuals r(µi) from Equation 3.20.
3:   Compute the search direction: δµi = −(J(0)⊤J(0))⁻¹ J(0)⊤ r(0).
4:   Update the optimization parameters: µi+1 = µi ◦ δµi⁻¹.
5: end while

Figure 3.7: Inverse compositional image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*t). At this point we compute the Jacobian J(0) using Equation 3.22. We compute the dissimilarity residuals between the image and the template using µ (Equation 3.15). Using J(0), we compute the descent direction δµ (Equation 3.24). We update the parameters using inverse function composition—i.e. µ1 = µ0 ◦ δµ⁻¹. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D (Equation 3.15) is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).
Relevance of IC

The IC algorithm is known to be the most efficient optimization technique for direct image registration [Baker and Matthews, 2004]. The algorithm was initially proposed for template tracking, although it was later improved to use aams [Matthews and Baker, 2004], register 3D Morphable Models [Romdhani and Vetter, 2003; Xu and Roy-Chowdhury, 2008], account for photometric changes [Bartoli, 2008], and allow for appearance variation [Gonzalez-Mora et al., 2009].

Some efficient algorithms using a constant residual Jacobian with additive increments have been proposed in the literature, but none shows reliable performance. In [Cootes et al., 2001] an iterative regression-based gradient scheme is proposed to align aams to frontal images of faces. The regression matrix (similar to our Jacobian matrix) is numerically computed off-line and remains constant during the Gauss-Newton optimization. The method shows good performance because the solution does not depart far from the initial guess. The method is revisited in [Donner et al., 2006] using Canonical Correlation Analysis instead of numerical differentiation to achieve a better convergence rate and range. In [La Cascia et al., 2000] the authors propose a Gauss-Newton scheme with a constant Jacobian matrix for 6-dof 3D tracking of heads. The method needs regularization constraints to improve the convergence of the optimization. Recently, [Brooks and Arbel, 2010] augmented the scope of the IC framework with Generalized Inverse Compositional (gic) image registration: they propose an additive update of the parameters that is equivalent to the compositional update of IC; therefore, they can adapt IC to optimization methods other than GN, such as Broyden-Fletcher-Goldfarb-Shanno (bfgs) [Press et al., 1992].

3.5 Other Methods

Iterative gradient-based optimization algorithms (see Figure 3.4) can improve their efficiency in two different ways: (1) by speeding up the linearization of the dissimilarity function, and (2) by reducing the number of iterations of the process. The algorithms that we have presented—i.e. HB and IC—belong to the first type. The second type of methods achieves efficiency by using a more involved linearization that converges faster to the minimum. [Averbuch and Keller, 2002] approximates the error function at both the template and the current image, and averages the least-squares solutions of both. They show that it converges in fewer iterations than LK, although the time per iteration is higher. Malis et al. [Benhimane and Malis, 2007] propose a similar method called Efficient Second-Order Minimization (esm), which differs from the latter in using an efficient linearization on the template by means of Lie algebra properties. Recently, both methods have been revisited and reformulated in a common Bi-directional Framework in [Megret et al., 2008]. [Keller and Averbuch, 2008] derives a high-order approximation to the error function that leads to a faster algorithm with a wider convergence basin. Unfortunately—with the exception of esm—none of these algorithms is appropriate for real-time image registration.
3.6 Summary

We have introduced the basic concepts of direct image registration. We pose the registration problem as the result of gradient-descent optimization of a dissimilarity function based on brightness differences. We classify direct image registration algorithms as either additive or compositional: in the former group we highlight the LK and HB algorithms, whereas the FC and IC algorithms belong to the latter.
Chapter 4

Equivalence of Gradients

In this chapter we introduce the concept of Equivalence of Gradients, that is, the process of replacing the gradient of a brightness function with an equivalent alternative. In Chapter 3 we have shown that some efficient algorithms for direct image registration use a gradient replacement technique as the basis for their speed improvement: (1) the HB algorithm transforms the template derivatives using the target warp to yield the image derivatives; and (2) the IC algorithm replaces the image derivatives by the template derivatives without any modification, but changes the parameter update rule so that the GN-like optimization converges. We introduce a new constraint, the Gradient Equivalence Equation, and we show that this constraint is a necessary requirement for the high computational efficiency of both the HB and IC algorithms.

We organize the chapter as follows: Section 4.1 introduces the basic concepts of image gradients in R², and their extension to higher-dimensional spaces such as P² and R³; Section 4.2 introduces the Gradient Equivalence Equation, which shall subsequently be used to impose requirements on the registration algorithms.
  • 62. images are discrete functions: we represent an image as a matrix whose elements I(i, j) are the brightness function values. We continuously approximate the discrete function by using interpolation (see Figure 4.1). We introduce the image gradients in the most common domains in Computer Vision—R2 , P2 , and R3 . Image gradients are naturally defined in R2 , since the images are functions defined in such domain. In some Computer Vision applications the domain of x, D, is not constrained to R2 , but to P2 [Buenaposada and Baumela, 2002; Cobzas et al., 2009], or to R3 [Sepp, 2006; Xu and Roy-Chowdhury, 2008]. In the following, the target coordinates are expressed in a domain D ∈ {R3 , P2 }, so we need a projection function to map the target coordinates onto the image. We generically define the projection mapping as p : D → R2 . The corresponding projectors are the homogeneous to Cartesian mapping, p : P2 → R2 , and the perspective projection, p : R3 → R2 . Image gradients in domains other than R2 are computed by using the chain rule with the projector p : Rn → R2 : ∇ˆx(I ◦ p(x)) = ∇ˆxI(p(x)) = ∇ˆxI(x)∇ˆxp(x), =   ∂I( ˆX) ∂ ˆX ˆX=p(x)   ∂p( ˆY) ∂ ˆY ˆY=x , = ∇ ˆp(x)I(p(x))∇ˆxp(x), x ∈ D ⊂ Rn . (4.1) Equation 4.1 represents image gradients in domain D as the image gradient in R2 lifted up onto the higher-dimension space D by means of the Jacobian matrix ∇ˆxp(x). Notation We use operator [ ] to denote the composite function I ◦ p, that is, I(p(x)) = I[x]. 4.1.1 Image Gradients in R2 If the target and its kinematics are expressed in R2 , there is no need to use a projector as both the target and the image share a common reference frame. The gradient of a grayscale image at point x = (i, j)⊤ is the vector ∇ˆxI(x) = (∇iI(x), ∇jI(x)) = ∂I(x) ∂i , ∂I(x) ∂j , (4.2) that flows from the darker areas of the image to the brighter ones (see Figure 4.1). Moreover, the direction of the gradient vector at point x ∈ R2 is orthogonal to the level set of the brightness function at the point (see Figure 4.1). 40
Figure 4.1: Depiction of image gradients. (Top-left) An image is a rectangular array where each element is a brightness value. (Top-right) Continuous representation of the image brightness values; we compute the values from the discrete array by interpolation. (Bottom-left) Image gradients are vectors from each image array element in the direction of maximum increase of brightness (compare with the top-right image). (Bottom-right) Gradient vectors are orthogonal to the contour curves of the brightness function. Legend: (blue) gradient vectors; (different colours) contour curves.