Lec16: Medical Image Registration (Advanced): Deformable Registration
1. MEDICAL IMAGE COMPUTING (CAP 5937)
LECTURE 16: Medical Image Registration II (Advanced):
Deformable Registration
Dr. Ulas Bagci
HEC 221, Center for Research in Computer
Vision (CRCV), University of Central Florida
(UCF), Orlando, FL 32814.
bagci@ucf.edu or bagci@crcv.ucf.edu
SPRING 2017
36. Example: Elastic Registration
• Model the deformation as a physical process resembling the
stretching of an elastic material
– The physical process is governed by internal and external forces and is
described by the Navier linear elastic partial differential equation
• The external force drives the registration process
– The external force can be the gradient of a similarity measure
• e.g. local correlation measure based on intensities, intensity differences or
intensity features such as edge and curvature
– Or the distance between the curves and surfaces of corresponding
anatomical structures.
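For reference, the Navier equation of linear elasticity is usually written in the form below. The notation is the standard one and is an assumption here, not taken from the slide: u is the displacement field, f the external force, and μ and λ the Lamé elasticity constants.

```latex
\mu \nabla^2 \mathbf{u}(\mathbf{x})
  + (\lambda + \mu)\, \nabla\bigl(\nabla \cdot \mathbf{u}(\mathbf{x})\bigr)
  + \mathbf{f}(\mathbf{x}) = \mathbf{0}
```

The first two terms are the internal elastic forces; registration stops deforming when they balance the external force f.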
37. Example: Free-Form Deformation (FFD)
• The general idea is to deform an image by manipulating a
regular grid of control points that are distributed across the
image at an arbitrary mesh resolution.
• Control points can be moved and the position of individual
pixels between the control points is computed from the
positions of surrounding control points.
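A minimal 2-D sketch of this idea, assuming cubic B-spline weights (a common choice for FFD; the slide does not name the basis). The names `bspline_basis`, `ffd_displacement`, `grid` and `spacing` are hypothetical.

```python
import numpy as np

def bspline_basis(u):
    """Four cubic B-spline weights for a fractional offset u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    ])

def ffd_displacement(x, y, grid, spacing):
    """Displacement of pixel (x, y) computed from the 4 x 4 surrounding control points.

    grid has shape (nx, ny, 2): one 2-D displacement per control point.
    Control point (i, j) sits at ((i - 1) * spacing, (j - 1) * spacing),
    so the grid carries a one-point border around the image.
    """
    i, u = divmod(x / spacing, 1.0)
    j, v = divmod(y / spacing, 1.0)
    i, j = int(i), int(j)
    bu, bv = bspline_basis(u), bspline_basis(v)
    disp = np.zeros(2)
    for l in range(4):
        for m in range(4):
            disp += bu[l] * bv[m] * grid[i + l, j + m]
    return disp

# Move a single control point by 4 pixels along x; only nearby pixels respond.
grid = np.zeros((8, 8, 2))
grid[3, 3] = [4.0, 0.0]
print(ffd_displacement(16.0, 16.0, grid, spacing=8.0))
```

Because each pixel depends on only 16 control points, moving one control point changes the image locally, and a finer mesh gives more local control.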
40. Local Motion Model
• The affine transformation captures only the global motion.
• An additional transformation is required, which models the
local deformation
50. Interpolation
• For computation of the cost function, the new value is
evaluated at non-voxel positions, for which intensity
interpolation is needed.
51. Interpolation
(nearest neighbor)
52. Interpolation
(tri-linear)
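The two interpolators named on these slides can be sketched as follows. This is an illustrative NumPy version with hypothetical function names, assuming positions are given in voxel units as (z, y, x) and every volume dimension has at least two voxels.

```python
import numpy as np

def nearest_neighbor(volume, p):
    """Intensity of the voxel closest to the continuous position p."""
    idx = np.clip(np.rint(p).astype(int), 0, np.array(volume.shape) - 1)
    return volume[tuple(idx)]

def trilinear(volume, p):
    """Distance-weighted average of the 8 voxels surrounding position p."""
    upper = np.array(volume.shape) - 1
    p = np.clip(np.asarray(p, dtype=float), 0, upper)
    lo = np.minimum(np.floor(p).astype(int), upper - 1)  # corner of the cell
    f = p - lo                                            # offsets in [0, 1]
    value = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((f[0] if dz else 1 - f[0])
                     * (f[1] if dy else 1 - f[1])
                     * (f[2] if dx else 1 - f[2]))
                value += w * volume[lo[0] + dz, lo[1] + dy, lo[2] + dx]
    return value

# A linear intensity ramp: tri-linear interpolation reproduces it exactly,
# nearest neighbor snaps to the closest voxel.
volume = np.fromfunction(lambda z, y, x: z + 2 * y + 3 * x, (4, 4, 4))
print(nearest_neighbor(volume, (1.2, 2.4, 0.7)), trilinear(volume, (1.2, 2.4, 0.7)))
```

Nearest neighbor is cheap but makes the cost function piecewise constant; tri-linear interpolation is the usual compromise between speed and smoothness.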
67. Sampling Strategy
• The most straightforward strategy is to use all voxels
from the fixed image, with the obvious downside that
it is time-consuming for large images.
• A common approach is to use a subset of voxels,
selected on a uniform grid or sampled randomly.
• Another strategy is to pick only those points that are
located on striking image features, such as edges.
• About 2,000 samples are often enough for good performance.
• Masks can be used to select a region of interest or to
avoid the undesired alignment of artificial edges in
the images.
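The random, grid and masked strategies can be sketched as below. The function names and the default of 2,000 samples are illustrative assumptions, not the interface of any particular registration toolkit.

```python
import numpy as np

def sample_voxels(shape, n_samples=2000, mask=None, rng=None):
    """Pick voxel coordinates at random, optionally restricted to a mask."""
    rng = np.random.default_rng() if rng is None else rng
    if mask is None:
        candidates = np.arange(int(np.prod(shape)))
    else:
        candidates = np.flatnonzero(mask)      # flat indices inside the mask
    n = min(n_samples, candidates.size)
    chosen = rng.choice(candidates, size=n, replace=False)
    return np.column_stack(np.unravel_index(chosen, shape))

def grid_voxels(shape, step):
    """Voxel coordinates on a uniform grid with the given spacing."""
    axes = [np.arange(0, s, step) for s in shape]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack(mesh, axis=-1).reshape(-1, len(shape))

shape = (32, 32, 32)
mask = np.zeros(shape, dtype=bool)
mask[8:24, 8:24, 8:24] = True                  # region of interest
coords = sample_voxels(shape, 2000, mask, np.random.default_rng(0))
```

Drawing a fresh random subset at every optimizer iteration is a common variant; it is omitted here to keep the sketch short.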
69. Deformation Model
Small deformation models do not necessarily
enforce a one-to-one mapping.
If the inverse transformation were correct, composing the forward and
inverse deformations (in either order) would give the identity.
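A 1-D sketch of the problem, assuming the small-deformation "inverse" is obtained by negating the displacement field (a common approximation; the names below are hypothetical).

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)
d = 0.1 * np.sin(2 * np.pi * x)                # displacement field

def warp(points, disp):
    """Small-deformation mapping: point plus displacement sampled at the point."""
    return points + np.interp(points, x, disp)

round_trip = warp(warp(x, d), -d)              # forward map, then the negated field
residual = np.max(np.abs(round_trip - x))      # would be 0 for a true inverse

folded = np.any(np.diff(warp(x, 2 * d)) < 0)   # doubling d breaks one-to-one
print(residual, folded)
```

The round trip misses the identity because the negated field is sampled at the already displaced position, and a large enough displacement makes the mapping fold over itself.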
73. Diffeomorphic Registration
• The large-deformation or diffeomorphic setting is a much
more elegant framework. A diffeomorphism is a globally
one-to-one (bijective), smooth and continuous mapping with
derivatives that are invertible.
• If the mapping is not diffeomorphic, then topology is not
necessarily preserved.
• A diffeomorphism is easier to parameterize using a number
of velocity fields corresponding to different time periods over
the course of its evolution (consider u(t) as the velocity field
at time t).
• Diffeomorphisms are generated by initializing with an identity
transform, $\phi^{(0)} = \mathrm{Id}$, and integrating
$\frac{d\phi}{dt} = u^{(t)}\bigl(\phi^{(t)}\bigr)$
over unit time to obtain $\phi^{(1)}$.
75. DARTEL: A Fast Diffeomorphic Method
(Ashburner, NeuroImage 2007)
DARTEL = Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra
78. DARTEL: A Fast Diffeomorphic Method
(Ashburner, NeuroImage 2007)
• The DARTEL model assumes a flow field (u) that remains
constant over time. With this model, the differential equation
describing the evolution of a deformation is
$\frac{d\phi}{dt} = u\bigl(\phi^{(t)}\bigr)$
• The Euler method is a simple integration approach that starts
from the identity (initial) transform and computes new
solutions after many successive small time-steps (h):
$\phi^{(t+h)} = \phi^{(t)} + h\,u\bigl(\phi^{(t)}\bigr)$
• Each of these Euler steps is equivalent to
$\phi^{(t+h)} = (x + h\,u) \circ \phi^{(t)}$
79. DARTEL: A Fast Diffeomorphic Method
(Ashburner, NeuroImage 2007)
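• The use of a large number of small time steps will produce a
more accurate solution, for instance (8 steps):
$\phi^{(1/8)} = x + u/8$
$\phi^{(2/8)} = \phi^{(1/8)} \circ \phi^{(1/8)}$
$\phi^{(3/8)} = \phi^{(1/8)} \circ \phi^{(2/8)}$
...
$\phi^{(1)} = \phi^{(1/8)} \circ \phi^{(7/8)}$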
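The following 1-D sketch integrates a stationary velocity field with 8 Euler steps and compares the result with scaling and squaring, which reaches the same 8 steps with only 3 self-compositions. The grid, the velocity field and the function names are illustrative assumptions; this is not the DARTEL implementation.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)                 # 1-D grid of positions
u = 0.2 * np.sin(np.pi * x)                    # stationary velocity, zero at both ends

def euler_flow(u, x, n_steps):
    """Integrate d(phi)/dt = u(phi) from the identity with n_steps Euler steps."""
    h = 1.0 / n_steps
    phi = x.copy()
    for _ in range(n_steps):
        phi = phi + h * np.interp(phi, x, u)   # (x + h*u) composed with phi
    return phi

def scaling_and_squaring(u, x, n_squarings):
    """Same flow: 2**n_squarings Euler steps using n_squarings compositions."""
    phi = x + u / 2 ** n_squarings             # one small step from the identity
    for _ in range(n_squarings):
        phi = np.interp(phi, x, phi)           # phi <- phi composed with phi
    return phi

phi_euler = euler_flow(u, x, 8)
phi_ss = scaling_and_squaring(u, x, 3)         # 2**3 = 8 steps
print(np.max(np.abs(phi_euler - phi_ss)))
```

The two results differ only by interpolation error, and the resulting map stays monotonic, which is the 1-D equivalent of being one-to-one.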
82. Optimization of DARTEL
• Image registration procedures use a mathematical model to
explain the data. Such a model will contain a number of
unknown parameters that describe how an image is
deformed.
• A true diffeomorphism has an infinite number of dimensions
and is infinitely differentiable.
• The discrete parameterization of the velocity field, u(x), can
be considered as a linear combination of basis functions:
$u(x) = \sum_i v_i\,\rho_i(x)$
– ρi(x) is the ith first-degree B-spline basis function at position x
– v is a vector of coefficients
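A first-degree B-spline is a triangular "hat" function, so this linear combination amounts to linear interpolation of the coefficients. A 1-D sketch, assuming knots at the integer positions 0, 1, 2, ... (function names are hypothetical):

```python
import numpy as np

def hat(x, i):
    """First-degree B-spline basis function centred on integer knot i."""
    return np.maximum(0.0, 1.0 - np.abs(x - i))

def velocity(x, v):
    """u(x) = sum_i v[i] * rho_i(x) with knots at 0, 1, ..., len(v) - 1."""
    return sum(v_i * hat(x, i) for i, v_i in enumerate(v))

v = np.array([0.0, 1.0, -0.5, 2.0])            # coefficient vector
xs = np.array([0.0, 0.5, 1.0, 2.25, 3.0])
print(velocity(xs, v))
```

At the knots the field equals the coefficients themselves, which is why the coefficient vector can be stored as an image of velocities.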
85. Optimization of DARTEL
• The aim is to estimate the single “best” set of values for these
parameters (v). The objective function, which is the measure
of “goodness”, is formulated as the most probable
deformation, given the data (D):
$p(v \mid D) = \frac{p(D \mid v)\,p(v)}{p(D)}$
• The objective is to find the most probable parameter values
and not the actual probability density, so the factor p(D) is ignored.
The single most probable estimate of the parameters is
known as the maximum a posteriori (MAP) estimate:
$\hat{v} = \arg\max_v\; p(D \mid v)\,p(v)$
or, equivalently,
$\hat{v} = \arg\min_v \bigl(-\log p(D \mid v) - \log p(v)\bigr)$
88. Optimization of DARTEL
• Many nonlinear registration approaches search for a
maximum a posteriori (MAP) estimate of the parameters
defining the warps, which corresponds to the mode of the
probability density.
• The Levenberg–Marquardt (LM) algorithm is a very good
general-purpose optimization strategy. Symbols in the LM update:
– A: second-order tensor field
– H: concentration matrix
– b: first derivatives of the likelihood function
89. Optimization of DARTEL
Simultaneously minimize the sum of
– Likelihood component
• Sum of squares difference
• $\tfrac{1}{2} \sum_i \sum_k \bigl(t_k(x_i) - \mu_k(\phi^{(1)}(x_i))\bigr)^2$
• $\phi^{(1)}$ is parameterized by u
– Prior component
• A measure of deformation roughness
• $\tfrac{1}{2} u^T H u$
90. Optimization of DARTEL
PRIOR TERM
• $\tfrac{1}{2} u^T H u$
• H encodes deformation roughness
• DARTEL has three different models for H
– Membrane energy
– Linear elasticity
– Bending energy
• H is very sparse
[Figure: an example H for 2D registration of 6x6 images (linear elasticity)]
91. Optimization of DARTEL
LIKELIHOOD TERM
• Images assumed to be partitioned into different tissue
classes.
– E.g., a 3 class registration simultaneously matches:
• Grey matter with grey matter
• White matter with white matter
• Background (1 – GM – WM) with background
92. “Membrane Energy”
[Figure panels: Convolution Kernel; Sparse Matrix Representation]
• Penalizes first derivatives.
• Sum of squares of the elements of the Jacobian (matrices) of the
flow field.
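To make the sparse-matrix view concrete, the sketch below builds a membrane-energy matrix H for one component of a flow field on a 6 x 6 grid and checks that the quadratic form equals the sum of squared first differences. Forward differences, unit spacing and free boundaries are assumptions of this sketch; SPM's own discretization and boundary handling may differ.

```python
import numpy as np

def forward_diff(n):
    """(n - 1) x n forward-difference matrix."""
    return np.eye(n, k=1)[:-1] - np.eye(n)[:-1]

def membrane_matrix(n):
    """H for one component of a flow field on an n x n grid (unit spacing)."""
    D, I = forward_diff(n), np.eye(n)
    d_rows = np.kron(D, I)      # differences along axis 0 of the flattened image
    d_cols = np.kron(I, D)      # differences along axis 1
    return d_rows.T @ d_rows + d_cols.T @ d_cols

def membrane_energy(u):
    """0.5 * sum of squared forward differences of every component of u."""
    return 0.5 * sum(np.sum(np.diff(c, axis=a) ** 2) for c in u for a in (0, 1))

H = membrane_matrix(6)                               # 36 x 36
u = np.random.default_rng(0).standard_normal((2, 6, 6))
energy = 0.5 * sum(c.ravel() @ H @ c.ravel() for c in u)
print(energy, membrane_energy(u), np.count_nonzero(H))
```

Each row of H has at most five non-zero entries (a voxel and its four neighbours), which is why H can be applied as a small convolution kernel instead of being stored as a full matrix.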
94. “Linear Elasticity”
• Decompose the Jacobian of the flow field into
– Symmetric component
• $\tfrac{1}{2}(J + J^T)$
• Encodes non-rigid part.
– Anti-symmetric component
• $\tfrac{1}{2}(J - J^T)$
• Encodes rigid-body part.
• Penalise sum of squares of symmetric part.
• Trace of Jacobian encodes volume changes. Also penalized.
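A per-voxel sketch of this decomposition. The weights `mu` and `lam` and the exact scaling of the two terms are assumptions made for illustration, not values taken from the slides.

```python
import numpy as np

def elastic_terms(J):
    """Split a flow-field Jacobian J into the parts used by the penalty."""
    sym = 0.5 * (J + J.T)       # non-rigid part (stretch and shear)
    anti = 0.5 * (J - J.T)      # rigid-body (rotational) part, not penalised
    return sym, anti, np.trace(J)

def elastic_penalty(J, mu=1.0, lam=1.0):
    """mu * (sum of squares of symmetric part) + 0.5 * lam * trace(J)**2."""
    sym, _, divergence = elastic_terms(J)
    return mu * np.sum(sym ** 2) + 0.5 * lam * divergence ** 2

rotation = np.array([[0.0, -0.1], [0.1, 0.0]])   # small rotation: purely anti-symmetric
dilation = np.array([[0.1, 0.0], [0.0, 0.1]])    # uniform expansion
print(elastic_penalty(rotation), elastic_penalty(dilation))
```

A small rotation costs nothing, while a uniform expansion is penalised twice: once through the symmetric part and once through the trace.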
95. Gauss-Newton Optimization
• DARTEL is optimized with Gauss-Newton
– Requires a matrix solution to a very large set of equations at each
iteration
$u^{(k+1)} = u^{(k)} - (H + A)^{-1} b$
– b are the first derivatives of the objective function
– A is a sparse matrix of second derivatives
– Computed efficiently, making use of scaling and squaring
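The update rule can be tried on a toy problem. The sketch below minimises ½‖f(u) − t‖² + ½uᵀHu for a hypothetical elementwise model f(u) = u + 0.1u² standing in for the warped template; the dense solver and all names are assumptions of this sketch and not how DARTEL solves its much larger system.

```python
import numpy as np

def forward_model(u):
    """Toy nonlinear model standing in for the warped template."""
    return u + 0.1 * u ** 2

def gauss_newton(t, H, n_iter=20):
    u = np.zeros_like(t)
    for _ in range(n_iter):
        J = 1.0 + 0.2 * u                    # derivative of the model (diagonal)
        r = forward_model(u) - t             # residual of the data term
        b = H @ u + J * r                    # first derivatives of the objective
        A = np.diag(J ** 2)                  # Gauss-Newton second derivatives of the data term
        u = u - np.linalg.solve(H + A, b)    # u_(k+1) = u_(k) - (H + A)^-1 b
    return u

n = 20
D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]     # forward differences
H = 0.1 * D.T @ D                            # membrane-style roughness penalty
u_true = np.sin(np.linspace(0.0, np.pi, n))
t = forward_model(u_true)
u_hat = gauss_newton(t, H)
print(np.max(np.abs(u_hat - u_true)))
```

The estimate is slightly smoother than `u_true` because the prior term pulls it towards low roughness; a larger weight on H would increase that bias.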
100. Large Deformation Model
• In the small deformation model, the displacements are stored
at each voxel w.r.t. their initial position.
• Because regularization acts on the displacement field, highly
localized deformations cannot be modeled: the
deformation energy caused by stress increases proportionally
with the strength of the deformation.
• In the large deformation model, the displacement is
generated via a time-dependent velocity field v:
$\frac{\partial u(x,y,z,t)}{\partial t} = v\bigl(u(x,y,z,t),\, t\bigr)$
where u(x,y,z,0) = (x,y,z).
120. Example: CNN based Registration!
• CNN regression addresses the two major limitations of
existing intensity-based 2-D/3-D registration technology:
1) slow computation and
2) small capture range.
• The CNN directly estimates the transformation parameters,
unlike iterative optimization.
121. Example: CNN based Registration!
1. Parameter Space Partition (PSP)
– partition the transformation parameter space into zones and train
CNN regressors in each zone separately, to break down the complex
regression task into multiple simpler sub-tasks.
2. Local Image Residual (LIR)
– simplify the underlying mapping to be captured by the regressors. LIR
is calculated as the difference between the DRR rendered using
transformation parameters t, and the X-ray image in local patches.
3. Hierarchical Parameter Regression
– decompose the transformation parameters and regress them in a
hierarchical manner, to achieve highly accurate parameter estimation
in each step.
4. CNN Regression Model
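As a rough illustration of item 2, a Local Image Residual can be sketched as the DRR-minus-X-ray difference cropped around each ROI centre. The function name, the square patch shape and the (row, col) centre convention are hypothetical; rendering the DRR and projecting the 3-D points to obtain the centres are outside this sketch.

```python
import numpy as np

def local_image_residuals(drr, xray, centers, half_size=2):
    """Stack of (DRR - X-ray) patches, one per ROI centre given as (row, col).

    Centres are assumed to lie at least half_size pixels from the border.
    """
    residual = drr.astype(float) - xray.astype(float)
    patches = [residual[r - half_size:r + half_size + 1,
                        c - half_size:c + half_size + 1] for r, c in centers]
    return np.stack(patches)

drr = np.ones((16, 16))                      # stand-in for a rendered DRR
xray = np.zeros((16, 16))                    # stand-in for the X-ray image
lir = local_image_residuals(drr, xray, [(5, 5), (10, 8)])
print(lir.shape)
```

The stacked patches would then be the input to the regressors; when the current parameters t are correct, every patch is close to zero.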
122. Example: CNN based Registration!
[Figure: workflow of LIR feature extraction, demonstrated on X-ray Echo Fusion
data. The local ROIs determined by the 3-D points and the transformation
parameters are shown as red boxes. The blue box shows a large ROI that covers
the entire object.]
126. Landmark Based Metrics
• Identify a set of anatomical landmarks, and measure the
distance between them before/after registration
128. Landmark Based Metrics
• Non-rigid registration error can be quantified with certainty
only at the available landmarks and with increasing
uncertainty as distance from these landmarks increases.
• A very dense set of landmarks, which is generally not
available, would thus be required to gain a complete, global
understanding of registration accuracy.
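A minimal sketch of a landmark-based error, assuming corresponding landmarks are stored row by row and that `transform` (a hypothetical callable) maps moving-image landmark coordinates into the fixed-image frame.

```python
import numpy as np

def landmark_errors(fixed_pts, moving_pts, transform=None):
    """Euclidean distance between corresponding landmarks, one value per pair."""
    mapped = moving_pts if transform is None else transform(moving_pts)
    return np.linalg.norm(fixed_pts - mapped, axis=1)

fixed = np.array([[10.0, 10.0, 10.0], [20.0, 5.0, 0.0]])
moving = fixed + np.array([3.0, 4.0, 0.0])       # landmarks before registration
before = landmark_errors(fixed, moving)
after = landmark_errors(fixed, moving, lambda p: p - np.array([3.0, 4.0, 0.0]))
print(before, after)
```

The errors say nothing about accuracy away from the landmarks, which is why the density of the landmark set matters.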
130. Jacobian Determinants
• The Jacobian determinant is used to define a measure for the
plausibility of the transformation by counting the number of
voxels with a negative value or a value of zero:
$N_{\le 0} = \bigl|\{\, x : \det(\nabla\phi(x)) \le 0 \,\}\bigr|$
• φ(x) is the deformation field.
• A value det(∇φ(x)) > 1 implies an expansion of volume; 0 < det(∇φ(x)) < 1
indicates a contraction.
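A 2-D sketch of this plausibility check, assuming the deformation field is stored as an array of shape (2, H, W) holding the mapped row and column coordinate of every pixel, with unit pixel spacing (function names are hypothetical).

```python
import numpy as np

def jacobian_determinant_2d(phi):
    """det(grad phi) per pixel, using central finite differences."""
    d0_dy, d0_dx = np.gradient(phi[0])
    d1_dy, d1_dx = np.gradient(phi[1])
    return d0_dy * d1_dx - d0_dx * d1_dy

def count_folded(phi):
    """Number of pixels whose determinant is zero or negative."""
    return int(np.sum(jacobian_determinant_2d(phi) <= 0))

identity = np.stack(np.meshgrid(np.arange(8.0), np.arange(8.0), indexing="ij"))
expanded = 1.5 * identity                    # uniform expansion
flipped = identity.copy()
flipped[1] = -flipped[1]                     # mirror the column coordinate
print(count_folded(identity), count_folded(flipped))
```

A count of zero does not prove that the registration is accurate; it only rules out folding.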
132. However,…..
• Intensity-, landmark-, and segmentation-based evaluation measures can be
highly unreliable.
• CURT (Completely Useless Registration Tool) is a deliberately inaccurate
method that still scores well on tissue overlap measures (Jaccard index).
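For reference, the Jaccard index of two binary label maps is the size of their intersection divided by the size of their union. A small sketch (returning 1.0 for two empty masks is a convention chosen here):

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index |A and B| / |A or B| of two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

a = np.zeros((4, 4), dtype=bool)
b = np.zeros((4, 4), dtype=bool)
a[:2] = True                                 # rows 0-1
b[1:3] = True                                # rows 1-2
print(jaccard(a, b))
```

The index only compares which voxels carry a label, so it cannot tell whether points inside a tissue region were mapped to the correct locations.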
133. Inverse Consistency Error
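• For each pair of fixed and moving images, A and B, and for
each registration algorithm, the inverse consistency error
between the forward transformation, $T_{AB}$, and the backward
transformation, $T_{BA}$, is computed as the mean round-trip
distance over the voxels x of the fixed image domain $\Omega_A$:
$\mathrm{ICE} = \frac{1}{|\Omega_A|} \sum_{x \in \Omega_A} \lVert T_{BA}(T_{AB}(x)) - x \rVert$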
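A sketch of an inverse consistency check, assuming the common definition of the error as the mean distance between each point and its image after applying the forward and then the backward transformation. The callables `t_ab` and `t_ba` are hypothetical stand-ins for the two estimated transformations.

```python
import numpy as np

def inverse_consistency_error(t_ab, t_ba, points):
    """Mean distance between each point and its forward-then-backward image."""
    round_trip = t_ba(t_ab(points))
    return float(np.mean(np.linalg.norm(round_trip - points, axis=1)))

points = np.array([[0.0, 0.0], [4.0, 2.0], [7.0, 9.0]])
shift = np.array([1.0, 2.0])
consistent = inverse_consistency_error(lambda p: p + shift, lambda p: p - shift, points)
inconsistent = inverse_consistency_error(
    lambda p: p + shift, lambda p: p - np.array([1.0, 1.5]), points)
print(consistent, inconsistent)
```

Two transformations can be perfectly inverse-consistent and still both be wrong, so a low value is necessary but not sufficient for an accurate registration.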
136. Rohlfing’s TMI Paper (2012) suggests that
• tissue overlap, image similarity, and inverse consistency error
are not reliable surrogates for registration accuracy, whether
they are used in isolation or combined.
• Ideally, actual registration errors measured at a large number
of densely distributed landmarks (i.e., identifiable anatomical
locations) should become the standard for reporting
registration errors.
• The evaluation of registration performance should be made as
independent as possible from the registration itself, because
obvious dependencies exist between the images used for
registration and the features derived from them for evaluation.
138. Slide Credits and References
• Darko Zikic, MICCAI 2010 Tutorial
• Stefan Klein, MICCAI 2010 Tutorial
• Marius Staring, MICCAI 2010 Tutorial
• J. Ashburner, NeuroImage 2007.
• M. F. Beg, M. I. Miller, A. Trouvé and L. Younes. “Computing
Large Deformation Metric Mappings via Geodesic Flows of
Diffeomorphisms”. International Journal of Computer Vision
61(2):139–157 (2005).
• M. Vaillant, M. I. Miller, L. Younes and A. Trouvé. “Statistics on
diffeomorphisms via tangent space representations”.
NeuroImage 23:S161–S169 (2004).
• L. Younes, “Jacobi fields in groups of diffeomorphisms and
applications”. Quart. Appl. Math. 65:113–134 (2007).
• www.nirep.org
• Rueckert and Schnabel’s Chapter 5, Medical Image
Registration, Springer 2010. Biomedical Image Processing.