Learning to reconstruct
Jonas Adler
Department of Mathematics
KTH - Royal Institute of Technology
Research Scientist
Research and Physics Group
Elekta
In the news
A glossary
Artificial Intelligence: Any technique
that enables computers to mimic
human intelligence
Machine Learning: A subset of AI
using statistical techniques to enable
machines to learn from data
Deep Learning: A subset of ML
where complicated tasks are
performed by breaking them down into
several layers
Why do we need machine learning?
Task: Identify a rabbit in an image
Proposed solution: If the animal is within this range of colors and has long ears and fur and has a slightly elliptical shape and has a nose like... then it is a rabbit
Machine Learning
Supervised learning: Inferring a function from training data.
Example: Image classification, Translation, Caption generation
Reinforcement Learning: Learning the best ”action” by experimentation.
Example: Playing Go
Unsupervised learning: Describing the hidden structure in training data.
Example: PCA, Clustering, Density estimation, Generative models
Supervised learning
We want to approximate an (unknown) operator A: X → Y
We are given training data (f, g), an X × Y-valued random variable such that A(f) ≈ g.
Finite data: (f, g) uniformly distributed on {(fi, gi)}i
We choose a class of operators AΘ : X → Y
Parametrized by Θ, which we learn
Selected by optimization of a loss function L(Θ)
Θ* = arg min_Θ L(Θ)

Example loss functions:
mean L2:            E(f,g) ||AΘ(f) − g||₂²
sup L2:             sup(f,g) ||AΘ(f) − g||₂²
mean cross-entropy: E(f,g)[ −AΘ(f) log g ]
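As a toy illustration of this setup (not from the talk), the sketch below fits a parametrized operator AΘ to noisy samples of an unknown operator A by gradient descent on the mean L2 loss. The operator A_true, the linear model class, and all sizes are hypothetical choices made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown operator A: R^3 -> R^2 (here secretly linear, used only to generate data).
A_true = rng.normal(size=(2, 3))

# Training data (f_i, g_i) with g_i ~ A(f_i) plus noise.
n_samples = 500
fs = rng.normal(size=(n_samples, 3))
gs = fs @ A_true.T + 0.05 * rng.normal(size=(n_samples, 2))

# Model class A_Theta: linear operators parametrized by a 2x3 matrix Theta.
Theta = np.zeros((2, 3))

# Minimize the empirical mean L2 loss L(Theta) = E ||A_Theta(f) - g||^2 by gradient descent.
lr = 0.1
for step in range(200):
    residual = fs @ Theta.T - gs               # A_Theta(f_i) - g_i, shape (n_samples, 2)
    grad = 2.0 * residual.T @ fs / n_samples   # gradient of the empirical loss w.r.t. Theta
    Theta -= lr * grad

print("max |Theta - A_true| =", np.abs(Theta - A_true).max())
```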
Inverse problems
What is an inverse problem?
g = T(ftrue) + δg
g ∈ Y: tomographic data (sinogram)
T : X → Y: forward operator
ftrue ∈ X: image we want to recover
δg ∈ Y: noise
Solution methods
Analytic pseudoinverse (FBP, FDK):    f = T†(g)
Iterative methods (ART, SART):        fi+1 = fi − ω T*(T(fi) − g)
Variational methods (TV, TGV, Huber): f = arg min_f ||T(f) − g||_Y² + λ||∇f||₁
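As a concrete toy example (not from the talk), the iteration fi+1 = fi − ω T*(T(fi) − g) can be written in a few lines of NumPy; the matrix forward operator, the step size ω, and the problem sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discretization: forward operator T as a matrix, X = R^30, Y = R^40.
T = rng.normal(size=(40, 30))
f_true = rng.normal(size=30)
g = T @ f_true + 0.01 * rng.normal(size=40)

# Landweber / SART-like iteration: f_{i+1} = f_i - omega * T^*(T(f_i) - g).
omega = 1.0 / np.linalg.norm(T, 2) ** 2   # step size below 2/||T||^2 for convergence
f = np.zeros(30)
for i in range(500):
    f = f - omega * T.T @ (T @ f - g)

print("relative error:", np.linalg.norm(f - f_true) / np.linalg.norm(f_true))
```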
Variational methods
Strategy: solve an optimization problem
f = arg min_f ||T(f) − g||_Y² + λ||∇f||₁
Equivalent to Maximum a Posteriori (MAP) estimation:
f = arg max_f P(f | g) = arg max_f P(g | f) P(f)
with data likelihood P(g | f) ∝ e^(−||T(f) − g||_Y²) and prior P(f) ∝ e^(−λ||∇f||₁)
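For completeness, the equivalence follows by taking the negative logarithm of the posterior; this assumes a Gaussian-type noise model whose negative log-likelihood is the squared Y-norm and a Gibbs-type prior with density proportional to e^(−λ||∇f||₁), matching the expressions above.

```latex
\begin{aligned}
\hat f &= \arg\max_f P(f \mid g)
        = \arg\max_f \frac{P(g \mid f)\,P(f)}{P(g)}
        = \arg\max_f P(g \mid f)\,P(f) \\
       &= \arg\min_f \bigl[ -\log P(g \mid f) - \log P(f) \bigr]
        = \arg\min_f \; \| T(f) - g \|_Y^2 + \lambda \| \nabla f \|_1 .
\end{aligned}
```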
Variational methods
Several issues:
Prior is typically unknown - have to ”guess”
Parameters (λ) need to be selected
Large computational burden
Solution methods (revisited)
Analytic pseudoinverse (FBP, FDK):    f = T†(g)
Iterative methods (ART, SART):        fi+1 = fi − ω T*(T(fi) − g)
Variational methods (TV, TGV, Huber): f = arg min_f ||T(f) − g||_Y² + λ||∇f||₁
Machine learning:                     f = T†Θ(g)
Learned priors
Idea: Instead of selecting the prior, learn it!
Parameter selection:
Bilevel parameter learning for higher-order total variation regularisation models
De Los Reyes et al., J Math Imaging Vis 2017
Dictionary learning:
Low-dose X-ray CT reconstruction via dictionary learning
Xu et al., IEEE TMI 2012
Learned proximal:
Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems
Meinhardt et al., ICCV 2017
Scattering transform:
Inverse problems with invariant multiscale statistics
Dokmanić et al., CoRR 2016
Problem: Does not solve the computational burden!
Learned solvers
Idea: Learn an optimization solver
Learned ISTA
Learning Fast Approximations of Sparse Coding
Gregor and LeCun, ICML 2010
Problem: Does not solve the prior problem!
Observation: We need to learn both the prior and how to reconstruct!
Machine Learning in imaging
End goal: a reconstruction operator T†Θ : Y → X such that
g = T(ftrue) + δg  =⇒  T†Θ(g) ≈ ftrue
Parametrized by Θ, which we learn
Selected by optimization of a loss function L(Θ)
Example: L2-loss of reconstruction
L(Θ) = E(f,g)∼µ [ ½ ||T†Θ(g) − f||_X² ]
(g, f): µ-distributed training data (e.g. high-dose CTs, simulation)
”Learning” by Stochastic Gradient Descent (SGD):
Θi+1 = Θi − α ∇L(Θi)
With the above loss:
∇L(Θ) = E(f,g)∼µ [ [∂Θ T†Θ(g)]* (T†Θ(g) − f) ]
Observation: The reconstruction operator must be differentiable w.r.t. Θ!
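A minimal sketch of this training loop, assuming a toy matrix forward operator and a small fully connected network as T†Θ (the sizes, learning rate, and architecture are illustrative, not from the talk). Automatic differentiation supplies ∇L(Θ), which is exactly why the reconstruction operator must be differentiable with respect to Θ.

```python
import torch

torch.manual_seed(0)

n_pix, n_data = 64, 96
T = torch.randn(n_data, n_pix)                 # toy forward operator as a matrix

# T_dagger_Theta: a small network mapping data g back to an image f.
recon_net = torch.nn.Sequential(
    torch.nn.Linear(n_data, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, n_pix),
)
opt = torch.optim.SGD(recon_net.parameters(), lr=1e-3)

for step in range(2000):
    f = torch.randn(32, n_pix)                      # batch of "true" images, f ~ mu
    g = f @ T.T + 0.01 * torch.randn(32, n_data)    # noisy data g = T(f) + dg
    loss = 0.5 * ((recon_net(g) - f) ** 2).mean()   # L2 reconstruction loss
    opt.zero_grad()
    loss.backward()                                 # autograd computes dL/dTheta
    opt.step()                                      # SGD update Theta_{i+1} = Theta_i - alpha * grad
```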
Learned inversion methods
Fully learned
Learned post-processing
Learned iterative schemes
Fully learned reconstruction
Goal: Learn ”the whole” mapping from data to signal
Tomographic image reconstruction based on artificial neural network (ANN) techniques
Argyrou et al., NSS/MIC 2012
Tomographic image reconstruction using artificial neural networks
Paschalis et al., Nucl Instrum Methods Phys Res A 2004
Problem: T typically has symmetries, but the network has to learn them.
Example: 3D CBCT, data: 332 × 720 × 780 pixels and 448 × 448 × 448 voxels.
332 × 720 × 780 × 448 × 448 × 448 ≈ 10^16 connections!
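The quoted count is plain arithmetic and can be checked directly (a single dense layer from the full data to the full volume):

```python
data_size = 332 * 720 * 780        # detector pixels over all projections
image_size = 448 * 448 * 448       # reconstruction voxels
print(f"{data_size * image_size:.2e}")   # ~1.68e16 connections for one dense layer
```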
Learned inversion methods
Fully learned
Learned post-processing
Learned iterative schemes
Learning a post-processing
Goal: Use deep learning to improve the result of another reconstruction
Structure:
T†Θ(g) = ΛΘ(T†(g))
where T† is some reconstruction (FBP, TV, ...) and ΛΘ is a learned post-processing operator.
Allows separation of inversion and learning; the training data can be seen as pairs (T†(g), ftrue).
The problem becomes an image processing problem =⇒ easy to solve.
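A minimal sketch of such a learned post-processing step, assuming the pairs (T†(gi), ftrue,i) have already been precomputed as image tensors; the random stand-in data and the three-layer CNN are illustrative choices, not the networks from the cited works.

```python
import torch

# Assumed available: precomputed pairs (T_dagger(g_i), f_true_i) of shape (N, 1, H, W).
# Random stand-ins are used here purely for illustration.
fbp_images = torch.randn(100, 1, 64, 64)
true_images = torch.randn(100, 1, 64, 64)

# Lambda_Theta: a small convolutional post-processing network.
post_net = torch.nn.Sequential(
    torch.nn.Conv2d(1, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(post_net.parameters(), lr=1e-3)

for epoch in range(10):
    out = post_net(fbp_images)                   # Lambda_Theta(T_dagger(g))
    loss = ((out - true_images) ** 2).mean()     # match against f_true
    opt.zero_grad()
    loss.backward()
    opt.step()
```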
In wavelet space
Can also add more structure (e.g. denoise in a transform domain):
ΛΘ = W⁻¹ ◦ Λ̂Θ ◦ W
where W is some transform (Fourier, wavelet, shearlet, etc.)
Won the AAPM Low-Dose CT Grand Challenge:
A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction
Kang et al., 2016
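A structural sketch of W⁻¹ ◦ Λ̂Θ ◦ W, assuming PyWavelets is available; soft-thresholding of the detail coefficients stands in for the learned coefficient-domain operator Λ̂Θ, and nothing here reproduces the architecture of Kang et al.

```python
import numpy as np
import pywt

def transform_domain_postprocess(image, coeff_op, wavelet="db2", level=3):
    """Apply Lambda_Theta = W^{-1} o Lambda_hat_Theta o W to a 2D image."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)        # W
    new_coeffs = [coeffs[0]] + [
        tuple(coeff_op(c) for c in detail) for detail in coeffs[1:]
    ]                                                          # Lambda_hat_Theta
    return pywt.waverec2(new_coeffs, wavelet)                  # W^{-1}

# Stand-in for the learned operator: soft-thresholding of detail coefficients.
denoised = transform_domain_postprocess(
    np.random.rand(64, 64), lambda c: pywt.threshold(c, 0.1, mode="soft")
)
```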
Both pre- and post-processing
Can also do both pre-processing (of the data) and post-processing (of the reconstruction):
T†Θ(g) = ΛΘ(T†(ΓΘ(g)))
Does not admit separation of inversion and learning.
Fast tomographic reconstruction from limited data using artificial neural networks
Pelt and Batenburg, IEEE TIP 2013
Learned inversion methods
Fully learned
Learned post-processing
Learned iterative schemes
Learned iterative reconstruction
Problem: Data g ∈ Y, reconstruction f ∈ X
How to include the data in each iteration?
Inspiration from iterative optimization methods:
f = arg min_f ||T(f) − g||_Y²

Algorithm 1  Generic gradient-based optimization
1: for i = 1, . . . do
2:   fi+1 ← BetterGuess(fi, [∂T(fi)]*(T(fi) − g))

Gradient descent:
BetterGuess(fi, [∂T(fi)]*(T(fi) − g)) = fi − α [∂T(fi)]*(T(fi) − g)
Learned gradient descent
Set a stopping criterion (a fixed number of steps)
Learn the function BetterGuess

Algorithm 2  Learned gradient descent
1: for i = 1, . . . , I do
2:   fi+1 ← ΛΘ(fi, [∂T(fi)]*(T(fi) − g))
3: T†Θ(g) ← fI

We separate the problem-dependent (and possibly global) part into [∂T(fi)]*(T(fi) − g), and the local part into ΛΘ!
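A compact sketch of Algorithm 2 as an unrolled network, assuming a matrix forward operator and a small fully connected ΛΘ; writing the update in residual form (fi plus a learned correction) is one common implementation choice, and all sizes and the number of iterations are illustrative.

```python
import torch

n_pix, n_data, n_iter = 64, 96, 5
T = torch.randn(n_data, n_pix)                   # toy forward operator as a matrix

class LearnedGradientDescent(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # One update network Lambda_Theta_i per unrolled iteration.
        self.updates = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Linear(2 * n_pix, 128), torch.nn.ReLU(),
                torch.nn.Linear(128, n_pix),
            )
            for _ in range(n_iter)
        )

    def forward(self, g):
        f = torch.zeros(g.shape[0], n_pix)
        for update in self.updates:
            grad_term = (f @ T.T - g) @ T        # [dT(f_i)]^*(T(f_i) - g)
            # Lambda_Theta(f_i, grad_term), implemented as a residual update.
            f = f + update(torch.cat([f, grad_term], dim=1))
        return f                                 # T_dagger_Theta(g) = f_I

net = LearnedGradientDescent()
f_hat = net(torch.randn(8, n_data))              # reconstructions for a batch of data
```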
Structure of updating operators
What is a natural structure for ΛΘ?
Requirements:
Fast to compute
Spans a rich set of functions
Easy to evaluate [∂Θ ΛΘ]*
Standard answer:
ΛΘ = WΘ1 ◦ ρ ◦ WΘ2 ◦ ρ ◦ · · · ◦ ρ ◦ WΘn
ρ: pointwise nonlinearity
WΘi: affine operator
Structure of updating operators
What class of affine operators should we use?
Some options:
Any affine operator X → X: ”fully connected layer”
Any translation-invariant affine operator X → X: ”convolutional layer”
Fully connected layers are ”stronger” but require far more parameters.
In several applications (CT, MRI), [∂T(fi)]* ◦ T is (approximately) translation invariant!
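A quick back-of-the-envelope comparison for a 128 × 128 image (the channel count is an illustrative choice):

```python
n = 128 * 128                      # number of pixels in a 128 x 128 image
fully_connected = n * n + n        # one dense affine layer X -> X (weights + bias)
convolutional = 3 * 3 * 32 * 32    # one 3x3 convolution with 32 -> 32 channels
print(fully_connected)             # 268_451_840 parameters
print(convolutional)               # 9_216 parameters
```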
Further improvements
Some observations:
Can extend the learning to the dual space - allows exploiting symmetries in both reconstruction and data
No need to have the same update in each iterate - let it vary
Allow ”memory” by letting f ∈ X^n
Evaluating the forward operator is expensive - learn where to evaluate
Learned Primal-Dual algorithm

Algorithm 3  Learned Primal-Dual
1: for i = 1, . . . , I do
2:   h_i ← Γ_{Θ_i^d}( h_{i−1}, K(f_{i−1}^{(2)}), g )
3:   f_i ← Λ_{Θ_i^p}( f_{i−1}, [∂K(f_{i−1}^{(1)})]*( h_i^{(1)} ) )
4: T†Θ(g) ← f_I^{(1)}

Learning:
L(Θ) = Eµ [ ||T†Θ(g) − f||_X² ]
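A very compressed structural sketch of Algorithm 3, not the published architecture of Adler and Öktem: K is a toy matrix, the primal and dual iterates carry a small ”memory” of two components each, and the update networks ΓΘ, ΛΘ are single linear layers chosen only to show the data flow.

```python
import torch

n_pix, n_data, n_iter, mem = 64, 96, 5, 2
K = torch.randn(n_data, n_pix)                   # toy forward operator

dual_nets = torch.nn.ModuleList(                 # Gamma_Theta_i^d, one per iteration
    torch.nn.Linear(mem * n_data + 2 * n_data, mem * n_data) for _ in range(n_iter)
)
primal_nets = torch.nn.ModuleList(               # Lambda_Theta_i^p, one per iteration
    torch.nn.Linear(mem * n_pix + n_pix, mem * n_pix) for _ in range(n_iter)
)

def learned_primal_dual(g):
    batch = g.shape[0]
    f = torch.zeros(batch, mem, n_pix)           # primal iterates with memory
    h = torch.zeros(batch, mem, n_data)          # dual iterates with memory
    for dual_net, primal_net in zip(dual_nets, primal_nets):
        Kf2 = f[:, 1] @ K.T                      # K(f^(2))
        h = dual_net(torch.cat([h.flatten(1), Kf2, g], dim=1)).view(batch, mem, n_data)
        back = h[:, 0] @ K                       # [dK(f^(1))]^*(h^(1))
        f = primal_net(torch.cat([f.flatten(1), back], dim=1)).view(batch, mem, n_pix)
    return f[:, 0]                               # T_dagger_Theta(g) = f_I^(1)

f_hat = learned_primal_dual(torch.randn(8, n_data))
```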
References:
ADMM-Net: A Deep Learning Approach for Compressive Sensing MRI
Yang et al., NIPS 2016
Recurrent inference machines for solving inverse problems
Putzky and Welling, arXiv 2017
Learning a Variational Network for Reconstruction of Accelerated MRI Data
Hammernik et al., arXiv 2017
Solving ill-posed inverse problems using iterative deep neural networks
Adler and Öktem, Inverse Problems 2017
Learned Primal-Dual Reconstruction
Adler and Öktem, arXiv 2017
Results: ellipses
Results for ray transform inversion in 2D.
Inverse problem: g = P(f) + δg
Geometry: parallel beam, sparse view (30 angles)
Noise: 5% additive Gaussian noise
Data: 128 × 128 pixel ellipse phantoms
Compare to:
FBP
Total Variation
Post-processing deep learning by U-Net
[Figures: phantom alongside a training phantom and reconstructions by FBP, TV, learned post-processing, and Learned Primal-Dual]
Results: quantitative

Method               PSNR (dB)   SSIM    Runtime (ms)   Parameters
FBP                  19.75       0.597   4              1
TV                   28.06       0.928   5 166          1
Learned U-Net        29.20       0.943   9              10^7
Learned Primal-Dual  38.28       0.988   49             2.4 · 10^5
Comments
Very large quantitative improvement =⇒ PSNR is not a good metric
Noticeable visual improvement
Speedup enables clinical implementation
We are remarkably close to the theoretical optimum
Results: human data
Results for ray transform inversion in 2D.
Inverse problem: g = P(f) + δg
Geometry: fan beam, 1000 angles
Noise: Poisson noise (low-dose CT)
Data: 512 × 512 pixel human data
[Figures: dual and primal iterates 0, 2, 4, 6, 8, and 10 of the Learned Primal-Dual algorithm]
Results: human data
Results for ray transform inversion in 2D.
Compare to:
FBP
Total Variation
Post-processing deep learning by U-Net
[Figures: human phantom reconstructions by FBP, TV, learned U-Net post-processing, and Learned Primal-Dual]
Results: quantitative

Method               PSNR (dB)   SSIM    Runtime (ms)   Parameters
FBP                  33.65       0.829   423            1
TV                   37.48       0.946   64 371         1
Learned U-Net        41.92       0.941   463            10^7
Learned Primal-Dual  44.11       0.969   620            2.4 · 10^5
Future work
Learned iterative reconstruction requires differentiating the whole solver =⇒ computationally prohibitive
Possible solution: learn one iterate at a time (gradient boosting)
Model based learning for accelerated, limited-view 3D photoacoustic tomography
Hauptmann et al., arXiv 2017
The L2 loss is sub-optimal
Possible solution: perceptual losses
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Johnson et al., ECCV 2016
Possible solution: adversarial losses
Deep Generative Adversarial Networks for Compressed Sensing Automates MRI
Mardani et al., arXiv 2017
Future work
Bias-variance trade-off: what is the best network size?
How do we acquire good data?
Clinical validation!
Conclusions
Machine learning allows us to handle complicated priors
Fully learned reconstruction is infeasible
Learned post-processing gives good results
Learned iterative reconstruction gives better results, but is computationally challenging
Related articles:
”Solving ill-posed inverse problems using iterative deep neural networks”
”Learned Primal-Dual Reconstruction”
Source code:
github.com/adler-j
github.com/odlgroup/odl
Contact:
jonasadl@kth.se
