Learning to reconstruct
Jonas Adler
Department of Mathematics
KTH - Royal Institute of Technology
Research Scientist
Research and Physics Group
Elekta
In the news
A glossary
Artificial Intelligence: Any technique
that enables computers to mimic
human intelligence
Machine Learning: A subset of AI
using statistical techniques to enable
machines to learn from data
Deep Learning: A subset of ML
where complicated tasks are
performed by breaking them down into
several layers
Why do we need machine learning?
Task: Identify a rabbit in an image
Proposed solution: If the animal is within this range of colors and has long ears and fur and has a slightly elliptical shape and has a nose like... then it is a rabbit
Machine Learning
Supervised learning: Inferring a function from training data.
Example: Image classification, Translation, Caption generation
Reinforcement Learning: Learning the best ”action” by experimentation.
Example: Playing Go
Unsupervised learning: Describing the hidden structure in training data.
Example: PCA, Clustering, Density estimation, Generative models
Supervised learning
We want to approximate an (unknown) operator A: X → Y
We are given training data (f, g), an X × Y-valued random variable such that A(f) ≈ g.
Finite data: (f, g) uniformly distributed on {(fi, gi)}i
We choose a class of operators AΘ : X → Y
Parametrized by Θ, which we learn
Selected by optimization of a loss function L(Θ)
Θ* = arg min_Θ L(Θ)

Example loss functions:
mean L2:            E(f,g) ||AΘ(f) − g||₂²
sup L2:             sup(f,g) ||AΘ(f) − g||₂²
mean cross-entropy: E(f,g)[ −AΘ(f) log g ]
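As a toy illustration of this setup (not from the talk), the sketch below fits a parametrized operator AΘ to noisy samples of an unknown operator A by gradient descent on the mean L2 loss. The operator A_true, the linear model class, and all sizes are hypothetical choices made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown operator A: R^3 -> R^2 (here secretly linear, used only to generate data).
A_true = rng.normal(size=(2, 3))

# Training data (f_i, g_i) with g_i ~ A(f_i) plus noise.
n_samples = 500
fs = rng.normal(size=(n_samples, 3))
gs = fs @ A_true.T + 0.05 * rng.normal(size=(n_samples, 2))

# Model class A_Theta: linear operators parametrized by a 2x3 matrix Theta.
Theta = np.zeros((2, 3))

# Minimize the empirical mean L2 loss L(Theta) = E ||A_Theta(f) - g||^2 by gradient descent.
lr = 0.1
for step in range(200):
    residual = fs @ Theta.T - gs               # A_Theta(f_i) - g_i, shape (n_samples, 2)
    grad = 2.0 * residual.T @ fs / n_samples   # gradient of the empirical loss w.r.t. Theta
    Theta -= lr * grad

print("max |Theta - A_true| =", np.abs(Theta - A_true).max())
```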
Inverse problems
What is an inverse problem?
g = T(ftrue) + δg
g ∈ Y: tomographic data (sinogram)
T : X → Y: forward operator
ftrue ∈ X: image we want to recover
δg ∈ Y: noise
Solution methods
Analytic pseudoinverse (FBP, FDK):    f = T†(g)
Iterative methods (ART, SART):        fi+1 = fi − ω T*(T(fi) − g)
Variational methods (TV, TGV, Huber): f = arg min_f ||T(f) − g||_Y² + λ||∇f||₁
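As a concrete toy example (not from the talk), the iteration fi+1 = fi − ω T*(T(fi) − g) can be written in a few lines of NumPy; the matrix forward operator, the step size ω, and the problem sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discretization: forward operator T as a matrix, X = R^30, Y = R^40.
T = rng.normal(size=(40, 30))
f_true = rng.normal(size=30)
g = T @ f_true + 0.01 * rng.normal(size=40)

# Landweber / SART-like iteration: f_{i+1} = f_i - omega * T^*(T(f_i) - g).
omega = 1.0 / np.linalg.norm(T, 2) ** 2   # step size below 2/||T||^2 for convergence
f = np.zeros(30)
for i in range(500):
    f = f - omega * T.T @ (T @ f - g)

print("relative error:", np.linalg.norm(f - f_true) / np.linalg.norm(f_true))
```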
Variational methods
Strategy: solve an optimization problem
f = arg min_f ||T(f) − g||_Y² + λ||∇f||₁
Equivalent to Maximum a Posteriori (MAP) estimation:
f = arg max_f P(f | g) = arg max_f P(g | f) P(f)
with data likelihood P(g | f) ∝ e^(−||T(f) − g||_Y²) and prior P(f) ∝ e^(−λ||∇f||₁)
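For completeness, the equivalence follows by taking the negative logarithm of the posterior; this assumes a Gaussian-type noise model whose negative log-likelihood is the squared Y-norm and a Gibbs-type prior with density proportional to e^(−λ||∇f||₁), matching the expressions above.

```latex
\begin{aligned}
\hat f &= \arg\max_f P(f \mid g)
        = \arg\max_f \frac{P(g \mid f)\,P(f)}{P(g)}
        = \arg\max_f P(g \mid f)\,P(f) \\
       &= \arg\min_f \bigl[ -\log P(g \mid f) - \log P(f) \bigr]
        = \arg\min_f \; \| T(f) - g \|_Y^2 + \lambda \| \nabla f \|_1 .
\end{aligned}
```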
Variational methods
Several issues:
Prior is typically unknown - have to ”guess”
Parameters (λ) need to be selected
Large computational burden
Solution methods (revisited)
Analytic pseudoinverse (FBP, FDK):    f = T†(g)
Iterative methods (ART, SART):        fi+1 = fi − ω T*(T(fi) − g)
Variational methods (TV, TGV, Huber): f = arg min_f ||T(f) − g||_Y² + λ||∇f||₁
Machine learning:                     f = T†Θ(g)
Learned priors
Idea: Instead of selecting the prior, learn it!
Parameter selection:
Bilevel parameter learning for higher-order total variation regularisation models
De Los Reyes et al., J Math Imaging Vis 2017
Dictionary learning:
Low-dose X-ray CT reconstruction via dictionary learning
Xu et al., IEEE TMI 2012
Learned proximal:
Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems
Meinhardt et al., ICCV 2017
Scattering transform:
Inverse problems with invariant multiscale statistics
Dokmanić et al., CoRR 2016
Problem: Does not solve the computational burden!
Learned solvers
Idea: Learn an optimization solver
Learned ISTA
Learning Fast Approximations of Sparse Coding
Gregor and LeCun, ICML 2010
Problem: Does not solve the prior problem!
Observation: We need to learn both the prior and how to reconstruct!
Machine Learning in imaging
End goal: a reconstruction operator T†Θ : Y → X such that
g = T(ftrue) + δg  =⇒  T†Θ(g) ≈ ftrue
Parametrized by Θ, which we learn
Selected by optimization of a loss function L(Θ)
Example: L2-loss of reconstruction
L(Θ) = E(f,g)∼µ [ ½ ||T†Θ(g) − f||_X² ]
(g, f): µ-distributed training data (e.g. high-dose CTs, simulation)
”Learning” by Stochastic Gradient Descent (SGD):
Θi+1 = Θi − α ∇L(Θi)
With the above loss:
∇L(Θ) = E(f,g)∼µ [ [∂Θ T†Θ(g)]* (T†Θ(g) − f) ]
Observation: The reconstruction operator must be differentiable w.r.t. Θ!
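A minimal sketch of this training loop, assuming a toy matrix forward operator and a small fully connected network as T†Θ (the sizes, learning rate, and architecture are illustrative, not from the talk). Automatic differentiation supplies ∇L(Θ), which is exactly why the reconstruction operator must be differentiable with respect to Θ.

```python
import torch

torch.manual_seed(0)

n_pix, n_data = 64, 96
T = torch.randn(n_data, n_pix)                 # toy forward operator as a matrix

# T_dagger_Theta: a small network mapping data g back to an image f.
recon_net = torch.nn.Sequential(
    torch.nn.Linear(n_data, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, n_pix),
)
opt = torch.optim.SGD(recon_net.parameters(), lr=1e-3)

for step in range(2000):
    f = torch.randn(32, n_pix)                      # batch of "true" images, f ~ mu
    g = f @ T.T + 0.01 * torch.randn(32, n_data)    # noisy data g = T(f) + dg
    loss = 0.5 * ((recon_net(g) - f) ** 2).mean()   # L2 reconstruction loss
    opt.zero_grad()
    loss.backward()                                 # autograd computes dL/dTheta
    opt.step()                                      # SGD update Theta_{i+1} = Theta_i - alpha * grad
```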
Learned inversion methods
Fully learned
Learned post-processing
Learned iterative schemes
Fully learned reconstruction
Goal: Learn ”the whole” mapping from data to signal
Tomographic image reconstruction based on artificial neural network (ANN) techniques
Argyrou et al., NSS/MIC 2012
Tomographic image reconstruction using artificial neural networks
Paschalis et al., Nucl Instrum Methods Phys Res A 2004
Problem: T typically has symmetries, but the network has to learn them.
Example: 3D CBCT, data: 332 × 720 × 780 pixels and 448 × 448 × 448 voxels.
332 × 720 × 780 × 448 × 448 × 448 ≈ 10^16 connections!
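The quoted count is plain arithmetic and can be checked directly (a single dense layer from the full data to the full volume):

```python
data_size = 332 * 720 * 780        # detector pixels over all projections
image_size = 448 * 448 * 448       # reconstruction voxels
print(f"{data_size * image_size:.2e}")   # ~1.68e16 connections for one dense layer
```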
Learned inversion methods
Fully learned
Learned post-processing
Learned iterative schemes
Learning a post-processing
Goal: Use deep learning to improve the result of another reconstruction
Structure:
T†Θ(g) = ΛΘ(T†(g))
where T† is some reconstruction (FBP, TV, ...) and ΛΘ is a learned post-processing operator.
Allows separation of inversion and learning; the training data can be seen as pairs (T†(g), ftrue).
The problem becomes an image processing problem =⇒ easy to solve.
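A minimal sketch of such a learned post-processing step, assuming the pairs (T†(gi), ftrue,i) have already been precomputed as image tensors; the random stand-in data and the three-layer CNN are illustrative choices, not the networks from the cited works.

```python
import torch

# Assumed available: precomputed pairs (T_dagger(g_i), f_true_i) of shape (N, 1, H, W).
# Random stand-ins are used here purely for illustration.
fbp_images = torch.randn(100, 1, 64, 64)
true_images = torch.randn(100, 1, 64, 64)

# Lambda_Theta: a small convolutional post-processing network.
post_net = torch.nn.Sequential(
    torch.nn.Conv2d(1, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(post_net.parameters(), lr=1e-3)

for epoch in range(10):
    out = post_net(fbp_images)                   # Lambda_Theta(T_dagger(g))
    loss = ((out - true_images) ** 2).mean()     # match against f_true
    opt.zero_grad()
    loss.backward()
    opt.step()
```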
In wavelet space
Can also add more structure (e.g. denoise in a transform domain):
ΛΘ = W⁻¹ ◦ Λ̂Θ ◦ W
where W is some transform (Fourier, wavelet, shearlet, etc.)
Won the AAPM Low-Dose CT Grand Challenge:
A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction
Kang et al., 2016
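A structural sketch of W⁻¹ ◦ Λ̂Θ ◦ W, assuming PyWavelets is available; soft-thresholding of the detail coefficients stands in for the learned coefficient-domain operator Λ̂Θ, and nothing here reproduces the architecture of Kang et al.

```python
import numpy as np
import pywt

def transform_domain_postprocess(image, coeff_op, wavelet="db2", level=3):
    """Apply Lambda_Theta = W^{-1} o Lambda_hat_Theta o W to a 2D image."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)        # W
    new_coeffs = [coeffs[0]] + [
        tuple(coeff_op(c) for c in detail) for detail in coeffs[1:]
    ]                                                          # Lambda_hat_Theta
    return pywt.waverec2(new_coeffs, wavelet)                  # W^{-1}

# Stand-in for the learned operator: soft-thresholding of detail coefficients.
denoised = transform_domain_postprocess(
    np.random.rand(64, 64), lambda c: pywt.threshold(c, 0.1, mode="soft")
)
```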
Both pre- and post-processing
Can also do both pre-processing (of the data) and post-processing (of the reconstruction):
T†Θ(g) = ΛΘ(T†(ΓΘ(g)))
Does not admit separation of inversion and learning.
Fast tomographic reconstruction from limited data using artificial neural networks
Pelt and Batenburg, IEEE TIP 2013
Learned inversion methods
Fully learned
Learned post-processing
Learned iterative schemes
Learned iterative reconstruction
Problem: Data g ∈ Y, reconstruction f ∈ X
How to include the data in each iteration?
Inspiration from iterative optimization methods:
f = arg min_f ||T(f) − g||_Y²

Algorithm 1  Generic gradient-based optimization
1: for i = 1, . . . do
2:   fi+1 ← BetterGuess(fi, [∂T(fi)]*(T(fi) − g))

Gradient descent:
BetterGuess(fi, [∂T(fi)]*(T(fi) − g)) = fi − α [∂T(fi)]*(T(fi) − g)
Learned gradient descent
Set a stopping criterion (a fixed number of steps)
Learn the function BetterGuess

Algorithm 2  Learned gradient descent
1: for i = 1, . . . , I do
2:   fi+1 ← ΛΘ(fi, [∂T(fi)]*(T(fi) − g))
3: T†Θ(g) ← fI

We separate the problem-dependent (and possibly global) part into [∂T(fi)]*(T(fi) − g), and the local part into ΛΘ!
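A compact sketch of Algorithm 2 as an unrolled network, assuming a matrix forward operator and a small fully connected ΛΘ; writing the update in residual form (fi plus a learned correction) is one common implementation choice, and all sizes and the number of iterations are illustrative.

```python
import torch

n_pix, n_data, n_iter = 64, 96, 5
T = torch.randn(n_data, n_pix)                   # toy forward operator as a matrix

class LearnedGradientDescent(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # One update network Lambda_Theta_i per unrolled iteration.
        self.updates = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Linear(2 * n_pix, 128), torch.nn.ReLU(),
                torch.nn.Linear(128, n_pix),
            )
            for _ in range(n_iter)
        )

    def forward(self, g):
        f = torch.zeros(g.shape[0], n_pix)
        for update in self.updates:
            grad_term = (f @ T.T - g) @ T        # [dT(f_i)]^*(T(f_i) - g)
            # Lambda_Theta(f_i, grad_term), implemented as a residual update.
            f = f + update(torch.cat([f, grad_term], dim=1))
        return f                                 # T_dagger_Theta(g) = f_I

net = LearnedGradientDescent()
f_hat = net(torch.randn(8, n_data))              # reconstructions for a batch of data
```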
Structure of updating operators
What is a natural structure for ΛΘ?
Requirements:
Fast to compute
Spans a rich set of functions
Easy to evaluate [∂Θ ΛΘ]*
Standard answer:
ΛΘ = WΘ1 ◦ ρ ◦ WΘ2 ◦ ρ ◦ · · · ◦ ρ ◦ WΘn
ρ: pointwise nonlinearity
WΘi: affine operator
Structure of updating operators
What class of affine operators should we use?
Some options:
Any affine operator X → X: ”fully connected layer”
Any translation-invariant affine operator X → X: ”convolutional layer”
Fully connected layers are ”stronger” but require far more parameters.
In several applications (CT, MRI), [∂T(fi)]* ◦ T is (approximately) translation invariant!
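A quick back-of-the-envelope comparison for a 128 × 128 image (the channel count is an illustrative choice):

```python
n = 128 * 128                      # number of pixels in a 128 x 128 image
fully_connected = n * n + n        # one dense affine layer X -> X (weights + bias)
convolutional = 3 * 3 * 32 * 32    # one 3x3 convolution with 32 -> 32 channels
print(fully_connected)             # 268_451_840 parameters
print(convolutional)               # 9_216 parameters
```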
Further improvements
Some observations:
Can extend the learning to the dual space - allows exploiting symmetries in both reconstruction and data
No need to have the same update in each iterate - let it vary
Allow ”memory” by letting f ∈ X^n
Evaluating the forward operator is expensive - learn where to evaluate
Learned Primal-Dual algorithm

Algorithm 3  Learned Primal-Dual
1: for i = 1, . . . , I do
2:   h_i ← Γ_{Θ_i^d}( h_{i−1}, K(f_{i−1}^{(2)}), g )
3:   f_i ← Λ_{Θ_i^p}( f_{i−1}, [∂K(f_{i−1}^{(1)})]*( h_i^{(1)} ) )
4: T†Θ(g) ← f_I^{(1)}

Learning:
L(Θ) = Eµ [ ||T†Θ(g) − f||_X² ]
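A very compressed structural sketch of Algorithm 3, not the published architecture of Adler and Öktem: K is a toy matrix, the primal and dual iterates carry a small ”memory” of two components each, and the update networks ΓΘ, ΛΘ are single linear layers chosen only to show the data flow.

```python
import torch

n_pix, n_data, n_iter, mem = 64, 96, 5, 2
K = torch.randn(n_data, n_pix)                   # toy forward operator

dual_nets = torch.nn.ModuleList(                 # Gamma_Theta_i^d, one per iteration
    torch.nn.Linear(mem * n_data + 2 * n_data, mem * n_data) for _ in range(n_iter)
)
primal_nets = torch.nn.ModuleList(               # Lambda_Theta_i^p, one per iteration
    torch.nn.Linear(mem * n_pix + n_pix, mem * n_pix) for _ in range(n_iter)
)

def learned_primal_dual(g):
    batch = g.shape[0]
    f = torch.zeros(batch, mem, n_pix)           # primal iterates with memory
    h = torch.zeros(batch, mem, n_data)          # dual iterates with memory
    for dual_net, primal_net in zip(dual_nets, primal_nets):
        Kf2 = f[:, 1] @ K.T                      # K(f^(2))
        h = dual_net(torch.cat([h.flatten(1), Kf2, g], dim=1)).view(batch, mem, n_data)
        back = h[:, 0] @ K                       # [dK(f^(1))]^*(h^(1))
        f = primal_net(torch.cat([f.flatten(1), back], dim=1)).view(batch, mem, n_pix)
    return f[:, 0]                               # T_dagger_Theta(g) = f_I^(1)

f_hat = learned_primal_dual(torch.randn(8, n_data))
```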
References:
ADMM-Net: A Deep Learning Approach for Compressive Sensing MRI
Yang et al., NIPS 2016
Recurrent inference machines for solving inverse problems
Putzky and Welling, arXiv 2017
Learning a Variational Network for Reconstruction of Accelerated MRI Data
Hammernik et al., arXiv 2017
Solving ill-posed inverse problems using iterative deep neural networks
Adler and Öktem, Inverse Problems 2017
Learned Primal-Dual Reconstruction
Adler and Öktem, arXiv 2017
Results: ellipses
Results for ray transform inversion in 2D.
Inverse problem: g = P(f) + δg
Geometry: parallel beam, sparse view (30 angles)
Noise: 5% additive Gaussian noise
Data: 128 × 128 pixel ellipse phantoms
Compare to:
FBP
Total Variation
Post-processing deep learning by U-Net
[Figures: phantom alongside a training phantom and reconstructions by FBP, TV, learned post-processing, and Learned Primal-Dual]
Results: quantitative

Method               PSNR (dB)   SSIM    Runtime (ms)   Parameters
FBP                  19.75       0.597   4              1
TV                   28.06       0.928   5 166          1
Learned U-Net        29.20       0.943   9              10^7
Learned Primal-Dual  38.28       0.988   49             2.4 · 10^5
Comments
Very large quantitative improvement =⇒ PSNR is not a good metric
Noticeable visual improvement
Speedup enables clinical implementation
We are remarkably close to the theoretical optimum
Results: human data
Results for ray transform inversion in 2D.
Inverse problem: g = P(f) + δg
Geometry: fan beam, 1000 angles
Noise: Poisson noise (low-dose CT)
Data: 512 × 512 pixel human data
[Figures: dual and primal iterates 0, 2, 4, 6, 8, and 10 of the Learned Primal-Dual algorithm]
Results: human data
Results for ray transform inversion in 2D.
Compare to:
FBP
Total Variation
Post-processing deep learning by U-Net
[Figures: human phantom reconstructions by FBP, TV, learned U-Net post-processing, and Learned Primal-Dual]
Results: quantitative

Method               PSNR (dB)   SSIM    Runtime (ms)   Parameters
FBP                  33.65       0.829   423            1
TV                   37.48       0.946   64 371         1
Learned U-Net        41.92       0.941   463            10^7
Learned Primal-Dual  44.11       0.969   620            2.4 · 10^5
Future work
Learned iterative reconstruction requires differentiating the whole solver =⇒ computationally prohibitive
Possible solution: learn one iterate at a time (gradient boosting)
Model based learning for accelerated, limited-view 3D photoacoustic tomography
Hauptmann et al., arXiv 2017
The L2 loss is sub-optimal
Possible solution: perceptual losses
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Johnson et al., ECCV 2016
Possible solution: adversarial losses
Deep Generative Adversarial Networks for Compressed Sensing Automates MRI
Mardani et al., arXiv 2017
Future work
Bias-variance trade-off: what is the best network size?
How do we acquire good data?
Clinical validation!
Conclusions
Machine learning allows us to handle complicated priors
Fully learned reconstruction is infeasible
Learned post-processing gives good results
Learned iterative reconstruction gives better results, but is computationally challenging
Related articles:
”Solving ill-posed inverse problems using iterative deep neural networks”
”Learned Primal-Dual Reconstruction”
Source code:
github.com/adler-j
github.com/odlgroup/odl
Contact:
jonasadl@kth.se
