A hand-waving introduction to sparsity for compressed tomography reconstruction

An introduction to the basic concepts needed to understand sparse reconstruction in computed tomography.

Transcript

  • 1. A hand-waving introduction to sparsity for compressed tomography reconstruction. Gaël Varoquaux and Emmanuelle Gouillart. Dense slides; for future reference: http://www.slideshare.net/GaelVaroquaux
  • 2. Outline: 1. Sparsity for inverse problems; 2. Mathematical formulation; 3. Choice of a sparse representation; 4. Optimization algorithms.
  • 3. 1 Sparsity for inverse problems: problem setting and intuitions.
  • 4. 1 Tomography reconstruction: a linear problem. y = Ax, with y ∈ R^n, A ∈ R^(n×p), x ∈ R^p; n ∝ number of projections, p: number of pixels in the reconstructed image. We want to find x knowing A and y.
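To make the forward model concrete, here is a minimal sketch (not from the talk) that builds the sinogram y for a toy image, using scikit-image's Radon transform in place of the projection operator A; the phantom, the 128 × 128 size and the 18 projection angles are arbitrary choices echoing the example on slide 13.

    # Hypothetical sketch: forward model y = A x via scikit-image's Radon transform.
    import numpy as np
    from skimage.data import shepp_logan_phantom
    from skimage.transform import radon, resize

    x_true = resize(shepp_logan_phantom(), (128, 128))  # p = 128 * 128 unknown pixels
    theta = np.linspace(0., 180., 18, endpoint=False)   # only 18 projection angles -> small n
    y = radon(x_true, theta=theta)                      # sinogram: one column per angle
    print(y.shape)                                      # (detector bins, number of angles)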
  • 5. 1 Small n: an ill-posed linear problem. y = Ax admits multiple solutions. The sensing operator A has a large null space: images that give null projections. In particular it is blind to high spatial frequencies.
  • 6. 1 Small n: an ill-posed linear problem (continued). Even with a large number of projections the problem is ill-conditioned: "short-sighted" rather than blind, ⇒ it captures noise on those components.
  • 7. 1 A toy example: spectral analysis. Recovering the frequency spectrum: signal = A · frequencies. [Figure: signal and its frequency spectrum.]
  • 8. 1 A toy example: spectral analysis. Sub-sampling the signal: the recovery problem becomes ill-posed. [Figure: sub-sampled signal and its frequency spectrum.]
  • 9. 1 A toy example: spectral analysis. Problem: aliasing. Information in the null-space of A is lost. [Figure: aliased frequency spectrum.]
  • 10. 1 A toy example: spectral analysis. Solution to aliasing: incoherent measurements, i.e. a careful choice of the null-space of A. [Figure: signal and frequency spectrum with incoherent measurements.]
  • 11. 1 A toy example: spectral analysis. Incoherent measurements, but scarcity of data: the null-space of A is spread out in frequency. Not much data ⇒ large null-space, which captures "noise". [Figure: signal and frequency spectrum.]
  • 12. 1 A toy example: spectral analysis (continued). Impose sparsity: find a small number of frequencies to explain the signal. [Figure: sparse frequency spectrum.]
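A hand-wavy illustration of this toy example (my own sketch, not part of the slides): a signal with three active frequencies is observed on a random subset of time points, and the frequency coefficients are recovered with an ℓ1-penalized fit. The cosine dictionary, the chosen frequencies and the regularization strength are all arbitrary.

    # Hypothetical sketch: recovering a sparse frequency spectrum from few samples.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    n_samples, n_freqs = 200, 256
    t = np.arange(n_samples)
    A = np.cos(np.outer(t, np.arange(n_freqs)) * np.pi / n_freqs)  # one column per frequency

    x_true = np.zeros(n_freqs)
    x_true[[10, 40, 90]] = [1.0, -0.7, 0.5]            # only 3 active frequencies
    signal = A @ x_true

    mask = rng.rand(n_samples) < 0.3                   # keep roughly 60 of the 200 samples
    lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=50000)
    lasso.fit(A[mask], signal[mask])
    print(np.flatnonzero(np.abs(lasso.coef_) > 0.1))   # ideally close to [10, 40, 90]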
  • 13. 1 And for tomography reconstruction? [Figure: original image, non-sparse reconstruction, sparse reconstruction; 128 × 128 pixels, 18 projections.] Full example: http://scikit-learn.org/stable/auto_examples/applications/plot_tomography_l1_reconstruction.html
  • 14. 1 Why does it work: a geometric explanation. Consider two coefficients of x not in the null-space of A. [Figure: (x1, x2) plane with the line y = Ax and the point x_true.] The sparsest solution lies on the blue cross; it corresponds to the true solution x_true if the slope of the constraint line is > 45°.
  • 15. 1 Why does it work: a geometric explanation (continued). The cross can be replaced by its convex hull (the ℓ1 ball).
  • 16. 1 Why does it work: a geometric explanation (continued). In high dimension: a large acceptable set.
  • 17. Recovery of a sparse signal: if the null-space of the sensing operator is incoherent with the sparse representation, excellent sparse recovery is possible with few projections. Minimum number of observations needed: n_min ∼ k log p, with k the number of non-zeros [Candes 2006]. Rmk: the theory is for i.i.d. samples; this is related to "compressive sensing".
  • 18. 2 Mathematical formulation: variational formulation and introduction of noise.
  • 19. 2 Maximizing the sparsity. ℓ0(x) = number of non-zeros of x; solve min_x ℓ0(x) s.t. y = Ax. "Matching pursuit" [Mallat, Zhang 1993], "orthogonal matching pursuit" [Pati et al. 1993]. Problem: non-convex optimization.
  • 20. 2 Maximizing the sparsity, with the ℓ1 relaxation: ℓ1(x) = Σᵢ |xᵢ|; solve min_x ℓ1(x) s.t. y = Ax. "Basis pursuit" [Chen, Donoho, Saunders 1998].
  • 21. 2 Modeling observation noise: y = Ax + e, with e the observation noise. New formulation: min_x ℓ1(x) s.t. ||y − Ax||₂² ≤ ε². Equivalent "Lasso estimator" [Tibshirani 1996]: min_x ||y − Ax||₂² + λ ℓ1(x).
  • 22. 2 Modeling observation noise (continued). In min_x ||y − Ax||₂² + λ ℓ1(x), the first term is the data fit and the second is the penalization. [Figure: ℓ1 ball in the (x1, x2) plane.] Rmk: the kink in the ℓ1 ball creates sparsity.
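As a small illustration of the role of the kink (again my own toy example): on the same under-determined random system, a smooth ℓ2 (ridge) penalty leaves every coefficient non-zero, while the ℓ1 (Lasso) penalty sets most of them exactly to zero.

    # Hypothetical sketch: l2 vs l1 penalties on an under-determined problem.
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.RandomState(0)
    A = rng.randn(30, 100)                       # 30 equations, 100 unknowns
    x_true = np.zeros(100)
    x_true[:5] = 1.0
    y = A @ x_true + 0.01 * rng.randn(30)

    ridge = Ridge(alpha=1.0, fit_intercept=False).fit(A, y)
    lasso = Lasso(alpha=0.05, fit_intercept=False).fit(A, y)
    print((ridge.coef_ == 0).sum())              # 0: the smooth penalty gives no exact zeros
    print((lasso.coef_ == 0).sum())              # most coefficients are exactly zero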
  • 23. 2 Probabilistic modeling: Bayesian interpretation. P(x|y) ∝ P(y|x) P(x) (*): the posterior (the quantity of interest, expectations on x) is proportional to the forward model times the prior. Forward model: y = Ax + e with e Gaussian noise ⇒ P(y|x) ∝ exp(−(1/(2σ²)) ||y − Ax||₂²). Prior: Laplacian, P(x) ∝ exp(−μ ||x||₁). Negated log of (*): (1/(2σ²)) ||y − Ax||₂² + μ ℓ1(x), so the maximum of the posterior is the Lasso estimate. Note that this picture is limited and the Lasso is not a good Bayesian estimator for the Laplace prior [Gribonval 2011].
  • 24. 3 Choice of a sparse representation: sparsity in the wavelet domain, or total variation.
  • 25. 3 Sparsity in the wavelet representation. Typical images are not sparse [Figure: Haar decomposition, levels 1 to 6] ⇒ impose sparsity in the Haar representation: A → A H, where H is the Haar transform.
  • 26. 3 Sparsity in the wavelet representation (continued). [Figure: original image, non-sparse reconstruction, sparse image, sparse-in-Haar reconstruction.] Impose sparsity in the Haar representation: A → A H, where H is the Haar transform.
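A minimal sketch of the change of variables behind "A → A H", assuming PyWavelets is available: the unknowns become Haar coefficients, and a synthesis step H maps them back to an image before the projections are applied. The image and the decomposition level below are arbitrary.

    # Hypothetical sketch: Haar synthesis operator H for the A -> A H change of variables.
    import numpy as np
    import pywt

    image = np.random.rand(128, 128)                       # stand-in for an image x

    coeffs = pywt.wavedec2(image, 'haar', level=4)         # analysis: image -> coefficients
    c_arr, c_slices = pywt.coeffs_to_array(coeffs)         # flatten to a single array

    def haar_synthesis(c_arr):
        """H: Haar coefficients -> image; compose the projector with this to get A H."""
        coeffs = pywt.array_to_coeffs(c_arr, c_slices, output_format='wavedec2')
        return pywt.waverec2(coeffs, 'haar')

    print(np.allclose(haar_synthesis(c_arr), image))       # True: the transform is invertible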
  • 27. 3 Total variation: impose a sparse gradient. min_x ||y − Ax||₂² + λ Σᵢ ||(∇x)ᵢ||₂. This is an ℓ12 norm: the ℓ1 norm of the gradient magnitude; it sets the gradient components along x and y to zero jointly. [Figure: original image, Haar wavelet reconstruction, TV penalization.]
  • 28. 3 Total variation (continued). [Figure: original image, error for the Haar wavelet reconstruction, error for the TV penalization.]
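For reference, the isotropic TV penalty in the formula above is easy to evaluate directly; this small sketch (not from the slides) computes it with numpy's discrete gradient and shows that a piecewise-constant image has a much lower TV than a noisy one.

    # Hypothetical sketch: the isotropic TV penalty, sum_i ||(grad x)_i||_2.
    import numpy as np

    def tv_penalty(x):
        gx, gy = np.gradient(x)                  # discrete gradient of the image
        return np.sqrt(gx ** 2 + gy ** 2).sum()  # l1 norm of the gradient magnitude

    flat = np.zeros((64, 64))
    flat[20:40, 20:40] = 1.0                     # piecewise-constant image: sparse gradient
    noisy = flat + 0.1 * np.random.randn(64, 64)
    print(tv_penalty(flat), tv_penalty(noisy))   # the flat image has a much smaller TV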
  • 29. 3 Total variation + interval: bound x in [0, 1]. min_x ||y − Ax||₂² + λ Σᵢ ||(∇x)ᵢ||₂ + I([0, 1]). [Figure: original image, TV penalization, TV + interval.]
  • 30. 3 Total variation + interval (continued). Rmk: the constraint does more than folding values outside of the range back in. [Figure: histograms between 0.0 and 1.0 for TV and TV + interval; original image, TV penalization, TV + interval.]
  • 31. 3 Total variation + interval (continued). [Figure: original image, error for TV penalization, error for TV + interval.]
  • 32. Analysis vs synthesis. Wavelet basis: min_x ||y − A H x||₂² + ||x||₁, with H the wavelet transform. Total variation: min_x ||y − A x||₂² + ||D x||₁, with D the spatial derivation operator (∇).
  • 33. Analysis vs synthesis (continued). The wavelet-basis form is a "synthesis" formulation; the total-variation form is an "analysis" formulation. Theory and algorithms are easier for synthesis; the two are equivalent iff D is invertible.
  • 34. 4 Optimization algorithms: non-smooth optimization ⇒ "proximal operators".
  • 35. 4 Smooth optimization fails! [Figure: gradient descent; energy vs iterations and trajectory in the (x1, x2) plane.] Smooth optimization fails in non-smooth regions, and these are specifically the spots that interest us.
  • 36. 4 Iterative Shrinkage-Thresholding Algorithm (ISTA). Setting: min f + g, with f smooth and g non-smooth, f and g convex, ∇f L-Lipschitz. Typically f is the data-fit term and g the penalty; e.g. the Lasso: (1/(2σ²)) ||y − Ax||₂² + μ ℓ1(x).
  • 37. 4 Iterative Shrinkage-Thresholding Algorithm. Minimize successively (a quadratic approximation of f) + g, using the bound f(x) < f(y) + ⟨x − y, ∇f(y)⟩ + (L/2) ||x − y||₂². Proof sketch: by convexity, f(y) ≤ f(x) + ∇f(y)·(y − x); in the second term substitute ∇f(y) → ∇f(x) + (∇f(y) − ∇f(x)), and upper-bound the last term using the Lipschitz continuity of ∇f. The resulting iteration is x_{k+1} = argmin_x { g(x) + (L/2) ||x − (x_k − (1/L) ∇f(x_k))||₂² } [Daubechies 2004].
  • 38. 4 Iterative Shrinkage-Thresholding Algorithm: the two steps. Step 1: gradient descent on f. Step 2: proximal operator of g: prox_{λg}(x) := argmin_y ||y − x||₂² + λ g(y), a generalization of the Euclidean projection onto the convex set {x, g(x) ≤ 1}. Rmk: if g is the indicator function of a set S, the proximal operator is the Euclidean projection onto S. For the ℓ1 norm, prox_{λℓ1}(x)ᵢ = sign(xᵢ) (|xᵢ| − λ)₊: "soft thresholding".
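Putting the two steps together, here is a bare-bones ISTA sketch for the Lasso (my own illustration; the variable names and the test problem are arbitrary): a gradient step on the data-fit term followed by the ℓ1 proximal operator, i.e. soft thresholding.

    # Hypothetical sketch: ISTA for min_x ||y - A x||_2^2 / 2 + lam * ||x||_1.
    import numpy as np

    def soft_threshold(x, threshold):
        # prox of threshold * ||.||_1: shrink towards 0, small entries become exactly 0
        return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

    def ista(A, y, lam, n_iter=500):
        L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - y)             # gradient of the smooth data-fit term
            x = soft_threshold(x - grad / L, lam / L)
        return x

    rng = np.random.RandomState(0)
    A = rng.randn(50, 200)
    x_true = np.zeros(200)
    x_true[[3, 50, 120]] = [1.0, -2.0, 0.5]
    y = A @ x_true
    print(np.flatnonzero(ista(A, y, lam=0.1)))   # expected to be close to [3, 50, 120]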
  • 39–47. 4 Iterative Shrinkage-Thresholding Algorithm. [Animation over several slides: energy vs iterations and the trajectory in the (x1, x2) plane, comparing gradient descent and ISTA; ISTA alternates a gradient descent step with a projection on the ℓ1 ball (soft thresholding).]
  • 48. 4 Fast Iterative Shrinkage-Thresholding Algorithm (FISTA). [Figure: energy vs iterations and (x1, x2) trajectories for gradient descent, ISTA and FISTA.] As with conjugate gradient, add a memory term: dx_{k+1} = dx^ISTA_{k+1} + ((t_k − 1)/t_{k+1}) (dx_k − dx_{k−1}), with t_1 = 1 and t_{k+1} = (1 + √(1 + 4 t_k²))/2 ⇒ O(k⁻²) convergence [Beck, Teboulle 2009].
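The same sketch with FISTA's memory term added, following the t_k update on this slide (again an illustrative sketch, not the authors' code); it is a drop-in replacement for the ista() function above.

    # Hypothetical sketch: FISTA = ISTA + momentum, O(1/k^2) convergence of the objective.
    import numpy as np

    def soft_threshold(x, threshold):
        return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

    def fista(A, y, lam, n_iter=500):
        L = np.linalg.norm(A, 2) ** 2
        x = z = np.zeros(A.shape[1])
        t = 1.0
        for _ in range(n_iter):
            x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)  # ISTA step at z
            t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
            z = x_new + (t - 1) / t_new * (x_new - x)                   # memory term
            x, t = x_new, t_new
        return x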
  • 49. 4 Proximal operator for total variation. Reformulate to smooth + non-smooth with a simple projection step and use FISTA [Chambolle 2004]: prox_{λTV}(y) = argmin_x ||y − x||₂² + λ Σᵢ ||(∇x)ᵢ||₂.
  • 50. 4 Proximal operator for total variation (continued). The prox can be computed through the dual problem: an optimization over a dual variable z constrained to ||z||∞ ≤ 1 and involving ||λ div z + y||₂². Proof: use the "dual norm" identity ||v||₁ = max_{||z||∞ ≤ 1} ⟨v, z⟩; div is the adjoint of ∇, so ⟨∇v, z⟩ = ⟨v, −div z⟩; swap the min and the max and solve for x. Duality: [Boyd 2004]; this proof: [Michel 2011].
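In practice the TV proximal operator is itself computed iteratively. As a rough shortcut (my suggestion, not the talk's), scikit-image ships Chambolle's algorithm as a TV denoiser, which can play the role of prox_{λTV} inside an ISTA/FISTA loop; the exact scaling between the `weight` argument and λ depends on scikit-image's convention.

    # Hypothetical sketch: scikit-image's Chambolle TV denoiser used as a TV prox.
    import numpy as np
    from skimage.restoration import denoise_tv_chambolle

    noisy = np.random.rand(64, 64)
    tv_prox = denoise_tv_chambolle(noisy, weight=0.1)  # approximately the prox of a scaled TV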
  • 51. Sparsity for compressed tomography reconstruction. @GaelVaroquaux
  • 52. Sparsity for compressed tomography reconstruction (conclusions). Add penalizations with kinks; choose the prior/sparse representation; use non-smooth optimization (FISTA). Further discussion on the choice of prior/parameters: minimize the reconstruction error from degraded data of gold-standard acquisitions, or cross-validate: leave out half of the projections and minimize the projection error of the reconstruction. Python code available: https://github.com/emmanuelle/tomo-tv @GaelVaroquaux
  • 53. Bibliography (1/3). [Candes 2006] E. Candès, J. Romberg and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, Trans Inf Theory (52), 2006. [Wainwright 2009] M. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso), Trans Inf Theory (55), 2009. [Mallat, Zhang 1993] S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, Trans Sign Proc (41), 1993. [Pati, et al 1993] Y. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition, 27th Signals, Systems and Computers Conf, 1993.
  • 54. Bibliography (2/3). [Chen, Donoho, Saunders 1998] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J Sci Computing (20), 1998. [Tibshirani 1996] R. Tibshirani, Regression shrinkage and selection via the lasso, J Roy Stat Soc B, 1996. [Gribonval 2011] R. Gribonval, Should penalized least squares regression be interpreted as Maximum A Posteriori estimation?, Trans Sig Proc (59), 2011. [Daubechies 2004] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm Pure Appl Math (57), 2004.
  • 55. Bibliography (3/3). [Beck Teboulle 2009] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J Imaging Sciences (2), 2009. [Chambolle 2004] A. Chambolle, An algorithm for total variation minimization and applications, J Math Imag Vision (20), 2004. [Boyd 2004] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004 (reference on convex optimization and duality). [Michel 2011] V. Michel et al., Total variation regularization for fMRI-based prediction of behaviour, Trans Med Imag (30), 2011 (proof of the TV reformulation: appendix C).
