A hand-waving introduction to sparsity for compressed tomography reconstruction

An introduction to the basic concepts needed to understand sparse reconstruction in computed tomography.

  1. A hand-waving introduction to sparsity for compressed tomography reconstruction. Gaël Varoquaux and Emmanuelle Gouillart. Dense slides, kept for future reference: http://www.slideshare.net/GaelVaroquaux
  2. Outline: 1. Sparsity for inverse problems; 2. Mathematical formulation; 3. Choice of a sparse representation; 4. Optimization algorithms.
  3. Section 1, Sparsity for inverse problems: problem setting and intuitions.
  4. Tomography reconstruction is a linear problem: y = A x, with y ∈ R^n, A ∈ R^(n×p), x ∈ R^p; n is proportional to the number of projections and p is the number of pixels in the reconstructed image. We want to find x knowing A and y.
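For concreteness, here is a minimal sketch of the linear model y = A x, with A assembled as a sparse matrix acting on a flattened image. The two-angle parallel-beam geometry and the tiny sizes are illustrative assumptions, not the geometry used in the slides.

```python
import numpy as np
from scipy import sparse

l = 8                                     # image is l x l, so p = l**2 unknowns
p = l * l
rows, cols = [], []
for i in range(l):                        # horizontal rays: measurement i sums image row i
    for j in range(l):
        rows.append(i)
        cols.append(i * l + j)
for j in range(l):                        # vertical rays: measurement l + j sums image column j
    for i in range(l):
        rows.append(l + j)
        cols.append(i * l + j)
A = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(2 * l, p))

x = np.zeros((l, l))
x[2:5, 3:6] = 1.0                         # a small square phantom
y = A @ x.ravel()                         # n = 2 * l measurements for p = l**2 unknowns
print(A.shape, y)
```

With only n = 16 measurements for p = 64 unknowns, the system cannot be inverted without further assumptions, which is exactly the ill-posedness discussed on the next slides.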
  5. Small n: an ill-posed linear problem (slides 5-6). y = A x admits multiple solutions: the sensing operator A has a large null space, made of images that give null projections; in particular it is blind to high spatial frequencies. With a large number of projections the problem is instead ill-conditioned, "short-sighted" rather than blind, and the reconstruction captures noise on those components.
  7. A toy example, spectral analysis (slides 7-8): recovering the frequency spectrum of a signal, signal = A · frequencies. Once the signal is sub-sampled, the recovery problem becomes ill-posed.
  9. Toy example, continued (slides 9-10). Problem: aliasing, the information in the null space of A is lost. Solution: incoherent measurements, that is, a careful choice of the null space of A.
  11. Toy example, continued (slides 11-12). Even with incoherent measurements, data remain scarce: the null space of A is spread out in frequency, and with little data the null space is large, so the reconstruction captures "noise". The fix is to impose sparsity: find a small number of frequencies that explain the signal.
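A minimal sketch of this toy example under assumptions of my own choosing: a DCT dictionary (the slides do not specify the basis), random time samples, and scikit-learn's Lasso as the sparse solver.

```python
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
p = 512                                       # length of the full signal
spec_true = np.zeros(p)
spec_true[[14, 62]] = [4.0, 2.0]              # a 2-sparse frequency spectrum
signal = idct(spec_true, norm='ortho')        # corresponding time-domain signal

n = 64                                        # keep only n random time samples
keep = rng.choice(p, n, replace=False)
A = idct(np.eye(p), axis=0, norm='ortho')[keep]   # sensing matrix: sub-sampled inverse DCT
y = signal[keep]

lasso = Lasso(alpha=1e-3, max_iter=10000)     # l1-penalized fit imposes sparsity on the spectrum
lasso.fit(A, y)
print("active frequencies:", np.flatnonzero(np.abs(lasso.coef_) > 0.5))
```

With 64 incoherent samples out of 512, the two active frequencies are typically recovered; with far fewer samples the recovery degrades.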
  13. And for tomography reconstruction? Comparison of the original image, a non-sparse reconstruction and a sparse reconstruction, on a 128 × 128 pixel image with 18 projections. http://scikit-learn.org/stable/auto_examples/applications/plot_tomography_l1_reconstruction.html
  14. Why does it work: a geometric explanation (slides 14-16). Consider two coefficients of x that are not in the null space of A: the constraint y = A x is a line in the (x1, x2) plane. The sparsest solution lies on the axes (the blue cross in the figure); it corresponds to the true solution x_true if the slope of the line is > 45°. The cross can be replaced by its convex hull, and in high dimension the acceptable set is large.
  17. Recovery of a sparse signal: when the null space of the sensing operator is incoherent with the sparse representation, excellent sparse recovery is possible with few projections. Minimum number of observations necessary: n_min ∼ k log p, with k the number of non-zeros [Candes 2006]. Remark: the theory is for i.i.d. samples; this is related to "compressive sensing".
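As a rough order-of-magnitude check of the n_min ∼ k log p rule on the 128 × 128 example above (the value of k below is an assumption, not a number from the slides):

```python
import numpy as np

p = 128 * 128       # number of pixels in the reconstructed image
k = 300             # assumed number of non-zero coefficients in the sparse representation
n_min = k * np.log(p)
print(int(n_min))   # ~ 2900 measurements, i.e. a few tens of 128-detector projections
```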
  18. Section 2, Mathematical formulation: variational formulation, introduction of noise.
  19. Maximizing sparsity with ℓ0, the number of non-zeros: min_x ℓ0(x) subject to y = A x. This is the "matching pursuit" problem [Mallat, Zhang 1993]; see also "orthogonal matching pursuit" [Pati, et al 1993]. Problem: non-convex optimization.
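Greedy solvers for this non-convex problem are readily available; a minimal sketch (the random sensing matrix and 3-sparse signal are illustrative assumptions) using scikit-learn's orthogonal matching pursuit:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
A = rng.randn(50, 200)                     # random sensing matrix
x_true = np.zeros(200)
x_true[[10, 80, 150]] = [1.0, -0.5, 2.0]   # 3-sparse ground truth
y = A @ x_true

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3)   # greedily pick 3 atoms, one at a time
omp.fit(A, y)
print("selected atoms:", np.flatnonzero(omp.coef_))  # typically the three true indices
```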
  20. Relaxing to the ℓ1 norm, ℓ1(x) = Σ_i |x_i|: min_x ℓ1(x) subject to y = A x. This is "basis pursuit" [Chen, Donoho, Saunders 1998].
  21. Modeling observation noise (slides 21-22): y = A x + e, with e the observation noise. New formulation: min_x ℓ1(x) subject to ||y − A x||₂² ≤ ε². Equivalently, the "Lasso estimator" [Tibshirani 1996]: min_x ||y − A x||₂² + λ ℓ1(x), a data-fit term plus a penalization. Remark: the kink in the ℓ1 ball creates sparsity.
  23. Probabilistic modeling, Bayesian interpretation: P(x|y) ∝ P(y|x) P(x), i.e. posterior ∝ forward model × prior, the quantity of interest being expectations on x. Forward model: y = A x + e with e Gaussian noise, so P(y|x) ∝ exp(−||y − A x||₂² / (2σ²)). Prior: Laplacian, P(x) ∝ exp(−µ ||x||₁). The negated log-posterior is ||y − A x||₂² / (2σ²) + µ ℓ1(x), so the maximum of the posterior is the Lasso estimate. Note that this picture is limited and the Lasso is not a good Bayesian estimator for the Laplace prior [Gribonval 2011].
  24. Section 3, Choice of a sparse representation: sparsity in the wavelet domain, total variation.
  25. Sparsity in a wavelet representation (slides 25-26). Typical images are not sparse, but their Haar decomposition (levels 1 to 6) is, so impose sparsity in the Haar representation: A → A H, where H is the Haar transform. The figures compare the original image, the non-sparse reconstruction, the reconstruction that is sparse in image space and the one that is sparse in Haar.
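A minimal sketch, assuming the PyWavelets package (not mentioned in the slides), showing that a piecewise-constant image has very few non-zero Haar coefficients, which is what makes imposing sparsity in the Haar representation reasonable:

```python
import numpy as np
import pywt

l = 64
img = np.zeros((l, l))
img[16:40, 20:50] = 1.0                          # piecewise-constant phantom

coeffs = pywt.wavedec2(img, 'haar', level=6)     # Haar decomposition, levels 1 to 6
flat, _ = pywt.coeffs_to_array(coeffs)
n_nonzero = np.sum(np.abs(flat) > 1e-12)
print(n_nonzero, "non-zero Haar coefficients out of", flat.size)
```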
  27. Total variation (slides 27-28): impose a sparse gradient, min_x ||y − A x||₂² + λ Σ_i ||(∇x)_i||₂. This ℓ12 norm is the ℓ1 norm of the gradient magnitude: it sets the horizontal and vertical components of the gradient to zero jointly. The figures compare the original image with the Haar-wavelet and TV-penalized reconstructions and their errors.
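A minimal sketch of the penalty itself, the ℓ1 norm of the per-pixel gradient magnitude; the forward finite differences with replicated borders are my own choice of discretization, not necessarily the one used in the slides:

```python
import numpy as np

def tv_penalty(img):
    # forward differences, replicating the last row/column so borders contribute zero
    gx = np.diff(img, axis=0, append=img[-1:, :])
    gy = np.diff(img, axis=1, append=img[:, -1:])
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))    # sum over pixels of the gradient magnitude

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
print(tv_penalty(img))                           # roughly the contour length of the flat square
```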
  29. Total variation + interval (slides 29-31): additionally bound x in [0, 1], min_x ||y − A x||₂² + λ Σ_i ||(∇x)_i||₂ + I([0, 1]). Remark: comparing the histograms of the TV and TV + interval solutions shows that the constraint does more than folding values outside of the range back in. The figures compare the original image with the TV and TV + interval reconstructions and their errors.
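The extra term I([0, 1]) is the indicator function of the interval, and its proximal operator (in the sense introduced on slide 38 below) is simply the Euclidean projection onto the box, i.e. pixel-wise clipping. A minimal sketch:

```python
import numpy as np

def prox_interval(x, lo=0.0, hi=1.0):
    # proximal operator of the indicator of [lo, hi]: Euclidean projection, i.e. clipping
    return np.clip(x, lo, hi)

print(prox_interval(np.array([-0.3, 0.2, 0.7, 1.4])))   # -> [0.   0.2  0.7  1. ]
```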
  32. Analysis vs synthesis (slides 32-33). Wavelet basis: min_x ||y − A H x||₂² + ||x||₁, with H the wavelet transform, the "synthesis" formulation. Total variation: min_x ||y − A x||₂² + ||D x||₁, with D the spatial derivation operator ∇, the "analysis" formulation. Theory and algorithms are easier for synthesis; the two formulations are equivalent iff D is invertible.
  34. Section 4, Optimization algorithms: non-smooth optimization ⇒ "proximal operators".
  35. Smooth optimization fails! The figure shows gradient descent on the energy across iterations: smooth optimization fails in non-smooth regions, and these are precisely the spots that interest us.
  36. Iterative Shrinkage-Thresholding Algorithm, ISTA (slides 36-38). Setting: min f + g with f smooth and g non-smooth, f and g convex, ∇f L-Lipschitz. Typically f is the data-fit term and g the penalty; for the Lasso, f = ||y − A x||₂² / (2σ²) and g = µ ℓ1(x). Minimize successively (a quadratic approximation of f) + g, using the bound f(x) ≤ f(y) + ⟨x − y, ∇f(y)⟩ + (L/2) ||x − y||₂² (proof sketch: by convexity f(y) ≤ f(x) + ⟨∇f(y), y − x⟩; split ∇f(y) into ∇f(x) + (∇f(y) − ∇f(x)) in the second term and upper-bound the resulting last term with the Lipschitz continuity of ∇f). This yields the iteration x_{k+1} = argmin_x { g(x) + (L/2) ||x − (x_k − (1/L) ∇f(x_k))||₂² } [Daubechies 2004]. Each iteration is thus: step 1, a gradient descent step on f; step 2, the proximal operator of g, prox_{λg}(x) := argmin_y ||y − x||₂² + λ g(y), a generalization of the Euclidean projection onto the convex set {x, g(x) ≤ 1} (if g is the indicator function of a set S, the proximal operator is exactly the Euclidean projection onto S). For the ℓ1 norm, prox_{λℓ1}(x)_i = sign(x_i) (|x_i| − λ)₊: "soft thresholding".
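A minimal sketch of ISTA for the Lasso; the random test problem and the step size 1/L, with L taken as the squared spectral norm of A, are illustrative assumptions:

```python
import numpy as np

def soft_threshold(x, thresh):
    # proximal operator of thresh * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def ista(A, y, lam, n_iter=2000):
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the data-fit gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                   # gradient step on the smooth term
        x = soft_threshold(x - grad / L, lam / L)  # proximal step on the l1 term
    return x

rng = np.random.RandomState(0)
A = rng.randn(50, 200)
x_true = np.zeros(200)
x_true[[10, 80, 150]] = [1.0, -0.5, 2.0]
x_hat = ista(A, A @ x_true, lam=0.1)
print(x_hat[[10, 80, 150]])                        # estimates on the true support
```

FISTA (next slides) runs the same two steps but adds a momentum term, which converges noticeably faster.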
  39. ISTA in action (slides 39-47): the figures step through the iterations, alternating a gradient descent step on the smooth term with a projection onto the ℓ1 ball (the soft-thresholding proximal step), while the energy decreases compared with plain gradient descent.
  48. Fast Iterative Shrinkage-Thresholding Algorithm, FISTA. As with conjugate gradient, add a memory term to ISTA: dx_{k+1} ← dx_{k+1} + ((t_k − 1)/t_{k+1}) (dx_k − dx_{k−1}), with t_1 = 1 and t_{k+1} = (1 + √(1 + 4 t_k²)) / 2, which gives O(k⁻²) convergence [Beck Teboulle 2009].
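The corresponding FISTA sketch, adding only the momentum term of [Beck Teboulle 2009] to the ISTA sketch above (same illustrative Lasso setting and assumptions):

```python
import numpy as np

def soft_threshold(x, thresh):
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def fista(A, y, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    z = x.copy()                                          # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)         # memory term
        x, t = x_new, t_new
    return x

rng = np.random.RandomState(0)
A = rng.randn(50, 200)
x_true = np.zeros(200)
x_true[[10, 80, 150]] = [1.0, -0.5, 2.0]
print(fista(A, A @ x_true, lam=0.1)[[10, 80, 150]])       # estimates on the true support
```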
  49. Proximal operator for total variation (slides 49-50): reformulate as smooth + non-smooth with a simple projection step and use FISTA [Chambolle 2004]. prox_{λTV}(y) = argmin_x ||y − x||₂² + λ Σ_i ||(∇x)_i||₂, which by duality is obtained by solving for z over the set {z : ||z||_∞ ≤ 1} in a problem involving ||λ div z + y||₂², and then recovering x. Proof: use the "dual norm" identity ||v||₁ = max_{||z||_∞ ≤ 1} ⟨v, z⟩ and the fact that div is the adjoint of ∇, ⟨∇v, z⟩ = ⟨v, −div z⟩, then swap min and max and solve for x. Duality: [Boyd 2004]; this proof: [Michel 2011].
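For the TV proximal operator itself, one off-the-shelf option (my suggestion, not something the slides prescribe, since they point to their own code) is Chambolle's algorithm as implemented in scikit-image, whose `weight` parameter plays the role of λ up to that function's own convention:

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.RandomState(0)
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
noisy = img + 0.3 * rng.randn(64, 64)

# TV denoising of `noisy`, i.e. (approximately) the TV proximal operator evaluated at `noisy`
prox_tv = denoise_tv_chambolle(noisy, weight=0.2)
print("mean error before:", np.abs(noisy - img).mean(), "after:", np.abs(prox_tv - img).mean())
```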
  51. Summary (slides 51-52), sparsity for compressed tomography reconstruction: add penalizations with kinks, choose a prior/sparse representation, and use non-smooth optimization (FISTA). Further discussion, choosing the prior and its parameters: minimize the reconstruction error from degraded data against gold-standard acquisitions, or cross-validate by leaving out half of the projections and minimizing the projection error of the reconstruction. Python code available: https://github.com/emmanuelle/tomo-tv
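A schematic sketch of that cross-validation idea. Everything here is hypothetical scaffolding: `reconstruct` stands in for whatever TV/FISTA reconstruction routine is used (for instance from the tomo-tv code), and the split assumes each measurement is tagged with the projection (angle) it belongs to.

```python
import numpy as np

def projection_error(A, y, x_rec):
    # mean squared error between measured and re-projected data
    return np.mean((A @ x_rec - y) ** 2)

def cross_validate(A, y, proj_index, reconstruct, lambdas):
    """Pick the penalty strength that minimizes the error on held-out projections.

    proj_index[i] gives the projection (angle) measurement i belongs to;
    reconstruct(A, y, lam) is the (hypothetical) reconstruction routine.
    """
    held_out = np.isin(proj_index, np.unique(proj_index)[::2])   # hold out every other projection
    scores = []
    for lam in lambdas:
        x_rec = reconstruct(A[~held_out], y[~held_out], lam)     # reconstruct from half the data
        scores.append(projection_error(A[held_out], y[held_out], x_rec))
    return lambdas[int(np.argmin(scores))]
```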
  53. Bibliography (1/3). [Candes 2006] E. Candès, J. Romberg and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, Trans Inf Theory (52), 2006. [Wainwright 2009] M. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso), Trans Inf Theory (55), 2009. [Mallat, Zhang 1993] S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, Trans Sign Proc (41), 1993. [Pati, et al 1993] Y. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, 27th Signals, Systems and Computers Conf, 1993.
  54. Bibliography (2/3). [Chen, Donoho, Saunders 1998] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J Sci Computing (20), 1998. [Tibshirani 1996] R. Tibshirani, Regression shrinkage and selection via the lasso, J Roy Stat Soc B, 1996. [Gribonval 2011] R. Gribonval, Should penalized least squares regression be interpreted as Maximum A Posteriori estimation?, Trans Sig Proc (59), 2011. [Daubechies 2004] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm Pure Appl Math (57), 2004.
  55. Bibliography (3/3). [Beck Teboulle 2009] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J Imaging Sciences (2), 2009. [Chambolle 2004] A. Chambolle, An algorithm for total variation minimization and applications, J Math Imag Vision (20), 2004. [Boyd 2004] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004 (reference on convex optimization and duality). [Michel 2011] V. Michel et al., Total variation regularization for fMRI-based prediction of behaviour, Trans Med Imag (30), 2011 (proof of the TV reformulation: appendix C).
