
# Proximal Splitting and Optimal Transport

Presentation at the workshop organized by the ANR project ISOTACE, at Paris-Dauphine, on Feb. 5th, 2013.


### Proximal Splitting and Optimal Transport

1. 1. Proximal Splitting and Optimal Transport — Gabriel Peyré, www.numerical-tours.com
2. 2. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward-Backward • Douglas-Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
3. 3. Measure Preserving Maps. Distributions µ0, µ1 on R^k.
4. 4. Measure Preserving Maps (cont.). Mass preserving map T: R^k → R^k such that µ1 = T#µ0, where (T#µ0)(A) = µ0(T^{−1}(A)).
5. 5. Measure Preserving Maps (cont.). Distributions with densities µi = ρi(x) dx: T#µ0 = µ1 ⟺ ρ1(T(x)) |det(∂T(x))| = ρ0(x).
6. 6. Optimal Transport. L^p optimal transport: W_p(µ0, µ1)^p = min_{T#µ0=µ1} ∫ ||T(x) − x||^p µ0(dx).
7. 7. Optimal Transport (cont.). Regularity condition: µ0 or µ1 does not give mass to "small sets". Theorem (p > 1): there exists a unique optimal T.
8. 8. Optimal Transport (cont.). Theorem (p = 2): T = ∇φ with φ convex; T is monotone: ⟨T(x) − T(x'), x − x'⟩ ≥ 0.
9. 9. Wasserstein Distance. Couplings Π(µ, ν): measures π on R^d × R^d such that ∀A ⊂ R^d, π(A × R^d) = µ(A) and ∀B ⊂ R^d, π(R^d × B) = ν(B).
10. 10. Wasserstein Distance (cont.). Transportation cost: W_p(µ, ν)^p = min_{π∈Π(µ,ν)} ∫_{R^d×R^d} c(x, y) dπ(x, y).
11. 11. Wasserstein Distance (cont.). [illustration: a coupling π between µ (variable x) and ν (variable y)]
12. 12. Optimal Transport. Let p > 1 and assume µ does not give mass to small sets. There is a unique π ∈ Π(µ, ν) such that W_p(µ, ν)^p = ∫_{R^d×R^d} c(x, y) dπ(x, y); the optimal π is supported on the graph {(x, T(x))} of an optimal transport map T: R^d → R^d.
13. 13. Optimal Transport (cont.). p = 2: T = ∇φ, where φ is the unique (convex, l.s.c.) solution of (∇φ)#µ = ν.
14. 14. 1-D Continuous Wasserstein. Distributions µ, ν on R. Cumulative functions: C_µ(t) = ∫_{−∞}^t dµ(x). For all p > 1: T = C_ν^{−1} ∘ C_µ; T is non-decreasing (a "change of contrast").
15. 15. 1-D Continuous Wasserstein (cont.). Explicit formulas: W_p(µ, ν)^p = ∫_0^1 |C_µ^{−1} − C_ν^{−1}|^p; W_1(µ, ν) = ∫_R |C_µ − C_ν| = ||(µ − ν) ⋆ H||₁, with H the Heaviside step function.
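
For discrete (empirical) measures these formulas reduce to sorting. A minimal NumPy sketch of W_p in 1-D via the inverse cumulative functions, assuming two samples of equal size (an illustration, not the talk's code):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p(mu, nu)^p for empirical measures supported on samples x and y."""
    xs, ys = np.sort(x), np.sort(y)        # sorted values = quantile functions
    return np.mean(np.abs(xs - ys) ** p)   # int_0^1 |C_mu^{-1} - C_nu^{-1}|^p

x = np.random.randn(1000)
y = 2 + 0.5 * np.random.randn(1000)
print(wasserstein_1d(x, y, p=2))
```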
16. 16. Grayscale Histogram Transfer. Input images f_i: [0,1]² → [0,1], i = 0, 1.
17. 17. Grayscale Histogram Transfer (cont.). Gray-value distributions µ_i defined on [0,1]: µ_i([a, b]) = ∫_{[0,1]²} 1_{a ≤ f_i(x) ≤ b} dx.
18. 18. Grayscale Histogram Transfer (cont.). Optimal transport: T = C_{µ1}^{−1} ∘ C_{µ0}; the transferred image is T(f0) = C_{µ1}^{−1}(C_{µ0}(f0)).
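
For discrete images, T = C_{µ1}^{−1} ∘ C_{µ0} is realized by rank matching. A hedged NumPy sketch (assumes both images have the same number of pixels):

```python
import numpy as np

def histogram_transfer(f0, f1):
    """Give f0 the gray-level histogram of f1, preserving the ordering of f0."""
    shape = f0.shape
    order = np.argsort(f0.ravel())      # pixel ranks realize C_{mu0}
    out = np.empty(f0.size)
    out[order] = np.sort(f1.ravel())    # sorted values of f1 realize C_{mu1}^{-1}
    return out.reshape(shape)
```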
19. 19. Application to Color Transfer. Input color images f_i ∈ R^{N×3}, i = 0, 1; the pixel colors (f_i(x))_x define empirical color distributions µ0, µ1 = (1/N) Σ_x δ_{f_i(x)}.
20. 20. Application to Color Transfer (cont.). Optimal assignment: min_{σ∈Σ_N} Σ_i ||f0(x_i) − f1(x_{σ(i)})||.
21. 21. Application to Color Transfer (cont.). Transport: T: f0(x_i) ∈ R³ ↦ f1(x_{σ(i)}) ∈ R³.
22. 22. Application to Color Transfer (cont.). Equalization: f̃0 = T(f0), so that the color statistics of f̃0 match µ1. [Figures from J. Rabin, Wasserstein Regularization: source image, style image, source after color transfer] (A sketch follows below.)
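
A hedged sketch of the assignment-based color transfer above: the exact N×N assignment is too costly for full images, so this illustrative version runs the Hungarian solver of scipy.optimize on subsampled palettes; the subsampling step and the nearest-palette lookup are additions for tractability, not part of the slides:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def color_transfer(f0, f1, n_sub=300, seed=0):
    """f0, f1: (N, 3) arrays of RGB colors; returns recolored pixels of f0."""
    rng = np.random.default_rng(seed)
    c0 = f0[rng.choice(len(f0), n_sub, replace=False)]   # source palette
    c1 = f1[rng.choice(len(f1), n_sub, replace=False)]   # target palette
    cost = np.linalg.norm(c0[:, None] - c1[None, :], axis=2)
    _, sigma = linear_sum_assignment(cost)               # optimal permutation
    # send each pixel to the target color assigned to its closest palette entry
    nearest = np.argmin(np.linalg.norm(f0[:, None] - c0[None, :], axis=2), axis=1)
    return c1[sigma][nearest]
```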
23. 23. Image Registration [ur Rehman et al., 2009]. Mass preservation reads c(v + dv) = µ0(v + dv) det(∇(v + dv)) − µ1 = 0; a correction dv is obtained by solving dv ≈ c_v^T (c_v c_v^T)^{−1} c(v), where c_v c_v^T can be thought of as an elliptic system of equations, solved by preconditioned conjugate gradient with an incomplete Cholesky preconditioner [Nocedal and Wright, 1999], with multigrid Gauss-Seidel relaxation and trilinear interpolation operators on the GPU. [Fig. 6: OMT results viewed on an axial slice; top row: corresponding slices from pre-op (left) and post-op (right) MRI data; the deformation is clearly visible in the anterior part of the brain.]
24. 24. Convex Formulation (Benamou-Brenier). Find ρ: R^d × [0,1] → R⁺ and m: R^d × [0,1] → R^d solving W(µ0, µ1)² = min_{x=(m,ρ)} J(x) + ι_C(x).
25. 25. Convex Formulation (Benamou-Brenier) (cont.). J(x) = ∫_{s∈R^d} ∫_{t=0}^1 j(x(s,t)) dt ds, where j(m̃, ρ̃) = ||m̃||²/ρ̃ if ρ̃ > 0; 0 if ρ̃ = 0 and m̃ = 0; +∞ otherwise.
26. 26. Convex Formulation (Benamou-Brenier) (cont.). C = {x = (m, ρ) : div(x) = 0, B(ρ) = (ρ0, ρ1)}, with B(ρ) = (ρ(0, ·), ρ(1, ·)).
27. 27. Numerical Examples. [Displacement interpolation ρ(·, t) between ρ0 and ρ1 along time t.]
28. 28. Numerical Examples (cont.). [Figure 7: synthetic 2D examples on a Euclidean domain.]
29. 29. Discrete Formulation. Centered grid formulation (d = 1): min_{x∈R^{Gc×2}} J(x) + ι_C(x), with J(x) = Σ_{i∈Gc} j(x_i).
30. 30. Discrete Formulation (cont.). Staggered grid formulation: min_{x∈R^{G¹st}×R^{G²st}} J(I(x)) + ι_C(x).
31. 31. Discrete Formulation (cont.). Interpolation operator I = (I¹, I²): R^{G¹st} × R^{G²st} → R^{Gc×2}, e.g. I¹(m)_{i,j} = (m_{i+½,j} + m_{i−½,j})/2. → Projection on div(x) = 0 computed with FFTs.
32. 32. SOCP Formulation. min_{x∈R^{Gc×d}} J(x) + ι_C(x), J(x) = Σ_{i∈Gc} j(x_i) ⟺ min_{x, r∈R^{Gc}} Σ_i r_i s.t. ∀i ∈ Gc, (m_i, ρ_i, r_i) ∈ K, where K = {(m̃, ρ̃, r̃) ∈ R^{d+2} : ||m̃||² ≤ ρ̃ r̃} is the (rotated) Lorentz cone.
33. 33. SOCP Formulation (cont.). Second order cone program → use interior point methods (e.g. the MOSEK software): linear convergence with the iteration #, but poor scaling with the dimension |Gc|; efficient for medium scale problems (N ~ 10⁴). (A sketch follows below.)
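
As an illustration of the SOCP formulation, a hedged CVXPY sketch: the quad_over_lin atom introduces exactly the rotated Lorentz cone constraints above, and an interior point solver is called. The grid size and the operator A and right-hand side b are hypothetical placeholders, not the talk's discretization:

```python
import cvxpy as cp
import numpy as np

n, d = 50, 1                          # hypothetical grid size and dimension
m = cp.Variable((n, d))               # momentum samples m_i
rho = cp.Variable(n, nonneg=True)     # density samples rho_i

# J(x) = sum_i ||m_i||^2 / rho_i; each quad_over_lin term is SOC-representable
# via the rotated cone ||m_i||^2 <= rho_i * r_i.
J = sum(cp.quad_over_lin(m[i, :], rho[i]) for i in range(n))

A = np.random.randn(n, n * (d + 1))   # placeholder for the div/boundary operator
b = np.zeros(n)                       # placeholder for (0, rho_0, rho_1)
x = cp.hstack([cp.vec(m), rho])
prob = cp.Problem(cp.Minimize(J), [A @ x == b])
prob.solve()                          # interior point SOCP solver (ECOS/MOSEK/...)
```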
34. 34. Example: Regularization of Inverse Problems. Measurements: y = Φx0 + w.
35. 35. Example: Regularization of Inverse Problems (cont.). Regularized inversion: x* ∈ argmin_{x∈R^N} ½||y − Φx||² + λ R(x) (data fidelity + regularity).
36. 36. Example: Regularization of Inverse Problems (cont.). Total variation: R(x) = Σ_i ||(∇x)_i||.
37. 37. Example: Regularization of Inverse Problems (cont.). ℓ¹ sparsity: R(x) = Σ_i |x_i|; images are sparse in wavelet bases: image f = Ψx, coefficients x = Ψ*f.
38. 38. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward-Backward • Douglas-Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
39. 39. Convex Optimization. Setting: G: H → R ∪ {+∞}, with H a Hilbert space; here H = R^N. Problem: min_{x∈H} G(x).
40. 40. Convex Optimization (cont.). Class of functions — convex: G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y), ∀t ∈ [0, 1].
41. 41. Convex Optimization (cont.). Lower semi-continuous: lim inf_{x→x0} G(x) ≥ G(x0); proper: {x ∈ H : G(x) ≠ +∞} ≠ ∅.
42. 42. Convex Optimization (cont.). Indicator of C closed and convex: ι_C(x) = 0 if x ∈ C, +∞ otherwise.
43. 43. Sub-differential. ∂G(x) = {u ∈ H : ∀z, G(z) ≥ G(x) + ⟨u, z − x⟩}. Example: G(x) = |x| gives ∂G(0) = [−1, 1].
44. 44. Sub-differential (cont.). Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}.
45. 45. Sub-differential (cont.). First-order condition: x* ∈ argmin_{x∈H} G(x) ⟺ 0 ∈ ∂G(x*).
46. 46. Sub-differential (cont.). Monotone operator: U(x) = ∂G(x) satisfies ∀(u, v) ∈ U(x) × U(y), ⟨y − x, v − u⟩ ≥ 0.
47. 47. Prox and Subdifferential. Prox_γG(x) = argmin_z ½||x − z||² + γG(z).
48. 48. Prox and Subdifferential (cont.). Resolvent of ∂G: z = Prox_γG(x) ⟺ 0 ∈ z − x + γ∂G(z) ⟺ x ∈ (Id + γ∂G)(z) ⟺ z = (Id + γ∂G)^{−1}(x).
49. 49. Prox and Subdifferential (cont.). Inverse of a set-valued mapping: x ∈ U(y) ⟺ y ∈ U^{−1}(x); Prox_γG = (Id + γ∂G)^{−1} is a single-valued mapping.
50. 50. Prox and Subdifferential (cont.). Fix point: x* ∈ argmin G ⟺ 0 ∈ ∂G(x*) ⟺ x* ∈ (Id + γ∂G)(x*) ⟺ x* = (Id + γ∂G)^{−1}(x*) = Prox_γG(x*).
51. 51. Proximal Calculus. Separability: G(x) = G1(x1) + ... + Gn(xn) ⟹ Prox_G(x) = (Prox_G1(x1), ..., Prox_Gn(xn)).
52. 52. Proximal Calculus (cont.). Quadratic functionals: G(x) = ½||Φx − y||² ⟹ Prox_γG(x) = (Id + γΦ*Φ)^{−1}(x + γΦ*y).
53. 53. Proximal Calculus (cont.). Composition by a tight frame (A A* = Id): Prox_{G∘A} = A* ∘ Prox_G ∘ A + Id − A*A.
54. 54. Proximal Calculus (cont.). Indicators: G = ι_C ⟹ Prox_γG(x) = Proj_C(x) = argmin_{z∈C} ||x − z||.
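
These calculus rules translate into one-liners; a hedged NumPy sketch (Φ, y and the scalar prox are assumed given):

```python
import numpy as np

def prox_quadratic(x, Phi, y, gamma):
    """Prox of gamma/2 ||Phi x - y||^2: solve (Id + gamma Phi* Phi) z = x + gamma Phi* y."""
    N = Phi.shape[1]
    return np.linalg.solve(np.eye(N) + gamma * Phi.T @ Phi, x + gamma * Phi.T @ y)

def prox_ball(x, r):
    """Prox of the indicator of {||x|| <= r} = projection on the ball."""
    n = np.linalg.norm(x)
    return x if n <= r else r * x / n

def prox_separable(x, prox_1d, gamma):
    """Separability: apply a scalar prox coordinate-wise for G = sum_i G_i."""
    return np.array([prox_1d(xi, gamma) for xi in x])
```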
55. 55. Prox of Sparse Regularizers. Prox_γG(x) = argmin_z ½||x − z||² + γG(z).
56. 56. Prox of Sparse Regularizers (cont.). G(x) = ||x||₁ = Σ_i |x_i|; G(x) = ||x||₀ = #{i : x_i ≠ 0}; G(x) = Σ_i log(1 + |x_i|²). [plot: the penalties |x|, ||x||₀ and log(1 + x²)]
57. 57. Prox of Sparse Regularizers (cont.). Prox_{γ||·||₁}(x)_i = max(0, 1 − γ/|x_i|) x_i (soft thresholding); Prox_{γ||·||₀}(x)_i = x_i if |x_i| ≥ √(2γ), 0 otherwise (hard thresholding); for G(x) = Σ_i log(1 + |x_i|²): 3rd order polynomial root. [plot: the corresponding thresholding curves]
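
The two closed-form rules above in NumPy (a hedged sketch):

```python
import numpy as np

def soft_thresh(x, gamma):
    """Prox of gamma ||.||_1 (soft thresholding)."""
    return np.maximum(0, 1 - gamma / np.maximum(np.abs(x), 1e-12)) * x

def hard_thresh(x, gamma):
    """Prox of gamma ||.||_0 (hard thresholding)."""
    return np.where(np.abs(x) >= np.sqrt(2 * gamma), x, 0)
```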
58. 58. Legendre-Fenchel Duality. Legendre-Fenchel transform: G*(u) = sup_{x∈dom(G)} ⟨u, x⟩ − G(x). [illustration: −G*(u) is the intercept of the tangent to G of slope u]
59. 59. Legendre-Fenchel Duality (cont.). Example: quadratic functional G(x) = ½⟨Ax, x⟩ + ⟨x, b⟩ ⟹ G*(u) = ½⟨u − b, A^{−1}(u − b)⟩.
60. 60. Legendre-Fenchel Duality (cont.). Moreau's identity: Prox_{γG*}(x) = x − γ Prox_{G/γ}(x/γ): G simple ⟺ G* simple.
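
Moreau's identity is easy to check numerically; a hedged sketch for G = ||·||₁, whose conjugate G* is the indicator of the ℓ^∞ unit ball:

```python
import numpy as np

gamma = 0.7
x = np.random.randn(5)
prox_G = lambda z, g: np.sign(z) * np.maximum(np.abs(z) - g, 0)  # Prox_{g||.||_1}
lhs = np.clip(x, -1, 1)                         # Prox_{gamma G*} = Proj on ||u||_inf <= 1
rhs = x - gamma * prox_G(x / gamma, 1 / gamma)  # Id - gamma Prox_{G/gamma}(./gamma)
print(np.allclose(lhs, rhs))                    # True: Moreau's identity holds
```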
61. 61. Indicator and Homogeneous Functionals. Positively 1-homogeneous functional: G(λx) = |λ| G(x); example: a norm G(x) = ||x||. Duality: G* = ι_{G°(·) ≤ 1}, where G°(y) = sup_{G(x) ≤ 1} ⟨x, y⟩ is the polar (dual) functional.
62. 62. Indicator and Homogeneous Functionals (cont.). ℓ^p norms: G(x) = ||x||_p, with 1 ≤ p, q ≤ +∞ and 1/p + 1/q = 1 ⟹ G* = ι_{||·||_q ≤ 1}.
63. 63. Indicator and Homogeneous Functionals (cont.). Example: proximal operator of the ℓ^∞ norm: Prox_{γ||·||∞} = Id − Proj_{||·||₁ ≤ γ}, where Proj_{||·||₁ ≤ γ}(x)_i = max(0, 1 − λ/|x_i|) x_i for a well-chosen λ = λ(x, γ).
64. 64. Prox of the J Functional. J(m, ρ) = Σ_i j(m_i, ρ_i), with j(m̃, ρ̃) = ||m̃||²/ρ̃ for ρ̃ > 0.
65. 65. Prox of the J Functional (cont.). Prox_γJ(m, ρ) = (Prox_γj(m_i, ρ_i))_i.
66. 66. Prox of the J Functional (cont.). j* = ι_C where C = {(a, b) ∈ R² × R : ||a||²/2 + b ≤ 0}, so that Prox_γj(x̃) = x̃ − γ Proj_C(x̃/γ), where x̃ = (m̃, ρ̃).
67. 67. Prox of the J Functional (cont.). Proposition: Prox_γj(m̃, ρ̃) = (m*, ρ*) if ρ* > 0, and (0, 0) otherwise, where m* = ρ* m̃/(ρ* + 2γ) and ρ* is the largest root of X³ + (4γ − ρ̃)X² + 4γ(γ − ρ̃)X − γ||m̃||² − 4γ²ρ̃ = 0.
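
A hedged NumPy sketch of this cubic-root prox, using the polynomial as reconstructed above (numpy.roots does the factorization; not the speaker's reference code):

```python
import numpy as np

def prox_j(m_t, rho_t, gamma):
    """Prox_{gamma j}(m~, rho~) for j(m, rho) = ||m||^2 / rho (assumed form)."""
    coeffs = [1.0,
              4 * gamma - rho_t,
              4 * gamma * (gamma - rho_t),
              -gamma * np.dot(m_t, m_t) - 4 * gamma**2 * rho_t]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    rho_s = real.max() if real.size else 0.0
    if rho_s > 0:
        return rho_s * np.asarray(m_t) / (rho_s + 2 * gamma), rho_s
    return np.zeros_like(m_t), 0.0
```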
68. 68. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward-Backward • Douglas-Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
69. 69. Gradient and Proximal Descents. Gradient descent: x^{(ℓ+1)} = x^{(ℓ)} − τ∇G(x^{(ℓ)}) [explicit], for G C¹ with ∇G L-Lipschitz. Theorem: if 0 < τ < 2/L, x^{(ℓ)} → x*, a solution.
70. 70. Gradient and Proximal Descents (cont.). Sub-gradient descent: x^{(ℓ+1)} = x^{(ℓ)} − τ_ℓ v^{(ℓ)}, v^{(ℓ)} ∈ ∂G(x^{(ℓ)}). Theorem: if τ_ℓ ~ 1/ℓ, x^{(ℓ)} → x*, a solution. Problem: slow.
71. 71. Gradient and Proximal Descents (cont.). Proximal-point algorithm: x^{(ℓ+1)} = Prox_{γ_ℓ G}(x^{(ℓ)}) [implicit]. Theorem: if γ_ℓ ≥ c > 0, x^{(ℓ)} → x*, a solution; but Prox_{γG} can be hard to compute. [Rockafellar, 70]
72. 72. Proximal Splitting Methods. Solve min_{x∈H} E(x). Problem: Prox_γE is not available.
73. 73. Proximal Splitting Methods (cont.). Splitting: E(x) = F(x) + Σ_i G_i(x), with F smooth and each G_i simple.
74. 74. Proximal Splitting Methods (cont.). Iterative algorithms using only ∇F(x) and Prox_{γG_i}(x): Forward-Backward solves F + G; Douglas-Rachford solves Σ_i G_i; Primal-Dual solves Σ_i G_i ∘ A_i; Generalized FB solves F + Σ_i G_i.
75. 75. Smooth + Simple Splitting. Inverse problem: measurements y = Kf0 + w, with K: R^N → R^P, P ≪ N. Model: f0 = Ψx0 sparse in the dictionary Ψ. Sparse recovery: f = Ψx*, where x* solves min_{x∈R^N} F(x) + G(x) (smooth + simple). Data fidelity: F(x) = ½||y − Φx||², Φ = KΨ. Regularization: G(x) = λ||x||₁ = λ Σ_i |x_i|.
76. 76. Forward-Backward. Fix point equation: x* ∈ argmin F(x) + G(x) ⟺ 0 ∈ ∇F(x*) + ∂G(x*) ⟺ (x* − γ∇F(x*)) ∈ x* + γ∂G(x*) ⟺ x* = Prox_γG(x* − γ∇F(x*)).
77. 77. Forward-Backward (cont.). Forward-backward iterations: x^{(ℓ+1)} = Prox_γG(x^{(ℓ)} − γ∇F(x^{(ℓ)})).
78. 78. Forward-Backward (cont.). Projected gradient descent: the special case G = ι_C.
79. 79. Forward-Backward (cont.). Theorem: let ∇F be L-Lipschitz; if γ < 2/L, x^{(ℓ)} → x*, a solution. [Passty 79; Gabay 83]
80. 80. Example: ℓ¹ Regularization. min_x ½||Φx − y||² + λ||x||₁ = min_x F(x) + G(x), with F(x) = ½||Φx − y||², ∇F(x) = Φ*(Φx − y), L = ||Φ*Φ||; and G(x) = λ||x||₁, Prox_γG(x)_i = max(0, 1 − λγ/|x_i|) x_i. Forward-backward ⟺ iterative soft thresholding.
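
A minimal ISTA sketch of this forward-backward iteration (illustrative, not the talk's code; Φ, y, λ are assumed given as NumPy arrays/floats):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(Phi, y, lam, n_iter=200):
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of grad F
    gamma = 1.0 / L                          # any gamma < 2/L converges
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)         # forward (explicit) step
        x = soft_threshold(x - gamma * grad, gamma * lam)  # backward step
    return x
```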
81. 81. Convergence Speed. min_x E(x) = F(x) + G(x), with ∇F L-Lipschitz and G simple. Theorem: if L > 0, the FB iterates x^{(ℓ)} satisfy E(x^{(ℓ)}) − E(x*) ≤ C/ℓ; the constant C degrades with L.
82. 82. Multi-step Accelerations. Beck-Teboulle accelerated FB (FISTA): t^{(0)} = 1; x^{(ℓ+1)} = Prox_{(1/L)G}(y^{(ℓ)} − (1/L)∇F(y^{(ℓ)})); t^{(ℓ+1)} = (1 + √(1 + 4(t^{(ℓ)})²))/2; y^{(ℓ+1)} = x^{(ℓ+1)} + ((t^{(ℓ)} − 1)/t^{(ℓ+1)})(x^{(ℓ+1)} − x^{(ℓ)}) (see also Nesterov's method). Theorem: if L > 0, E(x^{(ℓ)}) − E(x*) ≤ C/ℓ². Complexity theory: optimal in a worst-case sense.
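
A hedged FISTA sketch for the same ℓ¹ problem (same assumptions as the ISTA sketch above):

```python
import numpy as np

def soft_threshold(x, t):                     # as in the ISTA sketch
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista(Phi, y, lam, n_iter=200):
    L = np.linalg.norm(Phi, 2) ** 2
    x = x_prev = np.zeros(Phi.shape[1])
    z = x.copy()                              # y^(l) in the slide's notation
    t = 1.0
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ z - y)
        x, x_prev = soft_threshold(z - grad / L, lam / L), x
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x + ((t - 1) / t_next) * (x - x_prev)   # momentum extrapolation
        t = t_next
    return x
```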
83. 83. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward-Backward • Douglas-Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
84. 84. Douglas-Rachford Scheme. min_x G1(x) + G2(x) (⋆), with G1 and G2 simple. Douglas-Rachford iterations: z^{(ℓ+1)} = (1 − α/2) z^{(ℓ)} + (α/2) RProx_{γG2} ∘ RProx_{γG1}(z^{(ℓ)}); x^{(ℓ+1)} = Prox_{γG1}(z^{(ℓ+1)}). Reflected prox: RProx_{γG}(x) = 2 Prox_{γG}(x) − x. (A generic sketch follows below.)
85. 85. Douglas-Rachford Scheme (cont.). Theorem: if 0 < α < 2 and γ > 0, x^{(ℓ)} → x*, a solution of (⋆). [Lions, Mercier, 79]
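
A generic Douglas-Rachford loop as a hedged sketch; prox1 and prox2 are user-supplied callables for Prox_{γG1} and Prox_{γG2}:

```python
import numpy as np

def douglas_rachford(prox1, prox2, z0, alpha=1.0, n_iter=200):
    z = z0.copy()
    for _ in range(n_iter):
        x = prox1(z)
        r1 = 2 * x - z                        # RProx_{gamma G1}(z)
        r2 = 2 * prox2(r1) - r1               # RProx_{gamma G2} o RProx_{gamma G1}
        z = (1 - alpha / 2) * z + (alpha / 2) * r2
    return prox1(z)                           # x^* = Prox_{gamma G1}(z)
```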
86. 86. DR Fix Point Equation. min_x G1(x) + G2(x) ⟺ 0 ∈ ∂(G1 + G2)(x) ⟺ ∃z, z − x ∈ γ∂G1(x) and x − z ∈ γ∂G2(x) ⟺ x = Prox_{γG1}(z) and (2x − z) − x ∈ γ∂G2(x).
87. 87. DR Fix Point Equation (cont.). ⟺ x = Prox_{γG2}(2x − z) = Prox_{γG2} ∘ RProx_{γG1}(z) ⟺ 2x − RProx_{γG1}(z) = 2 Prox_{γG2}(RProx_{γG1}(z)) − RProx_{γG1}(z), i.e. z = RProx_{γG2} ∘ RProx_{γG1}(z) ⟺ z = (1 − α/2) z + (α/2) RProx_{γG2} ∘ RProx_{γG1}(z).
88. 88. Example: Optimal Transport on a Centered Grid. min_{x∈R^{Gc×2}} J(x) + ι_C(x), with C = {x = (m, ρ) : Ax = b}, b = (0, ρ0, ρ1), A(x) = (div(x), ρ|_{t=0}, ρ|_{t=1}).
89. 89. Example: Optimal Transport on a Centered Grid (cont.). Prox_γJ: cubic root (closed form).
90. 90. Example: Optimal Transport on a Centered Grid (cont.). Prox_{ιC} = Proj_C: Proj_C(x) = (Id − A*Δ^{−1}A)x + A*Δ^{−1}b, where Δ = AA*: solving a Poisson equation with boundary conditions.
91. 91. Example: Optimal Transport on a Centered Grid (cont.). Proposition: DR with α = 1 is the ALG2 algorithm of [Benamou, Brenier 2000].
92. 92. Example: Optimal Transport on a Centered Grid (cont.). → Advantage: free relaxation parameter α ∈ ]0, 2[.
93. 93. Example: Constrained ℓ¹. min_{Φx=y} ||x||₁ ⟺ min_x G1(x) + G2(x), with G1(x) = ι_C(x), C = {x : Φx = y}: Prox_{γG1}(x) = Proj_C(x) = x + Φ*(ΦΦ*)^{−1}(y − Φx); and G2(x) = ||x||₁: Prox_{γG2}(x)_i = max(0, 1 − γ/|x_i|) x_i. → Efficient if ΦΦ* is easy to invert. (A sketch follows below.)
94. 94. Example: Constrained ℓ¹ (cont.). Example: compressed sensing with Φ ∈ R^{100×400} a Gaussian matrix and y = Φx0, ||x0||₀ = 17. [convergence plot: log10(||x^{(ℓ)}||₁ − ||x*||₁) over iterations, for γ = 0.01, 1, 10]
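
A hedged end-to-end sketch of this compressed sensing experiment with DR (α = 1), using the prox formulas of slide 93; the synthetic data mirrors the slide's setting:

```python
import numpy as np

rng = np.random.default_rng(0)
P, N, gamma = 100, 400, 1.0
Phi = rng.standard_normal((P, N))                  # Gaussian sensing matrix
x0 = np.zeros(N)
x0[rng.choice(N, 17, replace=False)] = rng.standard_normal(17)
y = Phi @ x0

pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)          # Phi^* (Phi Phi^*)^{-1}
proj_C = lambda x: x + pinv @ (y - Phi @ x)        # Prox_{gamma G1} = Proj_C
soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - gamma, 0)  # Prox_{gamma G2}

z = np.zeros(N)
for _ in range(500):
    x = proj_C(z)
    z = z + soft(2 * x - z) - x        # DR with alpha = 1: z + Prox2(2x - z) - x
print(np.linalg.norm(proj_C(z) - x0))  # should be small: x0 is recovered
```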
95. 95. Auxiliary Variables with DR. min_x G1(x) + G2∘A(x), with A: H → E linear and G1, G2 simple ⟺ min_{z∈H×E} G(z) + ι_C(z), with G(x, y) = G1(x) + G2(y) and C = {(x, y) ∈ H × E : Ax = y}.
96. 96. Auxiliary Variables with DR (cont.). Prox_γG(x, y) = (Prox_{γG1}(x), Prox_{γG2}(y)); Prox_{γιC}(x, y) = (x + A*ỹ, y − ỹ) = (x̃, Ax̃), where ỹ = (Id + AA*)^{−1}(Ax − y) and x̃ = (Id + A*A)^{−1}(A*y + x). → Efficient if Id + AA* or Id + A*A is easy to invert.
97. 97. Example: TV Regularization. min_f ½||Kf − y||² + λ||∇f||₁ (with ||u||₁ = Σ_i ||u_i||), split using the auxiliary variable u = ∇f. G1(u) = λ||u||₁: Prox_{γG1}(u)_i = max(0, 1 − λγ/||u_i||) u_i. G2(f) = ½||Kf − y||²: Prox_{γG2} = (Id + γK*K)^{−1} ∘ (· + γK*y). C = {(f, u) ∈ R^N × R^{N×2} : u = ∇f}: Prox_{γιC}(f, u) = (f̃, ∇f̃).
98. 98. Example: TV Regularization (cont.). Compute f̃ as the solution of (Id + ∇*∇) f̃ = f + ∇*u: O(N log(N)) operations using the FFT.
99. 99. Example: TV Regularization (cont.). [Figures: original f0; noisy observations y = f0 + w; recovery f*; convergence along iterations for y = Kx0]
100. 100. Alternating Direction Method of Multipliers. min_x F(x) + G∘A(x) (⋆) ⟺ min_{x, y=Ax} F(x) + G(y), with A: R^N → R^P injective.
101. 101. ADMM (cont.). Lagrangian: min_{x,y} max_u L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩.
102. 102. ADMM (cont.). Augmented Lagrangian: min_{x,y} max_u L_γ(x, y, u) = L(x, y, u) + (γ/2)||y − Ax||².
103. 103. ADMM (cont.). ADMM iterations: x^{(ℓ+1)} = argmin_x L_γ(x, y^{(ℓ)}, u^{(ℓ)}); y^{(ℓ+1)} = argmin_y L_γ(x^{(ℓ+1)}, y, u^{(ℓ)}); u^{(ℓ+1)} = u^{(ℓ)} + γ(y^{(ℓ+1)} − A x^{(ℓ+1)}). (A sketch follows below.)
104. 104. ADMM (cont.). Theorem: if γ > 0, x^{(ℓ)} → x*, a solution of (⋆). [Gabay, Mercier; Glowinski, Marrocco, 76]
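
A hedged ADMM sketch on a concrete instance, min_x ½||Φx − y||² + λ||x||₁ with the splitting z = x (A = Id), written in the common scaled-dual form (a presentation choice, not the slide's unscaled multipliers):

```python
import numpy as np

def admm_lasso(Phi, y, lam, gamma=1.0, n_iter=200):
    N = Phi.shape[1]
    M = np.linalg.inv(Phi.T @ Phi + gamma * np.eye(N))   # factor once, reuse
    Pty = Phi.T @ y
    x = z = u = np.zeros(N)
    for _ in range(n_iter):
        x = M @ (Pty + gamma * (z - u))                  # x-update: quadratic argmin
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / gamma, 0)  # z-update
        u = u + x - z                                    # scaled dual ascent
    return x
```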
105. 105. ADMM with Proximal Operators. Proximal mapping for the metric A (A injective): Prox^A_{γF}(z) = argmin_x ½||Ax − z||² + γF(x).
106. 106. ADMM with Proximal Operators (cont.). Proposition: Prox^A_{γF}(z) = A⁺(z − γ Prox_{(F*∘A*)/γ}(z/γ)).
107. 107. ADMM with Proximal Operators (cont.). ADMM in proximal form: x^{(ℓ+1)} = Prox^A_{F/γ}(y^{(ℓ)} − u^{(ℓ)}); y^{(ℓ+1)} = Prox_{G/γ}(A x^{(ℓ+1)} + u^{(ℓ)}); u^{(ℓ+1)} = u^{(ℓ)} + y^{(ℓ+1)} − A x^{(ℓ+1)}.
108. 108. ADMM with Proximal Operators (cont.). → If G∘A is simple: use DR. → If F*∘A* is simple: use ADMM.
109. 109. ADMM vs. DR. Fenchel-Rockafellar duality: min_x F(x) + G∘A(x) ⟷ min_u F*(−A*u) + G*(u). Important: no bijection between u and x.
110. 110. ADMM vs. DR (cont.). Proposition: DR applied to F*∘(−A*) + G* is ADMM. [Eckstein, Bertsekas, 92]
111. 111. ADMM vs. DR (cont.). DR iterations (when α = 1): z^{(ℓ+1)} = ½ z^{(ℓ)} + ½ RProx_{γF*∘(−A*)} ∘ RProx_{γG*}(z^{(ℓ)}).
112. 112. ADMM vs. DR (cont.). The iterates of ADMM are recovered using: u^{(ℓ)} = Prox_{γG*}(z^{(ℓ)}); y^{(ℓ)} = (z^{(ℓ)} − u^{(ℓ)})/γ; x^{(ℓ+1)} = Prox^A_{F/γ}(y^{(ℓ)} − u^{(ℓ)}).
113. 113. More than 2 Functionals. min_x G1(x) + ... + Gk(x), each G_i simple ⟺ min G(x1, ..., xk) + ι_C(x1, ..., xk), with G(x1, ..., xk) = G1(x1) + ... + Gk(xk) and C = {(x1, ..., xk) ∈ H^k : x1 = ... = xk}.
114. 114. More than 2 Functionals (cont.). G and ι_C are simple: Prox_γG(x1, ..., xk) = (Prox_{γG_i}(x_i))_i and Prox_{γιC}(x1, ..., xk) = (x̃, ..., x̃), where x̃ = (1/k) Σ_i x_i.
115. 115. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward-Backward • Douglas-Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
116. 116. GFB Splitting. min_{x∈R^N} F(x) + Σ_{i=1}^n G_i(x) (⋆), with F smooth and each G_i simple. For i = 1, ..., n: z_i^{(ℓ+1)} = z_i^{(ℓ)} + Prox_{nγG_i}(2x^{(ℓ)} − z_i^{(ℓ)} − γ∇F(x^{(ℓ)})) − x^{(ℓ)}; then x^{(ℓ+1)} = (1/n) Σ_{i=1}^n z_i^{(ℓ+1)}. [Raguet, Fadili, Peyré 2012] (A sketch follows below.)
117. 117. GFB Splitting (cont.). Theorem: let ∇F be L-Lipschitz; if γ < 2/L, x^{(ℓ)} → x*, a solution of (⋆).
118. 118. GFB Splitting (cont.). n = 1 ⟹ forward-backward; F = 0 ⟹ Douglas-Rachford.
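
A hedged sketch of the GFB iteration above; grad_F and the list proxs (callables for Prox_{nγG_i}) are assumed supplied:

```python
import numpy as np

def gfb(grad_F, proxs, x0, gamma, n_iter=200):
    n = len(proxs)
    z = [x0.copy() for _ in range(n)]
    x = x0.copy()
    for _ in range(n_iter):
        g = gamma * grad_F(x)
        for i in range(n):
            z[i] = z[i] + proxs[i](2 * x - z[i] - g) - x   # per-functional update
        x = sum(z) / n                                     # average the z_i
    return x
```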
119. 119. GFB Fix Point. x* ∈ argmin_{x∈R^N} F(x) + Σ_i G_i(x) ⟺ 0 ∈ ∇F(x*) + Σ_i ∂G_i(x*) ⟺ ∃(y_i)_i, y_i ∈ ∂G_i(x*), ∇F(x*) + Σ_i y_i = 0.
120. 120. GFB Fix Point (cont.). ⟺ ∃(z_i)_{i=1}^n: ∀i, x* − z_i − γ∇F(x*) ∈ nγ ∂G_i(x*) and x* = (1/n) Σ_i z_i (use z_i = x* − γ∇F(x*) − nγ y_i).
121. 121. GFB Fix Point (cont.). ⟺ (2x* − z_i − γ∇F(x*)) − x* ∈ nγ ∂G_i(x*) ⟺ x* = Prox_{nγG_i}(2x* − z_i − γ∇F(x*)) ⟺ z_i = z_i + Prox_{nγG_i}(2x* − z_i − γ∇F(x*)) − x*.
122. 122. GFB Fix Point (cont.). → Fix point equation on (x*, z_1, ..., z_n).
123. 123. Block Regularization. ℓ¹−ℓ² block sparsity: G(x) = Σ_{b∈B} ||x_b||, with ||x_b||² = Σ_{m∈b} x_m². [illustration: blocks of wavelet coefficients x of an image f = Ψx, N = 256]
124. 124. Block Regularization (cont.). Non-overlapping decomposition: B = B1 ∪ ... ∪ Bn, so that G(x) = Σ_{i=1}^n G_i(x) with G_i(x) = Σ_{b∈Bi} ||x_b||. [illustration: shifted block grids B1, B2]
125. 125. Block Regularization (cont.). Each G_i is simple: for m ∈ b ∈ B_i, Prox_{γG_i}(x)_m = max(0, 1 − γ/||x_b||) x_m.
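
A hedged sketch of this block prox for a 1-D signal split into contiguous, non-overlapping blocks of equal size (the reshape is a simplifying assumption):

```python
import numpy as np

def prox_block_l1l2(x, gamma, block_size):
    """Prox of gamma * sum_b ||x_b|| for non-overlapping blocks."""
    xb = x.reshape(-1, block_size)                 # one block per row
    norms = np.linalg.norm(xb, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - gamma / np.maximum(norms, 1e-12))
    return (scale * xb).ravel()                    # block soft thresholding
```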
126. 126. Numerical Experiments (N = 256). Deconvolution: min_x ½||Y − K x||² + λ Σ_k ||x||_{1,2}^{(Bk)}; deconvolution + inpainting: min_x ½||Y − P K x||² + λ Σ_{k=1}^4 ||x||_{1,2}^{(Bk)}. Comparison of the EFB, PR and CP schemes on log10(E(x^{(ℓ)}) − E_min) along iterations. Deconvolution (λ₂ = 1.30e−03, noise 0.025, convol. 2): at it. #50, SNR 22.49dB; t_EFB = 161s, t_PR = 173s, t_CP = 190s. Deconvolution + inpainting (λ₄ = 1.00e−03, noise 0.025, degrad. 0.4, convol. 2): at it. #50, SNR 21.80dB; t_EFB = 283s, t_PR = 298s, t_CP = 368s. [Figures: x0, observations y = Φx0 + w, recovered x*, convergence plots]
127. 127. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward-Backward • Douglas-Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
128. 128. Primal-Dual Formulation. Fenchel-Rockafellar duality, with A: H → L linear: min_{x∈H} G1(x) + G2∘A(x) = min_x G1(x) + sup_{u∈L} ⟨Ax, u⟩ − G2*(u).
129. 129. Primal-Dual Formulation (cont.). Strong duality (min ↔ max) holds if 0 ∈ ri(dom(G2)) − A ri(dom(G1)): = max_u −G2*(u) + min_x G1(x) + ⟨x, A*u⟩ = max_u −G2*(u) − G1*(−A*u).
130. 130. Primal-Dual Formulation (cont.). Recovering x* from some u*: x* = argmin_x G1(x) + ⟨x, A*u*⟩.
131. 131. Primal-Dual Formulation (cont.). ⟺ −A*u* ∈ ∂G1(x*) ⟺ x* ∈ (∂G1)^{−1}(−A*u*) = ∂G1*(−A*u*).
132. 132. Forward-Backward on the Dual. If G1 is strongly convex: ∇²G1 ≥ c Id, i.e. G1(tx + (1−t)y) ≤ t G1(x) + (1−t) G1(y) − (c/2) t(1−t) ||x − y||².
133. 133. Forward-Backward on the Dual (cont.). Then x* is uniquely defined, x* = ∇G1*(−A*u*), and G1* is of class C¹.
134. 134. Forward-Backward on the Dual (cont.). FB on the dual: min_{x∈H} G1(x) + G2∘A(x) = −min_{u∈L} G1*(−A*u) + G2*(u) (smooth + simple): u^{(ℓ+1)} = Prox_{τG2*}(u^{(ℓ)} + τ A ∇G1*(−A*u^{(ℓ)})).
135. 135. Example: TV Denoising. min_{f∈R^N} ½||f − y||² + λ||∇f||₁ ⟷ dual: min_{||u||∞≤λ} ||y + div(u)||², with ||u||₁ = Σ_i ||u_i|| and ||u||∞ = max_i ||u_i||. Dual solution u* ⟹ primal solution f* = y + div(u*). [Chambolle 2004]
136. 136. Example: TV Denoising (cont.). FB (aka projected gradient descent): u^{(ℓ+1)} = Proj_{||·||∞≤λ}(u^{(ℓ)} + τ∇(y + div(u^{(ℓ)}))), with Proj_{||·||∞≤λ}(u)_i = u_i / max(||u_i||/λ, 1). Convergence if τ < 2/||div ∘ ∇|| = 1/4.
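
A hedged NumPy sketch of this dual projected gradient TV denoiser (finite-difference grad/div pairs chosen so that div = −grad*; illustrative, not Chambolle's reference implementation):

```python
import numpy as np

def grad(f):                                   # forward differences, 2-D image
    gx = np.vstack([f[1:] - f[:-1], np.zeros((1, f.shape[1]))])
    gy = np.hstack([f[:, 1:] - f[:, :-1], np.zeros((f.shape[0], 1))])
    return np.stack([gx, gy], axis=-1)

def div(u):                                    # negative adjoint of grad
    ux, uy = u[..., 0], u[..., 1]
    dx = np.vstack([ux[:1], ux[1:-1] - ux[:-2], -ux[-2:-1]])
    dy = np.hstack([uy[:, :1], uy[:, 1:-1] - uy[:, :-2], -uy[:, -2:-1]])
    return dx + dy

def tv_denoise(y, lam, tau=0.24, n_iter=200):  # tau < 1/4 ensures convergence
    u = np.zeros(y.shape + (2,))
    for _ in range(n_iter):
        u = u + tau * grad(y + div(u))         # explicit (gradient) step
        norms = np.maximum(np.linalg.norm(u, axis=-1, keepdims=True) / lam, 1)
        u = u / norms                          # project on {||u_i|| <= lam}
    return y + div(u)                          # primal solution f*
```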
137. 137. Primal-Dual Algorithm. min_{x∈H} G1(x) + G2∘A(x) ⟺ min_x max_z G1(x) − G2*(z) + ⟨A(x), z⟩.
138. 138. Primal-Dual Algorithm (cont.). Iterations: z^{(ℓ+1)} = Prox_{σG2*}(z^{(ℓ)} + σA(x̃^{(ℓ)})); x^{(ℓ+1)} = Prox_{τG1}(x^{(ℓ)} − τA*(z^{(ℓ+1)})); x̃^{(ℓ+1)} = x^{(ℓ+1)} + θ(x^{(ℓ+1)} − x^{(ℓ)}). θ = 0: Arrow-Hurwicz algorithm; θ = 1: convergence speed guarantees on the duality gap. (A sketch follows below.)
139. 139. Primal-Dual Algorithm (cont.). Theorem [Chambolle-Pock 2011]: if 0 ≤ θ ≤ 1 and στ||A||² < 1, then x^{(ℓ)} → x*, a minimizer of G1 + G2∘A.
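
A generic hedged sketch of the primal-dual iteration; prox_G1, prox_G2_conj, A and At (= A*) are user-supplied callables:

```python
import numpy as np

def chambolle_pock(prox_G1, prox_G2_conj, A, At, x0, z0,
                   sigma, tau, theta=1.0, n_iter=200):
    x, z = x0.copy(), z0.copy()
    x_bar = x.copy()
    for _ in range(n_iter):
        z = prox_G2_conj(z + sigma * A(x_bar))       # dual ascent step
        x_new = prox_G1(x - tau * At(z))             # primal descent step
        x_bar = x_new + theta * (x_new - x)          # extrapolation
        x = x_new
    return x
```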
140. 140. Example: Optimal Transport. Staggered grid formulation: min_{x∈R^{G¹st}×R^{G²st}} J(I(x)) + ι_C(x), with the interpolation I = (I¹, I²): R^{G¹st} × R^{G²st} → R^{Gc×2} mapping from the staggered to the centered grid; handled by the primal-dual scheme with A = I.
141. 141. Conclusion. Inverse problems in imaging: large scale (N ~ 10⁶); non-smooth (sparsity, TV, ...); (sometimes) convex; highly structured (separability, ℓ^p norms, ...).
142. 142. Conclusion (cont.). Proximal splitting: unravels the structure of problems; parallelizable; decomposition G = Σ_k G_k.
143. 143. Conclusion (cont.). Open problems: less structured problems; without smoothness; non-convex optimization.