This document surveys proximal splitting methods for minimizing a sum of convex functions. It first reviews subdifferential calculus and proximal operators, then presents several proximal splitting algorithms: forward-backward splitting, Douglas-Rachford splitting, primal-dual splitting, and generalized forward-backward splitting. These algorithms solve composite optimization problems by exploiting the separable structure of the objective and the properties of each term (smoothness, or simplicity of its proximal operator). Examples are drawn from inverse problems such as sparse recovery and total variation denoising.
6. Convex Optimization

Setting: $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$, where $\mathcal{H}$ is a Hilbert space (here $\mathcal{H} = \mathbb{R}^N$).

Problem: $\min_{x \in \mathcal{H}} G(x)$

Class of functions:
- Convex: $G(tx + (1-t)y) \leq t\,G(x) + (1-t)\,G(y)$ for all $t \in [0,1]$.
- Lower semi-continuous: $\liminf_{x \to x_0} G(x) \geq G(x_0)$.
- Proper: $\{x \in \mathcal{H} \,:\, G(x) \neq +\infty\} \neq \emptyset$.

Indicator of a set $C$ (closed and convex):
$$\iota_C(x) = \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{otherwise.} \end{cases}$$
9. Example: $\ell^1$ Regularization

Inverse problem: measurements $y = K f_0 + w \in \mathbb{R}^P$, where $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \leq N$, and $f_0 \in \mathbb{R}^N$ is the image to recover.

Model: $f_0 = \Psi x_0$, where $x_0 \in \mathbb{R}^Q$ is sparse in the dictionary $\Psi \in \mathbb{R}^{N \times Q}$, $Q \geq N$. Set $\Phi = K\Psi \in \mathbb{R}^{P \times Q}$.

[Figure: coefficients $x_0 \in \mathbb{R}^Q$ $\to$ image $f_0 = \Psi x_0 \in \mathbb{R}^N$ $\to$ observations $y = K f_0 \in \mathbb{R}^P$.]

Sparse recovery: $f = \Psi x^\star$ where $x^\star$ solves
$$\min_{x \in \mathbb{R}^Q} \underbrace{\tfrac{1}{2} \|y - \Phi x\|^2}_{\text{fidelity}} + \lambda \underbrace{\|x\|_1}_{\text{regularization}}$$
15. Sub-differential

Sub-differential:
$$\partial G(x) = \{ u \in \mathcal{H} \,:\, \forall z,\ G(z) \geq G(x) + \langle u, z - x \rangle \}$$

Example: $G(x) = |x|$, with $\partial G(0) = [-1, 1]$.

Smooth functions: if $F$ is $C^1$, then $\partial F(x) = \{ \nabla F(x) \}$.

First-order condition:
$$x^\star \in \operatorname{argmin}_{x \in \mathcal{H}} G(x) \iff 0 \in \partial G(x^\star)$$

Monotone operator: $U(x) = \partial G(x)$ is monotone, i.e.
$$\forall\, (u, v) \in U(x) \times U(y), \quad \langle y - x,\, v - u \rangle \geq 0.$$
18. Example: $\ell^1$ Regularization

$$x^\star \in \operatorname{argmin}_{x \in \mathbb{R}^Q} G(x) = \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$$

$$\partial G(x) = \Phi^*(\Phi x - y) + \lambda\, \partial\|\cdot\|_1(x), \qquad \partial\|\cdot\|_1(x)_i = \begin{cases} \{\operatorname{sign}(x_i)\} & \text{if } x_i \neq 0, \\ [-1, 1] & \text{if } x_i = 0. \end{cases}$$

Support of the solution: $I = \{ i \in \{0, \ldots, Q-1\} \,:\, x^\star_i \neq 0 \}$.

First-order conditions:
$$\exists\, s \in \mathbb{R}^Q, \quad \Phi^*(\Phi x^\star - y) + \lambda s = 0, \qquad s_I = \operatorname{sign}(x^\star_I), \quad \|s_{I^c}\|_\infty \leq 1.$$
21. Example: Total Variation Denoising

$$f^\star \in \operatorname{argmin}_{f \in \mathbb{R}^N} \tfrac{1}{2}\|y - f\|^2 + \lambda J(f)$$

Important: the optimization variable is $f$ itself ($\lambda = 0$ returns the noisy $y$).

Finite difference gradient: $\nabla : \mathbb{R}^N \to \mathbb{R}^{N \times 2}$, $(\nabla f)_i \in \mathbb{R}^2$.

Discrete TV norm: $J(f) = \sum_i \|(\nabla f)_i\|$, i.e. $J(f) = G(\nabla f)$ with $G(u) = \sum_i \|u_i\|$.

Composition by linear maps: $\partial(J \circ A) = A^* \circ (\partial J) \circ A$, hence
$$\partial J(f) = -\operatorname{div}\big( \partial G(\nabla f) \big), \qquad \partial G(u)_i = \begin{cases} \left\{ \frac{u_i}{\|u_i\|} \right\} & \text{if } u_i \neq 0, \\ \{ v \in \mathbb{R}^2 \,:\, \|v\| \leq 1 \} & \text{if } u_i = 0. \end{cases}$$

First-order conditions: $\exists\, v \in \mathbb{R}^{N \times 2}$, $f^\star = y + \lambda \operatorname{div}(v)$, with
$$\forall i \notin I,\ v_i = \frac{(\nabla f^\star)_i}{\|(\nabla f^\star)_i\|}, \qquad \forall i \in I,\ \|v_i\| \leq 1, \qquad \text{where } I = \{ i \,:\, (\nabla f^\star)_i = 0 \}.$$
29. Proximal Calculus

Proximal operator: $\operatorname{Prox}_{\gamma G}(x) = \operatorname{argmin}_z \tfrac{1}{2}\|x - z\|^2 + \gamma G(z)$.

Separability: $G(x) = G_1(x_1) + \ldots + G_n(x_n)$ gives
$$\operatorname{Prox}_{\gamma G}(x) = \big( \operatorname{Prox}_{\gamma G_1}(x_1), \ldots, \operatorname{Prox}_{\gamma G_n}(x_n) \big).$$

Quadratic functionals: $G(x) = \tfrac{1}{2}\|\Phi x - y\|^2$ gives
$$\operatorname{Prox}_{\gamma G}(x) = (\operatorname{Id} + \gamma \Phi^* \Phi)^{-1}(x + \gamma \Phi^* y), \qquad (\operatorname{Id} + \gamma \Phi^* \Phi)^{-1} \Phi^* = \Phi^* (\operatorname{Id} + \gamma \Phi \Phi^*)^{-1}.$$

Composition by a tight frame ($A \circ A^* = \operatorname{Id}$):
$$\operatorname{Prox}_{G \circ A} = A^* \circ \operatorname{Prox}_G \circ A + \operatorname{Id} - A^* A.$$

Indicators: $G(x) = \iota_C(x)$ gives
$$\operatorname{Prox}_{\gamma G}(x) = \operatorname{Proj}_C(x) = \operatorname{argmin}_{z \in C} \|x - z\|.$$
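As a concrete sketch of two of these rules, here is a minimal numpy illustration; the function names and the direct linear solve are illustrative choices, not part of the slides:

```python
import numpy as np

# Separability: G(x) = lam * ||x||_1 splits over coordinates, so its prox
# is the coordinate-wise soft thresholding Prox_{lam |.|}.
def prox_l1(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Quadratic rule: prox of G(x) = 1/2 ||Phi x - y||^2
# (dense solve; fine for small problems).
def prox_quadratic(x, Phi, y, gamma):
    Q = Phi.shape[1]
    return np.linalg.solve(np.eye(Q) + gamma * Phi.T @ Phi,
                           x + gamma * Phi.T @ y)
```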
31. Prox and Subdifferential

Resolvent of $\partial G$:
$$z = \operatorname{Prox}_{\gamma G}(x) \iff 0 \in z - x + \gamma \partial G(z) \iff x \in (\operatorname{Id} + \gamma \partial G)(z) \iff z = (\operatorname{Id} + \gamma \partial G)^{-1}(x)$$

Inverse of a set-valued mapping: $x \in U(y) \iff y \in U^{-1}(x)$.
$\operatorname{Prox}_{\gamma G} = (\operatorname{Id} + \gamma \partial G)^{-1}$ is a single-valued mapping.

Fix point:
$$x^\star \in \operatorname{argmin}_x G(x) \iff 0 \in \gamma \partial G(x^\star) \iff x^\star \in (\operatorname{Id} + \gamma \partial G)(x^\star) \iff x^\star = \operatorname{Prox}_{\gamma G}(x^\star).$$
34. Gradient and Proximal Descents

Gradient descent ($G$ is $C^1$ and $\nabla G$ is $L$-Lipschitz):
$$x^{(\ell+1)} = x^{(\ell)} - \tau \nabla G(x^{(\ell)}) \qquad \text{[explicit]}$$
Theorem: if $0 < \tau < 2/L$, $x^{(\ell)} \to x^\star$, a solution.

Sub-gradient descent:
$$x^{(\ell+1)} = x^{(\ell)} - \tau_\ell v^{(\ell)}, \qquad v^{(\ell)} \in \partial G(x^{(\ell)})$$
Theorem: if $\tau_\ell \sim 1/\ell$, $x^{(\ell)} \to x^\star$, a solution. Problem: slow.

Proximal-point algorithm:
$$x^{(\ell+1)} = \operatorname{Prox}_{\tau_\ell G}(x^{(\ell)}) \qquad \text{[implicit]}$$
Theorem: if $\tau_\ell \geq c > 0$, $x^{(\ell)} \to x^\star$, a solution. Problem: $\operatorname{Prox}_{\tau G}$ is hard to compute.
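A minimal numpy sketch of the explicit scheme on a smooth least-squares objective; the matrix `Phi` and data `y` are illustrative, and the step size respects $\tau < 2/L$:

```python
import numpy as np

# Gradient descent on G(x) = 1/2 ||Phi x - y||^2, whose gradient
# Phi^T (Phi x - y) is L-Lipschitz with L = ||Phi^T Phi|| (spectral norm).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 20))
y = rng.standard_normal(50)
L = np.linalg.norm(Phi.T @ Phi, 2)
tau = 1.8 / L                      # any 0 < tau < 2/L converges
x = np.zeros(20)
for _ in range(500):
    x -= tau * Phi.T @ (Phi @ x - y)
```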
38. Proximal Splitting Methods

Solve $\min_{x \in \mathcal{H}} E(x)$. Problem: $\operatorname{Prox}_{\gamma E}$ is not available.

Splitting: $E(x) = F(x) + \sum_i G_i(x)$, with $F$ smooth and each $G_i$ simple.

Iterative algorithms using only $\nabla F(x)$ and $\operatorname{Prox}_{\gamma G_i}(x)$:
- Forward-Backward: solves $F + G$
- Douglas-Rachford: solves $\sum_i G_i$
- Primal-Dual: solves $\sum_i G_i \circ A_i$
- Generalized FB: solves $F + \sum_i G_i$
39. Smooth + Simple Splitting

Inverse problem: $y = K f_0 + w$, with $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \ll N$, and $f_0 = \Psi x_0$, $x_0$ sparse in the dictionary $\Psi$.

Sparse recovery: $f = \Psi x^\star$ where $x^\star$ solves
$$\min_{x \in \mathbb{R}^Q} \underbrace{F(x)}_{\text{smooth}} + \underbrace{G(x)}_{\text{simple}}$$

Data fidelity: $F(x) = \tfrac{1}{2}\|y - \Phi x\|^2$, with $\Phi = K\Psi$.
Regularization: $G(x) = \lambda\|x\|_1 = \lambda \sum_i |x_i|$.
43. Forward-Backward

$$x^\star \in \operatorname{argmin}_x F(x) + G(x) \qquad (\star)$$

Fix point equation:
$$0 \in \nabla F(x^\star) + \partial G(x^\star) \iff \big( x^\star - \tau \nabla F(x^\star) \big) \in x^\star + \tau \partial G(x^\star) \iff x^\star = \operatorname{Prox}_{\tau G}\big( x^\star - \tau \nabla F(x^\star) \big)$$

Forward-backward:
$$x^{(\ell+1)} = \operatorname{Prox}_{\tau G}\big( x^{(\ell)} - \tau \nabla F(x^{(\ell)}) \big)$$

Projected gradient descent: $G = \iota_C$.

Theorem: let $\nabla F$ be $L$-Lipschitz. If $\tau < 2/L$, $x^{(\ell)} \to x^\star$, a solution of $(\star)$.
44. Example: L1 Regularization

$$\min_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1 \quad = \quad \min_x F(x) + G(x)$$

$F(x) = \tfrac{1}{2}\|\Phi x - y\|^2$: $\nabla F(x) = \Phi^*(\Phi x - y)$, with $L = \|\Phi^* \Phi\|$.

$G(x) = \lambda\|x\|_1$: $\operatorname{Prox}_{\tau G}(x)_i = \max\Big(0,\, 1 - \frac{\tau\lambda}{|x_i|}\Big)\, x_i$.

Forward-backward $\implies$ iterative soft thresholding.
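A minimal numpy sketch of this iterative soft-thresholding scheme; `Phi`, `y`, `lam` are assumed given and the step size is an illustrative choice within $(0, 2/L)$:

```python
import numpy as np

# Forward-backward on 1/2||Phi x - y||^2 + lam ||x||_1 (ISTA).
def ista(Phi, y, lam, iters=500):
    L = np.linalg.norm(Phi.T @ Phi, 2)   # Lipschitz constant of grad F
    tau = 1.0 / L                        # any 0 < tau < 2/L converges
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x = x - tau * Phi.T @ (Phi @ x - y)                       # forward step
        x = np.sign(x) * np.maximum(np.abs(x) - tau * lam, 0.0)   # prox step
    return x
```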
45. Convergence Speed

$$\min_x E(x) = F(x) + G(x), \qquad \nabla F \text{ is } L\text{-Lipschitz}, \quad G \text{ simple}.$$

Theorem: if $L > 0$, the FB iterates $x^{(\ell)}$ satisfy
$$E(x^{(\ell)}) - E(x^\star) \leq \frac{C}{\ell} \longrightarrow 0.$$
The constant $C$ degrades as $L$ grows.
46. Multi-steps Accelerations
t(0) = 1
Beck-Teboule accelerated FB:
✓
◆
1
(`+1)
(`)
x
= Prox1/L y
rF (y (`) )
L
1+
1 + 4(t( ) )2
t( +1) =
2()
t
1 (
( +1)
( +1)
y
=x
+ ( +1) (x
t
+1)
x( ) )
(see also Nesterov method)
Theorem:
If L > 0,
( )
E(x
)
E(x )
C
Complexity theory: optimal in a worse-case sense.
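A minimal sketch of these FISTA updates for the same $\ell^1$ problem as above; `Phi`, `y`, `lam` are assumed given:

```python
import numpy as np

# Beck-Teboulle accelerated forward-backward (FISTA) for
# 1/2||Phi x - y||^2 + lam ||x||_1.
def fista(Phi, y, lam, iters=500):
    L = np.linalg.norm(Phi.T @ Phi, 2)
    x = np.zeros(Phi.shape[1])
    z = x.copy()              # the extrapolated point y^(l)
    t = 1.0
    for _ in range(iters):
        x_new = z - (1.0 / L) * Phi.T @ (Phi @ z - y)
        x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - lam / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)   # extrapolation
        x, t = x_new, t_new
    return x
```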
49. Douglas-Rachford Scheme

$$\min_x G_1(x) + G_2(x), \qquad G_1, G_2 \text{ simple}. \quad (\star)$$

Douglas-Rachford iterations:
$$z^{(\ell+1)} = \Big(1 - \frac{\alpha}{2}\Big)\, z^{(\ell)} + \frac{\alpha}{2}\, \operatorname{RProx}_{\gamma G_2}\big( \operatorname{RProx}_{\gamma G_1}(z^{(\ell)}) \big)$$
$$x^{(\ell+1)} = \operatorname{Prox}_{\gamma G_1}(z^{(\ell+1)})$$

Reflected prox: $\operatorname{RProx}_{\gamma G}(x) = 2 \operatorname{Prox}_{\gamma G}(x) - x$.

Theorem: if $0 < \alpha < 2$ and $\gamma > 0$, $x^{(\ell)} \to x^\star$, a solution of $(\star)$.
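A generic numpy sketch of these iterations; `prox1` and `prox2` are user-supplied functions computing $\operatorname{Prox}_{\gamma G_1}$ and $\operatorname{Prox}_{\gamma G_2}$ (illustrative names, not from the slides):

```python
import numpy as np

# Douglas-Rachford: alpha in (0, 2) is the relaxation parameter.
def douglas_rachford(prox1, prox2, z0, alpha=1.0, iters=500):
    z = z0.copy()
    for _ in range(iters):
        x = prox1(z)
        rp1 = 2.0 * x - z                # RProx_{gamma G1}(z)
        rp2 = 2.0 * prox2(rp1) - rp1     # RProx_{gamma G2}(RProx_{gamma G1}(z))
        z = (1.0 - alpha / 2.0) * z + (alpha / 2.0) * rp2
    return prox1(z)
```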
51. DR Fix Point Equation

$$0 \in \partial(G_1 + G_2)(x) \iff \exists\, z, \quad z - x \in \gamma \partial G_1(x) \ \text{ and } \ x - z \in \gamma \partial G_2(x)$$
$$\iff x = \operatorname{Prox}_{\gamma G_1}(z) \quad \text{and} \quad x = \operatorname{Prox}_{\gamma G_2}(2x - z)$$

Since $2x - z = \operatorname{RProx}_{\gamma G_1}(z)$, the second relation gives $2\operatorname{Prox}_{\gamma G_2}(2x - z) - (2x - z) = 2x - (2x - z) = z$, i.e.
$$z = \operatorname{RProx}_{\gamma G_2}\big( \operatorname{RProx}_{\gamma G_1}(z) \big) \iff z = \Big(1 - \frac{\alpha}{2}\Big)\, z + \frac{\alpha}{2}\, \operatorname{RProx}_{\gamma G_2}\big( \operatorname{RProx}_{\gamma G_1}(z) \big).$$
53. Example: Constrained L1

$$\min_{\Phi x = y} \|x\|_1 \quad = \quad \min_x G_1(x) + G_2(x)$$

$G_1(x) = \iota_C(x)$ with $C = \{x \,:\, \Phi x = y\}$:
$$\operatorname{Prox}_{\gamma G_1}(x) = \operatorname{Proj}_C(x) = x + \Phi^* (\Phi \Phi^*)^{-1} (y - \Phi x),$$
efficient if $\Phi\Phi^*$ is easy to invert.

$G_2(x) = \|x\|_1$:
$$\operatorname{Prox}_{\gamma G_2}(x)_i = \max\Big(0,\, 1 - \frac{\gamma}{|x_i|}\Big)\, x_i.$$

Example: compressed sensing, with $\Phi \in \mathbb{R}^{100 \times 400}$ a Gaussian matrix, $y = \Phi x_0$, $\|x_0\|_0 = 17$.

[Figure: decay of $\log_{10}(\|x^{(\ell)}\|_1 - \|x^\star\|_1)$ over 250 iterations, for $\gamma = 0.01$, $1$, $10$.]
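A sketch of this compressed-sensing setup, reusing the `douglas_rachford` function from the earlier sketch; the random-seed, sizes, and iteration count are illustrative:

```python
import numpy as np

# min ||x||_1 s.t. Phi x = y, with a 100 x 400 Gaussian Phi.
rng = np.random.default_rng(0)
P, Q = 100, 400
Phi = rng.standard_normal((P, Q))
x0 = np.zeros(Q)
x0[rng.choice(Q, 17, replace=False)] = rng.standard_normal(17)
y = Phi @ x0

gram_inv = np.linalg.inv(Phi @ Phi.T)            # (Phi Phi^*)^{-1}, P x P
proj_C = lambda x: x + Phi.T @ (gram_inv @ (y - Phi @ x))
gamma = 1.0
prox_g2 = lambda x: np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

x_rec = douglas_rachford(proj_C, prox_g2, np.zeros(Q), alpha=1.0, iters=500)
```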
55. More than 2 Functionals

$$\min_x G_1(x) + \ldots + G_k(x), \qquad \text{each } G_i \text{ simple}$$
$$\iff \min_{(x_1, \ldots, x_k)} G(x_1, \ldots, x_k) + \iota_C(x_1, \ldots, x_k)$$
where $G(x_1, \ldots, x_k) = G_1(x_1) + \ldots + G_k(x_k)$ and $C = \big\{ (x_1, \ldots, x_k) \in \mathcal{H}^k \,:\, x_1 = \ldots = x_k \big\}$.

$G$ and $\iota_C$ are simple:
$$\operatorname{Prox}_{\gamma G}(x_1, \ldots, x_k) = \big( \operatorname{Prox}_{\gamma G_i}(x_i) \big)_i$$
$$\operatorname{Prox}_{\gamma \iota_C}(x_1, \ldots, x_k) = (\tilde{x}, \ldots, \tilde{x}), \qquad \text{where } \tilde{x} = \frac{1}{k} \sum_i x_i.$$
57. Auxiliary Variables: DR

Linear map $A : \mathcal{H} \to \mathcal{E}$; $G_1, G_2$ simple.
$$\min_x G_1(x) + G_2 \circ A(x) \quad \iff \quad \min_{z \in \mathcal{H} \times \mathcal{E}} G(z) + \iota_C(z)$$
where $G(x, y) = G_1(x) + G_2(y)$ and $C = \{(x, y) \in \mathcal{H} \times \mathcal{E} \,:\, Ax = y\}$.

$$\operatorname{Prox}_{\gamma G}(x, y) = \big( \operatorname{Prox}_{\gamma G_1}(x),\, \operatorname{Prox}_{\gamma G_2}(y) \big)$$
$$\operatorname{Proj}_C(x, y) = (x + A^* \tilde{y},\, y - \tilde{y}) = (\tilde{x}, A\tilde{x})$$
where $\tilde{y} = (\operatorname{Id} + A A^*)^{-1}(y - Ax)$ and $\tilde{x} = (\operatorname{Id} + A^* A)^{-1}(A^* y + x)$.

Efficient if $\operatorname{Id} + A A^*$ or $\operatorname{Id} + A^* A$ is easy to invert.
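A small numerical check that the two equivalent $\operatorname{Proj}_C$ formulas above agree; the random map $A$ and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 8))          # A : R^8 -> R^5
x, y = rng.standard_normal(8), rng.standard_normal(5)

y_t = np.linalg.solve(np.eye(5) + A @ A.T, y - A @ x)    # y tilde
x_t = np.linalg.solve(np.eye(8) + A.T @ A, A.T @ y + x)  # x tilde
assert np.allclose(x + A.T @ y_t, x_t)   # first components agree
assert np.allclose(y - y_t, A @ x_t)     # second components agree
```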
59. Example: TV Regularization

$$\min_f \tfrac{1}{2}\|Kf - y\|^2 + \lambda \|\nabla f\|_1, \qquad \|u\|_1 = \sum_i \|u_i\|$$
$$= \min_f G_2(f) + G_1(\nabla f)$$
with $G_1(u) = \lambda\|u\|_1$, $G_2(f) = \tfrac{1}{2}\|Kf - y\|^2$, and the auxiliary variable $u = \nabla f$, i.e. $C = \big\{ (f, u) \in \mathbb{R}^N \times \mathbb{R}^{N \times 2} \,:\, u = \nabla f \big\}$.

$$\operatorname{Prox}_{\gamma G_1}(u)_i = \max\Big(0,\, 1 - \frac{\gamma\lambda}{\|u_i\|}\Big)\, u_i, \qquad \operatorname{Prox}_{\gamma G_2}(f) = (\operatorname{Id} + \gamma K^* K)^{-1}(f + \gamma K^* y)$$

$$\operatorname{Proj}_C(f, u) = (\tilde{f}, \nabla \tilde{f}), \qquad \text{where } \tilde{f} \text{ solves } (\operatorname{Id} + \nabla^* \nabla)\tilde{f} = f - \operatorname{div}(u)$$
(using $\nabla^* = -\operatorname{div}$): $O(N \log(N))$ operations using the FFT.
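A sketch of that FFT-based solve, assuming periodic boundary conditions (so the discrete Laplacian is diagonalized by the DFT); the function name is illustrative:

```python
import numpy as np

# Solve (Id + grad^* grad) f = b for a 2D array b in O(N log N).
def solve_id_plus_laplacian(b):
    n1, n2 = b.shape
    w1 = 2.0 * np.pi * np.fft.fftfreq(n1)
    w2 = 2.0 * np.pi * np.fft.fftfreq(n2)
    # eigenvalues of grad^* grad for periodic forward differences
    eig = (4.0 * np.sin(w1 / 2.0) ** 2)[:, None] \
        + (4.0 * np.sin(w2 / 2.0) ** 2)[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(b) / (1.0 + eig)))
```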
64. GFB Splitting

$$\min_{x \in \mathbb{R}^N} \underbrace{F(x)}_{\text{smooth}} + \sum_{i=1}^n \underbrace{G_i(x)}_{\text{simple}} \qquad (\star)$$

$$\forall\, i = 1, \ldots, n: \quad z_i^{(\ell+1)} = z_i^{(\ell)} + \operatorname{Prox}_{n\gamma G_i}\big( 2x^{(\ell)} - z_i^{(\ell)} - \gamma \nabla F(x^{(\ell)}) \big) - x^{(\ell)}$$
$$x^{(\ell+1)} = \frac{1}{n} \sum_{i=1}^n z_i^{(\ell+1)}$$

Theorem: let $\nabla F$ be $L$-Lipschitz. If $\gamma < 2/L$, $x^{(\ell)} \to x^\star$, a solution of $(\star)$.

$n = 1$: Forward-Backward. $\quad F = 0$: Douglas-Rachford.
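A generic numpy sketch of these updates; `grad_F` computes $\nabla F$ and `proxs` is a list of functions computing $\operatorname{Prox}_{n\gamma G_i}$ (illustrative names and calling convention):

```python
import numpy as np

# Generalized forward-backward splitting.
def gfb(grad_F, proxs, x0, gamma, iters=500):
    n = len(proxs)
    z = [x0.copy() for _ in range(n)]
    x = x0.copy()
    for _ in range(iters):
        g = gamma * grad_F(x)
        for i in range(n):
            z[i] = z[i] + proxs[i](2.0 * x - z[i] - g) - x
        x = sum(z) / n
    return x
```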
68. GFB Fix Point

$$x^\star \in \operatorname{argmin}_{x \in \mathbb{R}^N} F(x) + \sum_i G_i(x)$$
$$\iff \exists\, (y_i)_i, \ y_i \in \partial G_i(x^\star), \quad \nabla F(x^\star) + \sum_i y_i = 0$$
$$\iff \exists\, (z_i)_{i=1}^n, \quad x^\star = \frac{1}{n} \sum_i z_i \quad \text{and} \quad \forall i, \ x^\star - z_i - \gamma \nabla F(x^\star) \in n\gamma\, \partial G_i(x^\star)$$
(use $z_i = x^\star - \gamma \nabla F(x^\star) - n\gamma\, y_i$)
$$\iff \forall i, \quad x^\star = \operatorname{Prox}_{n\gamma G_i}\big( 2x^\star - z_i - \gamma \nabla F(x^\star) \big)$$
$$\iff \forall i, \quad z_i = z_i + \operatorname{Prox}_{n\gamma G_i}\big( 2x^\star - z_i - \gamma \nabla F(x^\star) \big) - x^\star$$

Fix point equation on $(x^\star, z_1, \ldots, z_n)$.
69. Block Regularization
1
2
block sparsity: G(x) =
b B
iments
2
+
(2)
` 1 `2
4
k=1
N: 256
x
x2
m
m b
Towards More Complex Penalization
Bk
1,2
⇥ x⇥⇥1 =
i ⇥xi ⇥
b
Image f =
||x[b] ||2 =
||x[b] ||,
B
x Coe cients x.
b B
i
xi2
b
b B1
b B2
+
i b xi
i b xi
70. Block Regularization
1
2
block sparsity: G(x) =
b B
||x[b] ||,
||x[b] ||2 =
x2
m
m b
... B
Non-overlapping decomposition: B = B
iments Towards More Complex Penalization
Towards More Complex Penalization
Towards More Complex Penalization
2
1
n
(2)
G(x) =4 x iBk
(x)
+ ` ` k=1 G 1,2
1
2
N: 256
Gi (x) =
b Bi
i=1
⇥=
⇥ x⇥x⇥x⇥⇥1 =i ⇥x⇥x⇥xi ⇥
⇥ ⇥1 ⇥1 = i i ⇥i i ⇥
b
Image f =
||x[b] ||,
bb B B i
Bb
xii2bi2xi2
bbx
i
B
x Coe cients x.
n
Blocks B1
22
b b 1b1 B1 i b xiixb xi
BB
i b i
++ +
b b 2b2 B2 i
BB
B1
xi2 b2xi
b b xi
i
B2
71. Block Regularization
1
2
block sparsity: G(x) =
b B
||x[b] ||,
||x[b] ||2 =
x2
m
m b
... B
Non-overlapping decomposition: B = B
iments Towards More Complex Penalization
Towards More Complex Penalization
Towards More Complex Penalization
2
1
n
(2)
G(x) =4 x iBk
(x)
+ ` ` k=1 G 1,2
1
2
Gi (x) =
b Bi
i=1
||x[b] ||,
Each Gi is simple:
⇥ ⇥1 = i ⇥i i
⇥ x⇥x⇥x⇥⇥1 =i ⇥xG ⇥xi ⇥ m = b B B i b xii2bi2xi2
=
Bb
⇤ m ⇥ b ⇥ Bi , ⇥ ⇥1Prox i ⇥xi ⇥(x) b max i0, 1
bx
N: 256
b
Image f =
B
x Coe cients x.
n
Blocks B1
22
b b 1b1 B1 i b xiixb xi
BB
i b i
||x[b]b||B
b B b
++m
x +
2 2 B2
B1
i
xi2 b2xi
b b xi
i
B2
72. Numerical Experiments

Deconvolution: $\min_x \tfrac{1}{2}\|Y - K\Psi x\|^2 + \lambda \sum_k \|x\|_{B_k, 1, 2}$, with $\Phi = K$ a convolution and $\Psi$ translation-invariant wavelets; $N = 256^2$, noise $0.025$, $\lambda_2 \approx 1.30 \cdot 10^{-3}$; SNR $22.49$dB at iteration 50; timings $t_{\mathrm{EFB}} = 161$s, $t_{\mathrm{PR}} = 173$s, $t_{\mathrm{CP}} = 190$s.

Deconvolution + inpainting: $\min_x \tfrac{1}{2}\|Y - P K\Psi x\|^2 + \lambda \sum_k \|x\|_{B_k, 1, 2}$; degradation $0.4$, $\lambda_4 \approx 1.00 \cdot 10^{-3}$; SNR $21.80$dB at iteration 50; timings $t_{\mathrm{EFB}} = 283$s, $t_{\mathrm{PR}} = 298$s, $t_{\mathrm{CP}} = 368$s.

[Figures: decay of $\log_{10}(E(x^{(\ell)}) - E(x^\star))$ over 40 iterations for EFB, PR and CP on both problems, together with the observations $y = \Phi x_0 + w$ and the recovered images.]
76. Legendre-Fenchel Duality

Legendre-Fenchel transform:
$$G^*(u) = \sup_{x \in \operatorname{dom}(G)} \langle u, x \rangle - G(x)$$

[Figure: graph of $G(x)$ with its supporting line of slope $u$; $-G^*(u)$ is the intercept of that line.]

Example: quadratic functional
$$G(x) = \tfrac{1}{2}\langle Ax, x \rangle + \langle x, b \rangle \quad \implies \quad G^*(u) = \tfrac{1}{2}\big\langle u - b,\, A^{-1}(u - b) \big\rangle$$

Moreau's identity:
$$\operatorname{Prox}_{\gamma G^*}(x) = x - \gamma \operatorname{Prox}_{G/\gamma}(x/\gamma)$$
$G$ simple $\iff$ $G^*$ simple.
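A small numerical check of Moreau's identity for $G = \|\cdot\|_1$, whose conjugate is the indicator of the $\ell^\infty$ ball, so $\operatorname{Prox}_{\gamma G^*}$ must be the clip to $[-1, 1]$; the test data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = 3.0 * rng.standard_normal(10)
gamma = 0.7

soft = lambda v, lam: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
lhs = np.clip(x, -1.0, 1.0)                     # Prox of the indicator = projection
rhs = x - gamma * soft(x / gamma, 1.0 / gamma)  # x - gamma Prox_{G/gamma}(x/gamma)
assert np.allclose(lhs, rhs)
```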
79. Indicator and Homogeneous

Positively 1-homogeneous functional: $G(\lambda x) = |\lambda|\, G(x)$. Example: any norm $G(x) = \|x\|$.

Duality: $G^* = \iota_{\{G^\circ(\cdot) \leq 1\}}$, where $G^\circ(y) = \max_{G(x) \leq 1} \langle x, y \rangle$.

$\ell^p$ norms: $G(x) = \|x\|_p \implies G^\circ(x) = \|x\|_q$, with $\frac{1}{p} + \frac{1}{q} = 1$, $p, q \in [1, +\infty]$.

Example: proximal operator of the $\ell^\infty$ norm. By Moreau's identity,
$$\operatorname{Prox}_{\lambda \|\cdot\|_\infty}(x) = x - \lambda \operatorname{Proj}_{\|\cdot\|_1 \leq 1}(x/\lambda)$$
$$\operatorname{Proj}_{\|\cdot\|_1 \leq 1}(x)_i = \max\Big(0,\, 1 - \frac{\mu}{|x_i|}\Big)\, x_i \quad \text{for a well-chosen } \mu = \mu(x, \lambda).$$
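A sketch of this prox, using the standard sort-based computation of the threshold $\mu$ for the $\ell^1$-ball projection (an implementation choice, not spelled out on the slides):

```python
import numpy as np

def proj_l1_ball(x, radius=1.0):
    """Euclidean projection onto {z : ||z||_1 <= radius}."""
    if np.abs(x).sum() <= radius:
        return x.copy()
    u = np.sort(np.abs(x))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    mu = (css[k] - radius) / (k + 1.0)    # the well-chosen threshold
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def prox_linf(x, lam):
    """Prox of lam * ||.||_inf via Moreau's identity."""
    return x - lam * proj_l1_ball(x / lam)
```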
83. Primal-dual Formulation

Fenchel-Rockafellar duality ($A : \mathcal{H} \to \mathcal{L}$ linear):
$$\min_x G_1(x) + G_2 \circ A(x) = \min_{x \in \mathcal{H}} G_1(x) + \sup_{u \in \mathcal{L}} \langle Ax, u \rangle - G_2^*(u)$$

Strong duality ($\min \leftrightarrow \max$) holds if $0 \in \operatorname{ri}(\operatorname{dom}(G_2)) - A\, \operatorname{ri}(\operatorname{dom}(G_1))$:
$$= \max_u -G_2^*(u) + \min_x G_1(x) + \langle x, A^* u \rangle = \max_u -G_2^*(u) - G_1^*(-A^* u)$$

Recovering $x^\star$ from some $u^\star$:
$$x^\star = \operatorname{argmin}_x G_1(x) + \langle x, A^* u^\star \rangle \iff -A^* u^\star \in \partial G_1(x^\star) \iff x^\star \in (\partial G_1)^{-1}(-A^* u^\star) = \partial G_1^*(-A^* u^\star)$$
86. Forward-Backward on the Dual

If $G_1$ is strongly convex ($\nabla^2 G_1 \succeq c\,\operatorname{Id}$ when $G_1$ is $C^2$):
$$G_1(tx + (1-t)y) \leq t\, G_1(x) + (1-t)\, G_1(y) - \frac{c}{2}\, t(1-t)\, \|x - y\|^2$$
$\implies$ $x^\star$ is uniquely defined, $G_1^*$ is of class $C^1$, and $x^\star = \nabla G_1^*(-A^* u^\star)$.

FB on the dual:
$$\min_{x \in \mathcal{H}} G_1(x) + G_2 \circ A(x) = -\min_{u \in \mathcal{L}} \underbrace{G_1^*(-A^* u)}_{\text{smooth}} + \underbrace{G_2^*(u)}_{\text{simple}}$$
$$u^{(\ell+1)} = \operatorname{Prox}_{\tau G_2^*}\Big( u^{(\ell)} + \tau A\, \nabla G_1^*\big( -A^* u^{(\ell)} \big) \Big)$$
88. Example: TV Denoising

$$\min_{f \in \mathbb{R}^N} \tfrac{1}{2}\|f - y\|^2 + \lambda \|\nabla f\|_1, \qquad \|u\|_1 = \sum_i \|u_i\|$$

Dual problem [Chambolle 2004]:
$$\min_{\|u\|_\infty \leq \lambda} \|y + \operatorname{div}(u)\|^2, \qquad \|u\|_\infty = \max_i \|u_i\|$$
Primal solution: $f^\star = y + \operatorname{div}(u^\star)$.

FB (aka projected gradient descent):
$$u^{(\ell+1)} = \operatorname{Proj}_{\|\cdot\|_\infty \leq \lambda}\Big( u^{(\ell)} + \tau \nabla\big( y + \operatorname{div}(u^{(\ell)}) \big) \Big), \qquad \operatorname{Proj}_{\|\cdot\|_\infty \leq \lambda}(u)_i = \frac{u_i}{\max(\|u_i\|/\lambda,\, 1)}$$

Convergence if $\tau < \dfrac{2}{\|\operatorname{div} \circ \nabla\|} = \dfrac{1}{4}$.
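A minimal numpy sketch of this dual projected gradient descent on a 2D image, with one standard choice of forward differences and the matching (adjoint) divergence; the boundary handling and step size are illustrative:

```python
import numpy as np

def grad(f):
    """Forward differences with zero padding at the last row/column."""
    gx = np.vstack([f[1:] - f[:-1], np.zeros((1, f.shape[1]))])
    gy = np.hstack([f[:, 1:] - f[:, :-1], np.zeros((f.shape[0], 1))])
    return np.stack([gx, gy], axis=-1)

def div(u):
    """Discrete divergence, defined so that div = -grad^T."""
    px, py = u[..., 0], u[..., 1]
    dx = np.zeros_like(px)
    dx[0], dx[1:-1], dx[-1] = px[0], px[1:-1] - px[:-2], -px[-2]
    dy = np.zeros_like(py)
    dy[:, 0], dy[:, 1:-1], dy[:, -1] = py[:, 0], py[:, 1:-1] - py[:, :-2], -py[:, -2]
    return dx + dy

def tv_denoise_dual(y, lam, tau=0.2, iters=300):
    """Projected gradient descent on the dual TV problem (tau < 1/4)."""
    u = np.zeros(y.shape + (2,))
    for _ in range(iters):
        u = u + tau * grad(y + div(u))                 # gradient ascent step
        norms = np.sqrt((u ** 2).sum(axis=-1, keepdims=True))
        u = u / np.maximum(norms / lam, 1.0)           # project onto ||u_i|| <= lam
    return y + div(u)                                  # primal solution
```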
91. Primal-Dual Algorithm

$$\min_{x \in \mathcal{H}} G_1(x) + G_2 \circ A(x) \iff \min_x \max_z G_1(x) - G_2^*(z) + \langle A(x), z \rangle$$

$$z^{(\ell+1)} = \operatorname{Prox}_{\sigma G_2^*}\big( z^{(\ell)} + \sigma A(\tilde{x}^{(\ell)}) \big)$$
$$x^{(\ell+1)} = \operatorname{Prox}_{\tau G_1}\big( x^{(\ell)} - \tau A^*(z^{(\ell+1)}) \big)$$
$$\tilde{x}^{(\ell+1)} = x^{(\ell+1)} + \theta\, \big( x^{(\ell+1)} - x^{(\ell)} \big)$$

$\theta = 0$: Arrow-Hurwicz algorithm.
$\theta = 1$: convergence speed guarantee on the duality gap.

Theorem [Chambolle-Pock 2011]: if $0 \leq \theta \leq 1$ and $\sigma\tau \|A\|^2 < 1$, then $x^{(\ell)} \to x^\star$, a minimizer of $G_1 + G_2 \circ A$.
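A generic sketch of these updates; `prox_G1` and `prox_G2s` compute $\operatorname{Prox}_{\tau G_1}$ and $\operatorname{Prox}_{\sigma G_2^*}$, while `A` and `At` apply $A$ and $A^*$ (illustrative names and calling convention):

```python
import numpy as np

# Chambolle-Pock primal-dual iteration; requires sigma * tau * ||A||^2 < 1.
def chambolle_pock(prox_G1, prox_G2s, A, At, x0, sigma, tau, theta=1.0, iters=500):
    x = x0.copy()
    x_bar = x0.copy()                 # the extrapolated point x tilde
    z = np.zeros_like(A(x0))
    for _ in range(iters):
        z = prox_G2s(z + sigma * A(x_bar))      # dual ascent step
        x_new = prox_G1(x - tau * At(z))        # primal descent step
        x_bar = x_new + theta * (x_new - x)     # extrapolation
        x = x_new
    return x
```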
94. Conclusion

Inverse problems in imaging:
- Large scale, $N \sim 10^6$.
- Non-smooth (sparsity, TV, ...).
- (Sometimes) convex.
- Highly structured (separability, $\ell^p$ norms, ...).

Proximal splitting:
- Unravels the structure of problems.
- Parallelizable.
- Decomposition $G = \sum_k G_k$.

Open problems:
- Less structured problems without smoothness.
- Non-convex optimization.