Low Complexity Regularization of Inverse Problems
Course #3: Proximal Splitting Methods
Gabriel Peyré
www.numerical-tours.com
Overview of the Course

• Course #1: Inverse Problems

• Course #2: Recovery Guarantees

• Course #3: Proximal Splitting Methods
Convex Optimization
Setting: G : H → R ∪ {+∞}, where H is a Hilbert space; here H = R^N.
Problem:   min_{x ∈ H} G(x)
Class of functions:
   Convex:   G(t x + (1−t) y) ≤ t G(x) + (1−t) G(y)   for all t ∈ [0, 1].
   Lower semi-continuous:   liminf_{x → x0} G(x) ≥ G(x0).
   Proper:   {x ∈ H : G(x) ≠ +∞} ≠ ∅.
Indicator of a set C (C closed and convex):
   ι_C(x) = 0 if x ∈ C,   +∞ otherwise.
Example: ℓ1 Regularization
Inverse problem:  measurements  y = K f0 + w,   K : R^N → R^P,   P ≤ N.
Model:  f0 = Ψ x0  is sparse in a dictionary  Ψ : R^Q → R^N,  Q ≥ N:
   x ∈ R^Q (coefficients)  →  f = Ψ x ∈ R^N (image)  →  y = K f ∈ R^P (observations),
   and  Φ = K Ψ ∈ R^{P×Q}.
Sparse recovery:  f* = Ψ x*  where  x*  solves
   min_{x ∈ R^Q}   (1/2) ||y − Φ x||^2  +  λ ||x||_1
                     (fidelity)            (regularization)
Example: ℓ1 Regularization
Inpainting: masking operator K,
   (K f)_i = f_i  if i ∈ Ω,   0  if i ∈ Ω^c,      K : R^N → R^N,   P = |Ω|.
Ψ : R^Q → R^N :  translation invariant wavelet frame.
Figures: original f0 = Ψ x0,  observations y = Φ x0 + w,  recovery Ψ x*.
Overview
• Subdifferential Calculus
• Proximal Calculus
• Forward Backward
• Douglas Rachford
• Generalized Forward-Backward
• Duality
Sub-differential
Sub-differential:   ∂G(x) = { u ∈ H :  ∀ z,  G(z) ≥ G(x) + ⟨u, z − x⟩ }.
Example:  G(x) = |x|,   ∂G(0) = [−1, 1].
Smooth functions:  if F is C^1,  ∂F(x) = {∇F(x)}.
First-order conditions:   x* ∈ argmin_{x ∈ H} G(x)   ⟺   0 ∈ ∂G(x*).
Monotone operator:  U(x) = ∂G(x) satisfies
   (u, v) ∈ U(x) × U(y)   ⟹   ⟨y − x, v − u⟩ ≥ 0.
Example: ℓ1 Regularization
   x* ∈ argmin_{x ∈ R^Q}  G(x) = (1/2)||y − Φ x||^2 + λ ||x||_1
   ∂G(x) = Φ*(Φ x − y) + λ ∂||·||_1(x),
   ∂||·||_1(x)_i = { sign(x_i) }  if x_i ≠ 0,    [−1, 1]  if x_i = 0.
Support of the solution:   I = { i ∈ {0, …, N−1} :  x*_i ≠ 0 }.
First-order conditions:   ∃ s ∈ R^N,   Φ*(Φ x* − y) + λ s = 0,
   with  s_I = sign(x*_I)   and   ||s_{I^c}||_∞ ≤ 1.
Example: Total Variation Denoising
Important: the optimization variable is f.
   f* ∈ argmin_{f ∈ R^N}  (1/2)||y − f||^2 + λ J(f)        (λ = 0: noisy)
Finite difference gradient:   ∇ : R^N → R^{N×2},   (∇f)_i ∈ R^2.
Discrete TV norm:   J(f) = ∑_i ||(∇f)_i||,   i.e.  J(f) = G(∇f)  with  G(u) = ∑_i ||u_i||.
Composition by linear maps:   ∂(J ∘ A) = A* ∘ (∂J) ∘ A,   so   ∂J(f) = −div( ∂G(∇f) ),
   ∂G(u)_i = { u_i / ||u_i|| }  if u_i ≠ 0,    { v ∈ R^2 : ||v|| ≤ 1 }  if u_i = 0.
First-order conditions:   ∃ v ∈ R^{N×2},   f* = y + λ div(v),   with
   ∀ i ∈ I,  v_i = (∇f*)_i / ||(∇f*)_i||,     ∀ i ∈ I^c,  ||v_i|| ≤ 1,
   where  I = { i : (∇f*)_i ≠ 0 }.
Overview
• Subdifferential Calculus
• Proximal Calculus
• Forward Backward
• Douglas Rachford
• Generalized Forward-Backward
• Duality
Proximal Operators
Proximal operator of G:
   Prox_{λG}(x) = argmin_z  (1/2)||x − z||^2 + λ G(z)
Examples (figure: graphs of the penalties |x|, ||x||_0, log(1 + x^2) and of their Prox maps):
   G(x) = ||x||_1 = ∑_i |x_i| :           Prox_{λG}(x)_i = max(0, 1 − λ/|x_i|) x_i        (soft thresholding)
   G(x) = ||x||_0 = |{ i : x_i ≠ 0 }| :   Prox_{λG}(x)_i = x_i if |x_i| ≥ √(2λ),  0 otherwise   (hard thresholding)
   G(x) = ∑_i log(1 + |x_i|^2) :          Prox_{λG}(x)_i is the root of a 3rd order polynomial.
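As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of the two closed-form proximal maps above; the function names are mine.

```python
import numpy as np

def prox_l1(x, lam):
    """Soft thresholding: prox of lam * ||.||_1, applied entrywise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_l0(x, lam):
    """Hard thresholding: prox of lam * ||.||_0, keeps entries with |x_i| >= sqrt(2*lam)."""
    return np.where(np.abs(x) >= np.sqrt(2.0 * lam), x, 0.0)
```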
Proximal Calculus
Separability:   G(x) = G1(x1) + … + Gn(xn)
   ⟹   Prox_G(x) = ( Prox_{G1}(x1), …, Prox_{Gn}(xn) ).
Quadratic functionals:   G(x) = (1/2)||Φ x − y||^2
   ⟹   Prox_{λG}(x) = (Id + λ Φ*Φ)^{-1} ( x + λ Φ* y ).
Composition by a tight frame (A A* = Id):
   Prox_{G∘A}(x) = A* ∘ Prox_G ∘ A (x) + (Id − A* A) x.
Indicators:   G(x) = ι_C(x)
   ⟹   Prox_{λG}(x) = Proj_C(x) = argmin_{z ∈ C} ||x − z||.
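A small sketch of the quadratic-prox formula, computed with a dense linear solve (illustrative only; for large problems one would use a factorization or an iterative solver).

```python
import numpy as np

def prox_quadratic(x, Phi, y, lam):
    """Prox of z -> (lam/2) ||Phi z - y||^2: solve (Id + lam Phi^T Phi) z = x + lam Phi^T y."""
    n = Phi.shape[1]
    A = np.eye(n) + lam * (Phi.T @ Phi)
    return np.linalg.solve(A, x + lam * (Phi.T @ y))
```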
Prox and Subdifferential
Resolvent of ∂G:
   z = Prox_{λG}(x)   ⟺   0 ∈ (z − x) + λ ∂G(z)   ⟺   x ∈ z + λ ∂G(z) = (Id + λ ∂G)(z)
                       ⟺   z = (Id + λ ∂G)^{-1}(x).
Inverse of a set-valued mapping U:   U^{-1}(x) = { y :  x ∈ U(y) };
   here  Prox_{λG} = (Id + λ ∂G)^{-1}  is a single-valued mapping.
Fix point:
   x* ∈ argmin_x G(x)   ⟺   0 ∈ ∂G(x*)   ⟺   x* ∈ (Id + λ ∂G)(x*)
                         ⟺   x* = (Id + λ ∂G)^{-1}(x*) = Prox_{λG}(x*).
Gradient and Proximal Descents
Gradient descent:   x^(ℓ+1) = x^(ℓ) − τ ∇G(x^(ℓ))        [explicit]
   (G is C^1 and ∇G is L-Lipschitz)
   Theorem:  if 0 < τ < 2/L,  x^(ℓ) → x*, a solution.
Sub-gradient descent:   x^(ℓ+1) = x^(ℓ) − τ_ℓ v^(ℓ),   v^(ℓ) ∈ ∂G(x^(ℓ))
   Theorem:  if τ_ℓ ~ 1/ℓ,  x^(ℓ) → x*, a solution.        Problem: slow.
Proximal-point algorithm:   x^(ℓ+1) = Prox_{τ_ℓ G}(x^(ℓ))      [implicit]
   Theorem:  if τ_ℓ ≥ c > 0,  x^(ℓ) → x*, a solution.       Problem: Prox_{τ_ℓ G} is hard to compute.
Overview
• Subdifferential Calculus
• Proximal Calculus
• Forward Backward
• Douglas Rachford
• Generalized Forward-Backward
• Duality
Proximal Splitting Methods
Solve   min_{x ∈ H} E(x).       Problem:  Prox_{τE} is not available.
Splitting:   E(x) = F(x) + ∑_i G_i(x),   with F smooth and each G_i simple.
Iterative algorithms using only  ∇F(x)  and  Prox_{τ G_i}(x):
   Forward-Backward:        solves  F + G
   Douglas-Rachford:        solves  ∑_i G_i
   Primal-Dual:             solves  ∑_i G_i ∘ A_i
   Generalized FB:          solves  F + ∑_i G_i
Smooth + Simple Splitting
Inverse problem:  measurements  y = K f0 + w,   K : R^N → R^P,   P ≤ N.
Model:  f0 = Ψ x0  sparse in a dictionary Ψ,   Φ = K Ψ.
Sparse recovery:  f* = Ψ x*  where  x*  solves    min_{x ∈ R^Q}  F(x) + G(x)
   Data fidelity (smooth):     F(x) = (1/2)||y − Φ x||^2
   Regularization (simple):    G(x) = λ ||x||_1 = λ ∑_i |x_i|
Forward-Backward
Fix point equation:
   x* ∈ argmin_x  F(x) + G(x)      (★)
   ⟺   0 ∈ ∇F(x*) + ∂G(x*)
   ⟺   x* − τ ∇F(x*)  ∈  x* + τ ∂G(x*)
   ⟺   x* = Prox_{τG}( x* − τ ∇F(x*) ).
Forward-backward iterations:
   x^(ℓ+1) = Prox_{τG}( x^(ℓ) − τ ∇F(x^(ℓ)) ).
Projected gradient descent: the special case G = ι_C.
Theorem:  let ∇F be L-Lipschitz.  If τ < 2/L,  then  x^(ℓ) → x*, a solution of (★).
Example: ℓ1 Regularization
   min_x  (1/2)||Φ x − y||^2 + λ ||x||_1    =    min_x  F(x) + G(x)
   F(x) = (1/2)||Φ x − y||^2,     ∇F(x) = Φ*(Φ x − y),     L = ||Φ* Φ||.
   G(x) = λ ||x||_1,              Prox_{τG}(x)_i = max(0, 1 − τλ/|x_i|) x_i.
Forward-backward   ⟺   iterative soft thresholding.
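A minimal NumPy sketch of the resulting iterative soft thresholding (ISTA); Phi is assumed to be a dense matrix and the variable names are mine.

```python
import numpy as np

def ista(Phi, y, lam, n_iter=200):
    """Forward-backward for (1/2)||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of grad F
    tau = 1.0 / L                                # any 0 < tau < 2/L is admissible
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = x - tau * (Phi.T @ (Phi @ x - y))    # explicit gradient step on F
        x = np.sign(x) * np.maximum(np.abs(x) - tau * lam, 0.0)   # prox of tau*lam*||.||_1
    return x
```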
Convergence Speed
   min_x  E(x) = F(x) + G(x),     ∇F  L-Lipschitz,   G simple.
Theorem:  if L > 0, the FB iterates  x^(ℓ)  satisfy
   E(x^(ℓ)) − E(x*) ≤ C/ℓ,
   where the constant C degrades with L.
Multi-steps Accelerations
Beck-Teboulle accelerated FB:   t^(0) = 1,
   x^(ℓ+1) = Prox_{(1/L) G}( y^(ℓ) − (1/L) ∇F(y^(ℓ)) )
   t^(ℓ+1) = ( 1 + √(1 + 4 (t^(ℓ))^2) ) / 2
   y^(ℓ+1) = x^(ℓ+1) + ((t^(ℓ) − 1) / t^(ℓ+1)) ( x^(ℓ+1) − x^(ℓ) )
(see also Nesterov's method)
Theorem:  if L > 0,    E(x^(ℓ)) − E(x*) ≤ C/ℓ^2.
Complexity theory:  optimal in a worst-case sense.
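For comparison, a sketch of this accelerated variant on the same ℓ1 problem (same assumptions as the ISTA sketch above).

```python
import numpy as np

def fista(Phi, y, lam, n_iter=200):
    """Beck-Teboulle accelerated forward-backward for (1/2)||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        x_prev = x
        x = z - (Phi.T @ (Phi @ z - y)) / L                       # gradient step at y^(l)
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)     # prox of (lam/L)||.||_1
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)               # extrapolation
        t = t_next
    return x
```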
Overview
• Subdifferential Calculus
• Proximal Calculus
• Forward Backward
• Douglas Rachford
• Generalized Forward-Backward
• Duality
Douglas Rachford Scheme
   min_x  G1(x) + G2(x)      (★)        (G1 and G2 both simple)
Douglas-Rachford iterations:
   z^(ℓ+1) = (1 − α/2) z^(ℓ) + (α/2) RProx_{γ G2}( RProx_{γ G1}( z^(ℓ) ) )
   x^(ℓ+1) = Prox_{γ G1}( z^(ℓ+1) )
Reflexive prox:   RProx_{γG}(x) = 2 Prox_{γG}(x) − x.
Theorem:  if 0 < α < 2 and γ > 0,  then  x^(ℓ) → x*, a solution of (★).
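A generic sketch of these iterations; the two prox maps are passed as callables with the interface prox_Gi(x, gamma) = Prox_{gamma Gi}(x), an interface chosen for this note.

```python
def douglas_rachford(prox_G1, prox_G2, z0, gamma=1.0, alpha=1.0, n_iter=200):
    """Douglas-Rachford iterations for min G1(x) + G2(x), both simple."""
    rprox = lambda prox, v: 2.0 * prox(v, gamma) - v       # reflected proximal map
    z = z0
    for _ in range(n_iter):
        z = (1.0 - alpha / 2.0) * z + (alpha / 2.0) * rprox(prox_G2, rprox(prox_G1, z))
    return prox_G1(z, gamma)                               # x^(l) = Prox_{gamma G1}(z^(l))
```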
DR Fix Point Equation
   x* ∈ argmin_x  G1(x) + G2(x)     ⟺     0 ∈ ∂(G1 + G2)(x*)
   ⟺   ∃ z,    z − x* ∈ γ ∂G1(x*)    and    x* − z ∈ γ ∂G2(x*)
   ⟺   x* = Prox_{γ G1}(z)    and    x* = Prox_{γ G2}(2 x* − z) = Prox_{γ G2}( RProx_{γ G1}(z) )
   ⟺   z = 2 Prox_{γ G2}( RProx_{γ G1}(z) ) − RProx_{γ G1}(z) = RProx_{γ G2} ∘ RProx_{γ G1}(z)
   ⟺   z = (1 − α/2) z + (α/2) RProx_{γ G2} ∘ RProx_{γ G1}(z).
Example: Constrained ℓ1
   min_{Φ x = y}  ||x||_1      =      min_x  G1(x) + G2(x)
   G1(x) = ι_C(x),   C = { x :  Φ x = y },
      Prox_{γG1}(x) = Proj_C(x) = x + Φ*(Φ Φ*)^{-1}(y − Φ x)       (efficient if Φ Φ* is easy to invert)
   G2(x) = ||x||_1,
      Prox_{γG2}(x)_i = max(0, 1 − γ/|x_i|) x_i.
Example: compressed sensing
   Φ ∈ R^{100×400} Gaussian matrix,   y = Φ x0,   ||x0||_0 = 17.
   Figure: decay of  log10( ||x^(ℓ)||_1 − ||x*||_1 )  along the DR iterations, for γ = 0.01, 1, 10.
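A sketch of the compressed-sensing experiment above, reusing the douglas_rachford helper sketched earlier (sizes and sparsity follow the slide; the seed, γ and iteration count are arbitrary choices).

```python
import numpy as np

np.random.seed(0)
P, Q, s = 100, 400, 17
Phi = np.random.randn(P, Q)
x0 = np.zeros(Q); x0[np.random.choice(Q, s, replace=False)] = np.random.randn(s)
y = Phi @ x0

pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)             # Phi^* (Phi Phi^*)^{-1}
prox_G1 = lambda x, g: x + pinv @ (y - Phi @ x)       # projection onto {Phi x = y}
prox_G2 = lambda x, g: np.sign(x) * np.maximum(np.abs(x) - g, 0.0)   # soft threshold

x_rec = douglas_rachford(prox_G1, prox_G2, np.zeros(Q), gamma=1.0, n_iter=500)
print(np.linalg.norm(x_rec - x0))                     # small if the recovery succeeds
```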
More than 2 Functionals
   min_x  G1(x) + … + Gk(x)        (each Gi simple)
   =   min_{(x1,…,xk)}  G(x1, …, xk) + ι_C(x1, …, xk)
   with   G(x1, …, xk) = G1(x1) + … + Gk(xk)
   and    C = { (x1, …, xk) ∈ H^k :  x1 = … = xk }.
G and ι_C are simple (see the sketch below):
   Prox_{γG}(x1, …, xk) = ( Prox_{γGi}(xi) )_i,
   Prox_{γ ι_C}(x1, …, xk) = Proj_C(x1, …, xk) = (x̃, …, x̃)    where   x̃ = (1/k) ∑_i xi.
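The two proximal maps required by this product-space rewriting are one-liners; a sketch, with names of my choosing.

```python
import numpy as np

def prox_separable(xs, proxs, gamma):
    """Prox of G(x1,...,xk) = sum_i Gi(xi): apply each prox block-wise."""
    return [prox(x, gamma) for prox, x in zip(proxs, xs)]

def proj_consensus(xs):
    """Projection onto C = {x1 = ... = xk}: replace every copy by the average."""
    xbar = np.mean(xs, axis=0)
    return [xbar.copy() for _ in xs]
```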
Auxiliary Variables: DR
   min_x  G1(x) + G2 ∘ A(x)         (G1, G2 simple,   A : H → E linear)
   =   min_{z = (x, y) ∈ H × E}  G(z) + ι_C(z)
   with   G(x, y) = G1(x) + G2(y)    and    C = { (x, y) ∈ H × E :  A x = y }.
   Prox_{γG}(x, y) = ( Prox_{γG1}(x), Prox_{γG2}(y) )
   Proj_C(x, y) = (x̃, A x̃)     where     x̃ = (Id + A* A)^{-1}( A* y + x )
      (equivalently computed through ỹ = (Id + A A*)^{-1}(A x − y)).
Efficient if  Id + A A*  or  Id + A* A  is easy to invert.
Example: TV Regularization
   min_f  (1/2)||K f − y||^2 + λ ||∇f||_1,         ||u||_1 = ∑_i ||u_i||
   =   min_f  G1(∇f) + G2(f)        (auxiliary variable  u = ∇f)
   G1(u) = λ ||u||_1,               Prox_{γG1}(u)_i = max(0, 1 − λγ/||u_i||) u_i
   G2(f) = (1/2)||K f − y||^2,      Prox_{γG2}(f) = (Id + γ K* K)^{-1}( f + γ K* y )
   C = { (f, u) ∈ R^N × R^{N×2} :  u = ∇f },       Proj_C(f, u) = (f̃, ∇f̃)
      where f̃ solves   (Id + ∇*∇) f̃ = f − div(u):   O(N log N) operations using the FFT.
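A sketch of that linear solve, assuming ∇ is implemented with periodic finite differences so that Id + ∇*∇ is diagonalized by the 2-D FFT (the periodicity assumption is mine, it is not stated on the slide).

```python
import numpy as np

def solve_id_plus_lap(b):
    """Solve (Id + grad^* grad) f = b for periodic forward differences, via the FFT."""
    n1, n2 = b.shape
    k1 = np.arange(n1)[:, None]
    k2 = np.arange(n2)[None, :]
    # eigenvalues of grad^* grad (= minus the periodic Laplacian)
    eig = 4.0 * np.sin(np.pi * k1 / n1) ** 2 + 4.0 * np.sin(np.pi * k2 / n2) ** 2
    return np.real(np.fft.ifft2(np.fft.fft2(b) / (1.0 + eig)))
```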
Example: TV Regularization
Figures: original f0, observations y = K f0 + w, and the recovery f* along the iterations.
Overview
• Subdifferential Calculus
• Proximal Calculus
• Forward Backward
• Douglas Rachford
• Generalized Forward-Backward
• Duality
GFB Splitting
   min_{x ∈ R^N}  F(x) + ∑_{i=1}^n G_i(x)      (★)        (F smooth,  each G_i simple)
   for i = 1, …, n:
      z_i^(ℓ+1) = z_i^(ℓ) + Prox_{n γ G_i}( 2 x^(ℓ) − z_i^(ℓ) − γ ∇F(x^(ℓ)) ) − x^(ℓ)
   x^(ℓ+1) = (1/n) ∑_{i=1}^n z_i^(ℓ+1)
Theorem:  let ∇F be L-Lipschitz.  If γ < 2/L,  then  x^(ℓ) → x*, a solution of (★).
Special cases:   n = 1  ⟹  forward-backward;    F = 0  ⟹  Douglas-Rachford.
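A direct transcription of this update as a sketch; grad_F and the list of prox maps (with the interface proxs[i](v, t) = Prox_{t G_i}(v)) are assumed to be supplied by the caller.

```python
import numpy as np

def generalized_fb(grad_F, proxs, x0, gamma, n_iter=200):
    """Generalized forward-backward for min F(x) + sum_i Gi(x), each Gi simple."""
    n = len(proxs)
    x = x0.copy()
    z = [x0.copy() for _ in range(n)]
    for _ in range(n_iter):
        g = grad_F(x)
        for i in range(n):
            z[i] = z[i] + proxs[i](2.0 * x - z[i] - gamma * g, n * gamma) - x
        x = np.mean(z, axis=0)
    return x
```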
GFB Fix Point
   x* ∈ argmin_x  F(x) + ∑_i G_i(x)
   ⟺   0 ∈ ∇F(x*) + ∑_i ∂G_i(x*)
   ⟺   ∃ (y_i)_i,   ∀ i,  y_i ∈ ∂G_i(x*)   and   ∇F(x*) + ∑_i y_i = 0
   ⟺   ∃ (z_i)_{i=1}^n,   x* = (1/n) ∑_i z_i   and   ∀ i,   x* − z_i − γ ∇F(x*) ∈ n γ ∂G_i(x*)
        (use  z_i = x* − γ ∇F(x*) − n γ y_i)
   ⟺   ∀ i,   x* = Prox_{n γ G_i}( 2 x* − z_i − γ ∇F(x*) )
   ⟺   ∀ i,   z_i = z_i + Prox_{n γ G_i}( 2 x* − z_i − γ ∇F(x*) ) − x*
   ⟹   a fix point equation on  (x*, z_1, …, z_n).
Block Regularization
ℓ1 − ℓ2 block sparsity:    G(x) = ∑_{b ∈ B} ||x_[b]||,       ||x_[b]||^2 = ∑_{m ∈ b} x_m^2.
   (Figure: an image f = Ψ x, its coefficients x, and two block families B1 and B2.)
Non-overlapping decomposition of the blocks:   B = B_1 ∪ … ∪ B_n,
   G(x) = ∑_{i=1}^n G_i(x),        G_i(x) = ∑_{b ∈ B_i} ||x_[b]||.
Each G_i is simple (block soft thresholding, see the sketch below):
   ∀ m ∈ b ∈ B_i,      Prox_{γ G_i}(x)_m = max(0, 1 − γ/||x_[b]||) x_m.
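As announced above, a sketch of the block soft thresholding (blocks passed as lists of indices; names are mine).

```python
import numpy as np

def prox_block_l1l2(x, blocks, gamma):
    """Prox of gamma * sum_b ||x_[b]|| for non-overlapping blocks (block soft thresholding)."""
    z = x.copy()
    for b in blocks:
        nrm = np.linalg.norm(x[b])
        z[b] = max(0.0, 1.0 - gamma / nrm) * x[b] if nrm > 0 else 0.0
    return z
```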
Numerical Experiments
Deconvolution:   min_x  (1/2)||y − K Ψ x||^2 + λ ∑_k ||x||_{B_k,1,2}
   (Ψ = translation invariant wavelets, K = convolution; N = 256^2, noise 0.025, λ ≈ 1.3e−3;
   run times: 161s (EFB), 173s (PR), 190s (CP); SNR ≈ 22.5 dB at iteration 50).
Deconvolution + inpainting:   min_x  (1/2)||y − P K Ψ x||^2 + λ ∑_k ||x||_{B_k,1,2}
   (P = masking, 40% degradation; λ ≈ 1e−3; run times: 283s (EFB), 298s (PR), 368s (CP);
   SNR ≈ 21.8 dB at iteration 50).
Figures: decay of  log10( E(x^(ℓ)) − E(x*) )  versus the iteration number for the EFB, PR and CP
algorithms, together with the original x0, the observations y = Φ x0 + w, and the recovered images.
Overview
• Subdifferential Calculus
• Proximal Calculus
• Forward Backward
• Douglas Rachford
• Generalized Forward-Backward
• Duality
Legendre-Fenchel Duality
Legendre-Fenchel transform:
   G*(u) = sup_{x ∈ dom(G)}  ⟨u, x⟩ − G(x)
   (figure: geometric interpretation with the supporting line of slope u).
Example: quadratic functional
   G(x) = (1/2)⟨A x, x⟩ + ⟨x, b⟩    ⟹    G*(u) = (1/2)⟨u − b, A^{-1}(u − b)⟩.
Moreau's identity:
   Prox_{τ G*}(x) = x − τ Prox_{G/τ}(x/τ)
   ⟹   G simple  ⟺  G* simple.
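A quick numerical sanity check of Moreau's identity in one dimension, taking G = |·| (so that Prox_{τ G*} is the projection onto [−1, 1]); purely illustrative.

```python
import numpy as np

tau = 0.7
x = np.linspace(-3.0, 3.0, 13)

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # Prox_{t |.|}
lhs = np.clip(x, -1.0, 1.0)                  # Prox_{tau G*}, G* = indicator of [-1, 1]
rhs = x - tau * soft(x / tau, 1.0 / tau)     # x - tau * Prox_{G/tau}(x / tau)
print(np.allclose(lhs, rhs))                 # True
```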
Indicator and Homogeneous
Positively 1-homogeneous functional:   G(λ x) = |λ| G(x).       Example: a norm  G(x) = ||x||.
Duality:   G* = ι_{G°(·) ≤ 1}       where     G°(y) = max_{G(x) ≤ 1} ⟨x, y⟩.
ℓ^p norms:   G = ||·||_p   ⟹   G° = ||·||_q   with   1/p + 1/q = 1,    p, q ∈ [1, +∞].
Example: proximal operator of the ℓ^∞ norm (using Moreau's identity):
   Prox_{λ ||·||_∞} = Id − λ Proj_{||·||_1 ≤ 1}(·/λ),
   Proj_{||·||_1 ≤ 1}(x)_i = max(0, 1 − ξ/|x_i|) x_i     for a well-chosen   ξ = ξ(x, λ).
Primal-dual Formulation
Fenchel-Rockafellar duality   (A : H → L linear):
   min_{x ∈ H}  G1(x) + G2(A x)   =   min_x  G1(x) + sup_{u ∈ L}  ⟨A x, u⟩ − G2*(u)
Strong duality (min ↔ max) holds if   0 ∈ ri(dom(G2)) − A ri(dom(G1)):
   =   max_u  − G2*(u) + min_x  G1(x) + ⟨x, A* u⟩
   =   max_u  − G2*(u) − G1*(− A* u).
Recovering x* from a dual solution u*:
   x* = argmin_x  G1(x) + ⟨x, A* u*⟩
   ⟺   − A* u* ∈ ∂G1(x*)
   ⟺   x* ∈ (∂G1)^{-1}(− A* u*) = ∂G1*(− A* u*).
Forward-Backward on the Dual
If G1 is strongly convex   (∇^2 G1 ≥ c Id):
   G1(t x + (1−t) y)  ≤  t G1(x) + (1−t) G1(y)  −  (c/2) t (1−t) ||x − y||^2.
Then:  x* is uniquely defined,  G1* is of class C^1,  and   x* = ∇G1*(− A* u*).
FB on the dual:
   min_{x ∈ H}  G1(x) + G2(A x)    =    − min_{u ∈ L}  G1*(− A* u) + G2*(u)
                                                         (smooth)       (simple)
   u^(ℓ+1) = Prox_{τ G2*}( u^(ℓ) + τ A ∇G1*(− A* u^(ℓ)) ).
Example: TV Denoising
   min_{f ∈ R^N}  (1/2)||f − y||^2 + λ ||∇f||_1,        ||u||_1 = ∑_i ||u_i||
Dual problem:    min_{||u||_∞ ≤ λ}  ||y + div(u)||^2,        ||u||_∞ = max_i ||u_i||
Primal solution:   f* = y + div(u*).          [Chambolle 2004]
FB on the dual (aka projected gradient descent):
   u^(ℓ+1) = Proj_{||·||_∞ ≤ λ}( u^(ℓ) + τ ∇( y + div(u^(ℓ)) ) ),
   v = Proj_{||·||_∞ ≤ λ}(u) :     v_i = u_i / max(||u_i||/λ, 1).
Convergence if   τ < 2/||div ∘ ∇|| = 1/4.
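A sketch of this dual projected gradient descent on a 2-D image, using periodic finite differences (helper names and boundary conditions are my choices).

```python
import numpy as np

def grad(f):                    # forward differences, periodic boundary conditions
    return np.stack([np.roll(f, -1, axis=0) - f, np.roll(f, -1, axis=1) - f], axis=-1)

def div(u):                     # div = -grad^*
    return (u[..., 0] - np.roll(u[..., 0], 1, axis=0)) + (u[..., 1] - np.roll(u[..., 1], 1, axis=1))

def tv_denoise_dual(y, lam, tau=0.24, n_iter=300):
    """Projected gradient descent on the dual of (1/2)||f - y||^2 + lam * TV(f)."""
    u = np.zeros(y.shape + (2,))
    for _ in range(n_iter):
        u = u + tau * grad(y + div(u))                                  # gradient step
        nrm = np.maximum(np.linalg.norm(u, axis=-1, keepdims=True) / lam, 1.0)
        u = u / nrm                                                     # project on ||u_i|| <= lam
    return y + div(u)                                                   # primal solution f*
```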
Primal-Dual Algorithm
   min_{x ∈ H}  G1(x) + G2(A x)    ⟺    min_x max_z  G1(x) − G2*(z) + ⟨A x, z⟩
Iterations:
   z^(ℓ+1) = Prox_{σ G2*}( z^(ℓ) + σ A x̃^(ℓ) )
   x^(ℓ+1) = Prox_{τ G1}( x^(ℓ) − τ A* z^(ℓ+1) )
   x̃^(ℓ+1) = x^(ℓ+1) + θ ( x^(ℓ+1) − x^(ℓ) )
   θ = 0: Arrow-Hurwicz algorithm;      θ = 1: convergence speed on the duality gap.
Theorem [Chambolle-Pock 2011]:  if  0 ≤ θ ≤ 1  and  σ τ ||A||^2 < 1,  then
   x^(ℓ) → x*, a minimizer of  G1 + G2 ∘ A.
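A generic sketch of these iterations, with the prox maps and the operators A, A* supplied as callables; σ and τ are assumed to satisfy the step-size condition of the theorem.

```python
def chambolle_pock(prox_G1, prox_G2_star, A, At, x0, z0, sigma, tau, theta=1.0, n_iter=300):
    """Primal-dual iterations for min G1(x) + G2(A x); requires sigma * tau * ||A||^2 < 1."""
    x, z, x_bar = x0, z0, x0
    for _ in range(n_iter):
        z = prox_G2_star(z + sigma * A(x_bar), sigma)      # dual ascent step
        x_new = prox_G1(x - tau * At(z), tau)              # primal descent step
        x_bar = x_new + theta * (x_new - x)                # extrapolation
        x = x_new
    return x
```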
Conclusion
Inverse problems in imaging:
   Large scale, N ~ 10^6.
   Non-smooth (sparsity, TV, …).
   (Sometimes) convex.
   Highly structured (separability, ℓ^p norms, …).
Proximal splitting:
   Unravels the structure of problems.
   Parallelizable.
   Decomposition  G = ∑_k G_k.
Open problems:
   Less structured problems without smoothness.
   Non-convex optimization.