This document summarizes proximal splitting and optimal transport methods. It begins with an overview of topics including optimal transport and imaging, convex analysis, and various proximal splitting algorithms. It then discusses measure-preserving maps between distributions and defines the optimal transport problem. Finally, it presents formulations for optimal transport including the convex Benamou-Brenier formulation and discrete formulations on centered and staggered grids. Numerical examples of optimal transport between distributions on 2D domains are also shown.
2. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
3-5. Measure Preserving Maps
Distributions $\mu_0, \mu_1$ on $\mathbb{R}^k$.
Mass preserving map $T : \mathbb{R}^k \to \mathbb{R}^k$ such that $\mu_1 = T_\sharp \mu_0$, where $(T_\sharp \mu_0)(A) = \mu_0(T^{-1}(A))$.
[Figure: $x \mapsto T(x)$ pushing $\mu_0$ forward to $\mu_1$]
Distributions with densities $\mu_i = \rho_i(x)\,dx$:
$T_\sharp \mu_0 = \mu_1 \iff \rho_1(T(x))\,|\det(\partial T(x))| = \rho_0(x)$
7-8. Optimal Transport
$L^p$ optimal transport:
$W_p(\mu_0, \mu_1)^p = \min_{T_\sharp \mu_0 = \mu_1} \int \|T(x) - x\|^p \,\mu_0(dx)$
Regularity condition: $\mu_0$ or $\mu_1$ does not give mass to "small sets".
Theorem ($p > 1$): there exists a unique optimal $T$.
Theorem ($p = 2$): $T = \nabla\varphi$ with $\varphi$ convex.
$T$ is monotone: $\langle T(x) - T(x'),\, x - x' \rangle \geq 0$
[Figure: the optimal map $T$ between $\mu_0$ and $\mu_1$]
9-11. Wasserstein Distance
Couplings: $\pi \in \Pi(\mu, \nu)$, i.e.
$\forall A \subset \mathbb{R}^d,\ \pi(A \times \mathbb{R}^d) = \mu(A)$
$\forall B \subset \mathbb{R}^d,\ \pi(\mathbb{R}^d \times B) = \nu(B)$
Transportation cost:
$W_p(\mu, \nu)^p = \min_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x, y)\, d\pi(x, y)$
[Figure: a coupling $\pi$ of $\mu$ (in $x$) and $\nu$ (in $y$)]
12-13. Optimal Transport
Let $p > 1$ and $\mu$ not giving mass to small sets.
Unique $\pi^\star \in \Pi(\mu, \nu)$ s.t. $W_p(\mu, \nu)^p = \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x, y)\, d\pi^\star(x, y)$
Optimal transport map $T : \mathbb{R}^d \to \mathbb{R}^d$: $\pi^\star$ is supported on the graph $\{(x, T(x))\}$ of $T$.
$p = 2$: $T = \nabla\varphi$ is the unique solution, with $\varphi$ convex l.s.c. and $(\nabla\varphi)_\sharp \mu = \nu$.
15. 1-D Continuous Wasserstein
Distributions $\mu$, $\nu$ on $\mathbb{R}$.
Cumulative functions: $C_\mu(t) = \int_{-\infty}^t d\mu(x)$
For all $p > 1$: $T = C_\nu^{-1} \circ C_\mu$
$T$ is non-decreasing ("change of contrast").
Explicit formulas:
$W_p(\mu, \nu)^p = \int_0^1 |C_\mu^{-1} - C_\nu^{-1}|^p$
$W_1(\mu, \nu) = \int_{\mathbb{R}} |C_\mu - C_\nu| = \|C_\mu - C_\nu\|_1$
17-18. Grayscale Histogram Transfer
Input images: $f_i : [0,1]^2 \to [0,1]$, $i = 0, 1$.
Gray-value distributions $\mu_i$ defined on $[0,1]$:
$\mu_i([a, b]) = \int_{[0,1]^2} 1_{\{a \leq f_i \leq b\}}(x)\, dx$
Optimal transport: $T = C_{\mu_1}^{-1} \circ C_{\mu_0}$, i.e. $T(f_0) = C_{\mu_1}^{-1}(C_{\mu_0}(f_0))$.
[Figure: images $f_0$, $f_1$, their histograms $\mu_0$, $\mu_1$, and the transferred image $T(f_0)$]
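On discrete images this 1-D transport reduces to rank/quantile matching of the pixel values. A minimal numerical sketch in Python (the function name and the random test images are illustrative, not from the slides):

```python
import numpy as np

def histogram_transfer(f0, f1):
    """1-D optimal transport T = C_{mu1}^{-1} o C_{mu0}:
    monotone rearrangement of the gray values of f0 onto those of f1."""
    shape = np.shape(f0)
    f0 = np.ravel(f0).astype(float)
    f1 = np.ravel(f1).astype(float)
    # C_{mu0}(f0): normalized rank of each pixel of f0
    ranks = np.argsort(np.argsort(f0)) / (f0.size - 1)
    # C_{mu1}^{-1} evaluated at these ranks: quantiles of f1
    return np.quantile(f1, ranks).reshape(shape)

# Example: transfer the histogram of f1 onto the image f0.
f0 = np.random.rand(64, 64)
f1 = np.random.rand(64, 64) ** 2
g = histogram_transfer(f0, f1)   # g has (approximately) the histogram of f1
```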
19-22. Application to Color Transfer
Input color images: $f_i \in \mathbb{R}^{N \times 3}$.
Color distributions: $\mu_i = \frac{1}{N}\sum_x \delta_{f_i(x)}$
Optimal assignment: $\min_{\sigma \in \Sigma_N} \|f_0 - f_1 \circ \sigma\|$ over permutations $\sigma$ of the pixels.
Transport: $T : f_0(i) \in \mathbb{R}^3 \mapsto f_1(\sigma(i)) \in \mathbb{R}^3$
Equalization: $\tilde f_0 = T(f_0)$, so that $\tilde\mu_0 = \mu_1$.
[Figures, from J. Rabin, "Wasserstein Regularization": source image ($X$) and style image ($Y$) with their color statistics $\mu_0$, $\mu_1$; sliced Wasserstein projection of $X$ onto the color statistics of $Y$; source image after color transfer.]
23. Image Registration [ur Rehman et al., 2009]
The optimality system $c(dv) = \mu_0(v + dv)\,\det(\nabla(v + dv)) - \mu_1 = 0$ is linearized: it is easy to verify that a correction $dv \approx c_v^\top (c_v c_v^\top)^{-1} c(v)$ (Nocedal and Wright, 1999) can be obtained at each step. The operator $c_v c_v^\top$ can be thought of as an elliptic system of equations; the system is solved using preconditioned conjugate gradient with an incomplete Cholesky preconditioner, within a parallelizable four-color Gauss-Seidel relaxation scheme on the GPU (a trilinear interpolation operator transfers the coarse grid correction to fine grids, and a residual restriction operator projects the residual from fine to coarse grids), which increases robustness and efficiency.
Fig. 6. OMT results viewed on an axial slice. The top row shows corresponding slices from pre-op (left) and post-op (right) MRI data. The deformation is clearly visible in the anterior part of the brain.
28. Numerical Examples
[Figure: synthetic 2D transport between $\rho_0$ and $\rho_1$; caption: "Figure 7: Synthetic 2D examples on a Euclidean domain."]
29-31. Discrete Formulation
Centered grid formulation ($d = 1$):
$\min_{x \in \mathbb{R}^{G_c \times 2}} J(x) + \iota_C(x)$, $\quad J(x) = \sum_{i \in G_c} j(x_i)$
Staggered grid formulation:
$\min_{x \in \mathbb{R}^{G_{st}^1} \times \mathbb{R}^{G_{st}^2}} J(I(x)) + \iota_C(x)$
Interpolation operator: $I = (I^1, I^2) : \mathbb{R}^{G_{st}^1} \times \mathbb{R}^{G_{st}^2} \to \mathbb{R}^{G_c}$
$2\, I^1(m)_{i,j} = m_{i + \frac{1}{2}, j} + m_{i - \frac{1}{2}, j}$
$\to$ Projection on $\operatorname{div}(x) = 0$ using FFTs.
[Figures: the centered grid $G_c$ and the staggered grids $G_{st}^1$, $G_{st}^2$ in the $(t, s)$ plane]
32-33. SOCP Formulation
$\min_{x \in \mathbb{R}^{G_c \times d}} J(x) + \iota_C(x)$, $\quad J(x) = \sum_{i \in G_c} j(x_i)$
$\iff \min_{x \in \mathbb{R}^{G_c \times d},\, r \in \mathbb{R}^{G_c}} \sum_i r_i$ s.t. $\forall\, i \in G_c,\ (m_i, \rho_i, r_i) \in K$
(Rotated) Lorentz cone: $K = \{(\tilde m, \tilde\rho, \tilde r) \in \mathbb{R}^{d+2} : \|\tilde m\|^2 \leq \tilde\rho\,\tilde r\}$
Second order cone program:
$\to$ use interior point methods (e.g. the MOSEK software).
Linear convergence with iteration #.
Poor scaling with dimension $|G_c|$.
Efficient for medium scale problems ($N \sim 10^4$).
34-37. Example: Regularization
Inverse problem: measurements $y = \Phi x_0 + w$
Regularized inversion: $x^\star \in \operatorname{argmin}_{x \in \mathbb{R}^N} \frac{1}{2}\|y - \Phi x\|^2 + \lambda R(x)$
(data fidelity + regularity)
Total variation: $R(x) = \sum_i \|(\nabla x)_i\|$
$\ell^1$ sparsity: $R(x) = \sum_i |x_i|$
Images are sparse in wavelet bases: image $f = \Psi x$, coefficients $x = \Psi^* f$.
[Figure: $x_0$, measurements $y$, recovered $x^\star$]
38. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
40-42. Convex Optimization
Setting: $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$
$\mathcal{H}$: Hilbert space. Here: $\mathcal{H} = \mathbb{R}^N$.
Problem: $\min_{x \in \mathcal{H}} G(x)$
Class of functions:
Convex: $G(tx + (1-t)y) \leq t\,G(x) + (1-t)\,G(y)$, $\forall\, t \in [0, 1]$
Lower semi-continuous: $\liminf_{x \to x_0} G(x) \geq G(x_0)$
Proper: $\{x \in \mathcal{H} : G(x) \neq +\infty\} \neq \emptyset$
Indicator of $C$ closed and convex: $\iota_C(x) = 0$ if $x \in C$, $+\infty$ otherwise.
44-46. Sub-differential
Sub-differential: $\partial G(x) = \{u \in \mathcal{H} : \forall z,\ G(z) \geq G(x) + \langle u, z - x \rangle\}$
Example: $G(x) = |x|$, $\partial G(0) = [-1, 1]$.
Smooth functions: if $F$ is $C^1$, $\partial F(x) = \{\nabla F(x)\}$.
First-order conditions: $x^\star \in \operatorname{argmin}_{x \in \mathcal{H}} G(x) \iff 0 \in \partial G(x^\star)$
Monotone operator: $U(x) = \partial G(x)$ satisfies
$\forall\, (u, v) \in U(x) \times U(y),\ \langle y - x,\, v - u \rangle \geq 0$
48-50. Prox and Subdifferential
$\operatorname{Prox}_{\gamma G}(x) = \operatorname{argmin}_z \frac{1}{2}\|x - z\|^2 + \gamma G(z)$
Resolvent of $\gamma\partial G$:
$z = \operatorname{Prox}_{\gamma G}(x) \iff 0 \in z - x + \gamma\partial G(z) \iff x \in (\operatorname{Id} + \gamma\partial G)(z) \iff z = (\operatorname{Id} + \gamma\partial G)^{-1}(x)$
Inverse of a set-valued mapping: $x \in U(y) \iff y \in U^{-1}(x)$
$\operatorname{Prox}_{\gamma G} = (\operatorname{Id} + \gamma\partial G)^{-1}$ is a single-valued mapping.
Fixed point: $x^\star \in \operatorname{argmin}_x G(x)$
$\iff 0 \in \gamma\partial G(x^\star) \iff x^\star \in (\operatorname{Id} + \gamma\partial G)(x^\star)$
$\iff x^\star = (\operatorname{Id} + \gamma\partial G)^{-1}(x^\star) = \operatorname{Prox}_{\gamma G}(x^\star)$
53-54. Proximal Calculus
Separability: $G(x) = G_1(x_1) + \ldots + G_n(x_n)$:
$\operatorname{Prox}_G(x) = (\operatorname{Prox}_{G_1}(x_1), \ldots, \operatorname{Prox}_{G_n}(x_n))$
Quadratic functionals: $G(x) = \frac{1}{2}\|\Phi x - y\|^2$:
$\operatorname{Prox}_{\gamma G}(x) = (\operatorname{Id} + \gamma\Phi^*\Phi)^{-1}(x + \gamma\Phi^* y)$
Composition by a tight frame ($A \circ A^* = \operatorname{Id}$):
$\operatorname{Prox}_{G \circ A} = A^* \circ \operatorname{Prox}_G \circ A + \operatorname{Id} - A^* A$
Indicators: $G(x) = \iota_C(x)$:
$\operatorname{Prox}_{\gamma G}(x) = \operatorname{Proj}_C(x) = \operatorname{argmin}_{z \in C} \|x - z\|$
55. Prox of Sparse Regularizers
$\operatorname{Prox}_{\gamma G}(x) = \operatorname{argmin}_z \frac{1}{2}\|x - z\|^2 + \gamma G(z)$
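The worked example on this slide did not survive extraction; for the $\ell^1$ regularizer $G = \|\cdot\|_1$ the prox is the coordinate-wise soft thresholding that reappears on slides 61-63 and 80. A minimal sketch:

```python
import numpy as np

def prox_l1(x, t):
    """Prox of t*||.||_1: soft thresholding,
    Prox(x)_i = max(0, 1 - t/|x_i|) x_i."""
    return np.maximum(0.0, 1.0 - t / np.maximum(np.abs(x), 1e-16)) * x

print(prox_l1(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))   # approx [-2, 0, 0, 1]
```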
60. Legendre-Fenchel Duality
Legendre-Fenchel transform: $G^*(u) = \sup_{x \in \operatorname{dom}(G)} \langle u, x \rangle - G(x)$
[Figure: the supporting line of $G$ with slope $u$ defines $G^*(u)$]
Example: quadratic functional
$G(x) = \frac{1}{2}\langle Ax, x \rangle + \langle x, b \rangle$
$G^*(u) = \frac{1}{2}\langle u - b,\, A^{-1}(u - b) \rangle$
Moreau's identity: $\operatorname{Prox}_{\gamma G^*}(x) = x - \gamma\operatorname{Prox}_{G/\gamma}(x/\gamma)$
($G$ simple $\iff$ $G^*$ simple)
61-63. Indicator and Homogeneous Functionals
Positively 1-homogeneous functional: $G(\lambda x) = |\lambda|\,G(x)$
Example: norm $G(x) = \|x\|$
Duality: $G^* = \iota_{\{G^\circ(\cdot) \leq 1\}}$ where $G^\circ(y) = \max_{G(x) \leq 1} \langle x, y \rangle$
$\ell^p$ norms: $G(x) = \|x\|_p \Rightarrow G^\circ(x) = \|x\|_q$, $\quad \frac{1}{p} + \frac{1}{q} = 1$, $\quad 1 \leq p, q \leq +\infty$
Example: proximal operator of the $\ell^\infty$ norm
$\operatorname{Prox}_{\lambda\|\cdot\|_\infty} = \operatorname{Id} - \operatorname{Proj}_{\{\|\cdot\|_1 \leq \lambda\}}$
$\operatorname{Proj}_{\{\|\cdot\|_1 \leq \lambda\}}(x)_i = \max\left(0,\, 1 - \frac{\tilde\lambda}{|x_i|}\right) x_i$ for a well-chosen $\tilde\lambda = \tilde\lambda(x, \lambda)$
64-67. Prox of the J Functional
$J(m, \rho) = \sum_i j(m_i, \rho_i)$, $\quad j(\tilde m, \tilde\rho) = \frac{\|\tilde m\|^2}{\tilde\rho}$ for $\tilde\rho > 0$
$\operatorname{Prox}_{\gamma J}(m, \rho) = (\operatorname{Prox}_{\gamma j}(m_i, \rho_i))_i$
$j^* = \iota_C$ where $C = \left\{(a, b) \in \mathbb{R}^2 \times \mathbb{R} : \frac{\|a\|^2}{4} + b \leq 0\right\}$
$\operatorname{Prox}_{\gamma j}(\tilde x) = \tilde x - \gamma\operatorname{Proj}_C(\tilde x/\gamma)$ where $\tilde x = (\tilde m, \tilde\rho)$
Proposition: $\operatorname{Prox}_{\gamma j}(\tilde m, \tilde\rho) = (m^\star, \rho^\star)$ if $\rho^\star > 0$, and $(0, 0)$ otherwise,
where $m^\star = \frac{\rho^\star\,\tilde m}{\rho^\star + 2\gamma}$ and $\rho^\star$ is the largest root of
$X^3 + (4\gamma - \tilde\rho)\,X^2 + 4\gamma(\gamma - \tilde\rho)\,X - \gamma\|\tilde m\|^2 - 4\gamma^2\tilde\rho = 0$
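A small numerical sketch of this proposition, assuming the reconstruction of the cubic above is correct: the largest real root is found with numpy's polynomial solver, then $\tilde m$ is rescaled.

```python
import numpy as np

def prox_j(m, rho, gamma):
    """Prox of gamma*j at (m, rho), with j(m, rho) = ||m||^2 / rho.
    rho_star is the largest real root of
    X^3 + (4g - rho) X^2 + 4g(g - rho) X - g||m||^2 - 4g^2 rho = 0."""
    g = gamma
    roots = np.roots([1.0, 4 * g - rho, 4 * g * (g - rho),
                      -g * np.dot(m, m) - 4 * g ** 2 * rho])
    real = roots[np.abs(roots.imag) < 1e-9].real
    rho_star = real.max() if real.size else 0.0
    if rho_star <= 0:
        return np.zeros_like(m), 0.0           # the (0, 0) branch
    return rho_star * np.asarray(m) / (rho_star + 2 * g), rho_star

print(prox_j(np.array([0.0, 0.0]), 1.0, 0.1))  # j(0, 1) = 0, so (0, 1) is fixed
```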
68. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
69-71. Gradient and Proximal Descents
Gradient descent: $x^{(\ell+1)} = x^{(\ell)} - \tau\nabla G(x^{(\ell)})$ [explicit]
($G$ is $C^1$ and $\nabla G$ is $L$-Lipschitz)
Theorem: if $0 < \tau < 2/L$, then $x^{(\ell)} \to x^\star$ a solution.
Sub-gradient descent: $x^{(\ell+1)} = x^{(\ell)} - \tau_\ell\, v^{(\ell)}$, $\quad v^{(\ell)} \in \partial G(x^{(\ell)})$
Theorem: if $\tau_\ell \sim 1/\ell$, then $x^{(\ell)} \to x^\star$ a solution. Problem: slow.
Proximal-point algorithm: $x^{(\ell+1)} = \operatorname{Prox}_{\gamma_\ell G}(x^{(\ell)})$ [implicit]
Theorem: if $\gamma_\ell \geq c > 0$, then $x^{(\ell)} \to x^\star$ a solution.
Problem: $\operatorname{Prox}_{\gamma G}$ is hard to compute. [Rockafellar, 70]
73-74. Proximal Splitting Methods
Solve $\min_{x \in \mathcal{H}} E(x)$.
Problem: $\operatorname{Prox}_{\gamma E}$ is not available.
Splitting: $E(x) = F(x) + \sum_i G_i(x)$, with $F$ smooth and each $G_i$ simple.
Iterative algorithms using only $\nabla F(x)$ and $\operatorname{Prox}_{\gamma G_i}(x)$:
Forward-Backward: solves $F + G$
Douglas-Rachford: solves $\sum_i G_i$
Primal-Dual: solves $\sum_i G_i \circ A_i$
Generalized FB: solves $F + \sum_i G_i$
75. Smooth + Simple Splitting
Inverse problem: measurements $y = K f_0 + w$, with $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \ll N$.
Model: $f_0 = \Psi x_0$ is sparse in a dictionary $\Psi$.
Sparse recovery: $f^\star = \Psi x^\star$ where $x^\star$ solves $\min_{x \in \mathbb{R}^N} F(x) + G(x)$ (smooth + simple).
Data fidelity: $F(x) = \frac{1}{2}\|y - \Phi x\|^2$, $\quad \Phi = K\Psi$
Regularization: $G(x) = \lambda\|x\|_1 = \lambda\sum_i |x_i|$
77-79. Forward-Backward
Fixed point equation:
$x^\star \in \operatorname{argmin}_x F(x) + G(x) \iff 0 \in \nabla F(x^\star) + \partial G(x^\star)$
$\iff (x^\star - \tau\nabla F(x^\star)) \in x^\star + \tau\partial G(x^\star)$
$\iff x^\star = \operatorname{Prox}_{\tau G}(x^\star - \tau\nabla F(x^\star))$
Forward-backward: $x^{(\ell+1)} = \operatorname{Prox}_{\tau G}\big(x^{(\ell)} - \tau\nabla F(x^{(\ell)})\big)$
Projected gradient descent: $G = \iota_C$.
Theorem: let $\nabla F$ be $L$-Lipschitz. If $\tau < 2/L$, then $x^{(\ell)} \to x^\star$ a solution of $(\star)$. [Passty 79, Gabay 83]
80. Example: L1 Regularization
$\min_x \frac{1}{2}\|\Phi x - y\|^2 + \lambda\|x\|_1 \quad\iff\quad \min_x F(x) + G(x)$
$F(x) = \frac{1}{2}\|\Phi x - y\|^2$: $\quad \nabla F(x) = \Phi^*(\Phi x - y)$, $\quad L = \|\Phi^*\Phi\|$
$G(x) = \lambda\|x\|_1$: $\quad \operatorname{Prox}_{\tau G}(x)_i = \max\left(0,\, 1 - \frac{\tau\lambda}{|x_i|}\right) x_i$
Forward-backward $=$ iterative soft thresholding.
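A minimal sketch of this iteration (random test data is illustrative; the step $\tau = 1/L$ satisfies $\tau < 2/L$ as required by the theorem above):

```python
import numpy as np

def ista(Phi, y, lam, n_iter=200):
    """Forward-backward / iterative soft thresholding for
    min_x 1/2 ||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2     # L = ||Phi^* Phi||
    tau = 1.0 / L                       # any tau < 2/L converges
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - tau * (Phi.T @ (Phi @ x - y))   # forward (gradient) step
        x = np.maximum(0, 1 - tau * lam / np.maximum(np.abs(z), 1e-16)) * z
    return x

Phi = np.random.randn(50, 200)
x0 = np.zeros(200); x0[[3, 77, 150]] = [1.0, -2.0, 0.5]
x_hat = ista(Phi, Phi @ x0, lam=0.05)
```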
81. Convergence Speed
$\min_x E(x) = F(x) + G(x)$, with $\nabla F$ $L$-Lipschitz and $G$ simple.
Theorem: if $L > 0$, the FB iterates $x^{(\ell)}$ satisfy
$E(x^{(\ell)}) - E(x^\star) \leq C/\ell$
The constant $C$ degrades (grows) with $L$.
82. Multi-steps Accelerations
Beck-Teboulle accelerated FB (FISTA): $t^{(0)} = 1$,
$x^{(\ell+1)} = \operatorname{Prox}_{\frac{1}{L}G}\left(y^{(\ell)} - \frac{1}{L}\nabla F(y^{(\ell)})\right)$
$t^{(\ell+1)} = \frac{1 + \sqrt{1 + 4(t^{(\ell)})^2}}{2}$
$y^{(\ell+1)} = x^{(\ell+1)} + \frac{t^{(\ell)} - 1}{t^{(\ell+1)}}\left(x^{(\ell+1)} - x^{(\ell)}\right)$
(see also Nesterov's method)
Theorem: if $L > 0$, $\quad E(x^{(\ell)}) - E(x^\star) \leq \frac{C}{\ell^2}$
Complexity theory: optimal in a worst-case sense.
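The accelerated scheme only changes a few lines relative to the ISTA sketch above; a hedched sketch under the same assumptions:

```python
import numpy as np

def fista(Phi, y, lam, n_iter=200):
    """Beck-Teboulle accelerated forward-backward (FISTA)."""
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1]); yk = x.copy(); t = 1.0
    for _ in range(n_iter):
        z = yk - (Phi.T @ (Phi @ yk - y)) / L           # gradient step at y
        x_new = np.maximum(0, 1 - lam / (L * np.maximum(np.abs(z), 1e-16))) * z
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2        # t^(l+1)
        yk = x_new + ((t - 1) / t_new) * (x_new - x)    # extrapolation
        x, t = x_new, t_new
    return x
```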
83. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
84-85. Douglas-Rachford Scheme
$\min_x G_1(x) + G_2(x)$ $(\star)$, with $G_1$, $G_2$ simple.
Douglas-Rachford iterations:
$z^{(\ell+1)} = \left(1 - \frac{\alpha}{2}\right) z^{(\ell)} + \frac{\alpha}{2}\operatorname{RProx}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z^{(\ell)})\big)$
$x^{(\ell+1)} = \operatorname{Prox}_{\gamma G_1}(z^{(\ell+1)})$
Reflexive prox: $\operatorname{RProx}_{\gamma G}(x) = 2\operatorname{Prox}_{\gamma G}(x) - x$
Theorem: if $0 < \alpha < 2$ and $\gamma > 0$, then $x^{(\ell)} \to x^\star$ a solution of $(\star)$. [Lions, Mercier 79]
86-87. DR Fixed Point Equation
$\min_x G_1(x) + G_2(x) \iff 0 \in \partial(G_1 + G_2)(x)$
$\iff \exists\, z,\ z - x \in \gamma\partial G_1(x)$ and $x - z \in \gamma\partial G_2(x)$
$\iff x = \operatorname{Prox}_{\gamma G_1}(z)$ and $(2x - z) - x \in \gamma\partial G_2(x)$
$\iff x = \operatorname{Prox}_{\gamma G_2}(2x - z) = \operatorname{Prox}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big)$
$\iff z = 2\operatorname{Prox}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big) - (2x - z)$
$\iff z = 2\operatorname{Prox}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big) - \operatorname{RProx}_{\gamma G_1}(z)$
$\iff z = \operatorname{RProx}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big)$
$\iff z = \left(1 - \frac{\alpha}{2}\right) z + \frac{\alpha}{2}\operatorname{RProx}_{\gamma G_2}\big(\operatorname{RProx}_{\gamma G_1}(z)\big)$
88-92. Example: Optimal Transport on Centered Grid
$\min_{x \in \mathbb{R}^{G_c \times 2}} J(x) + \iota_C(x)$
$C = \{x = (m, \rho) : Ax = b\}$, $\quad b = (0, \rho_0, \rho_1)$, $\quad A(x) = (\operatorname{div}(x), \rho_{I_0}, \rho_{I_1})$
[Figure: centered grid $G_c$ with temporal boundaries $I_0$, $I_1$]
$\operatorname{Prox}_{\gamma J}$: cubic root (closed form).
$\operatorname{Prox}_{\iota_C} = \operatorname{Proj}_C = (\operatorname{Id} - A^*\Delta^{-1}A) + A^*\Delta^{-1}b$, with $\Delta^{-1} = (AA^*)^{-1}$: solving a Poisson equation with boundary conditions.
Proposition: DR($\alpha = 1$) is ALG2 of [Benamou, Brenier 2000].
$\to$ Advantage: relaxation parameter $\alpha \in\, ]0, 2[$.
93-94. Example: Constrained L1
$\min_{\Phi x = y} \|x\|_1 \quad\iff\quad \min_x G_1(x) + G_2(x)$
$G_1(x) = \iota_C(x)$, $\quad C = \{x : \Phi x = y\}$
$\operatorname{Prox}_{\gamma G_1}(x) = \operatorname{Proj}_C(x) = x + \Phi^*(\Phi\Phi^*)^{-1}(y - \Phi x)$
$G_2(x) = \|x\|_1$, $\quad \operatorname{Prox}_{\gamma G_2}(x)_i = \max\left(0,\, 1 - \frac{\gamma}{|x_i|}\right) x_i$
$\to$ efficient if $\Phi\Phi^*$ is easy to invert.
Example: compressed sensing, $\Phi \in \mathbb{R}^{100 \times 400}$ Gaussian matrix, $y = \Phi x_0$ with $\|x_0\|_0 = 17$.
[Figure: $\log_{10}(\|x^{(\ell)}\|_1 - \|x^\star\|_1)$ versus iterations, for $\gamma = 0.01$, $\gamma = 1$, $\gamma = 10$]
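A minimal sketch of DR on this constrained $\ell^1$ problem, mirroring the compressed-sensing setup above (the choice of $\gamma$ matters in practice, as the convergence curves for the three values of $\gamma$ suggest):

```python
import numpy as np

def dr_basis_pursuit(Phi, y, gamma=1.0, alpha=1.0, n_iter=500):
    """Douglas-Rachford for min ||x||_1 s.t. Phi x = y,
    with G1 = indicator of {Phi x = y} and G2 = ||.||_1 (both simple)."""
    PhiT_pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)   # Phi^* (Phi Phi^*)^{-1}
    proj_C = lambda x: x + PhiT_pinv @ (y - Phi @ x)
    prox_l1 = lambda x, t: np.maximum(0, 1 - t / np.maximum(np.abs(x), 1e-16)) * x
    rprox = lambda p, x: 2 * p(x) - x                # reflexive prox

    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        # z <- (1 - a/2) z + a/2 RProx_{g G2}(RProx_{g G1}(z))
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox(
            lambda x: prox_l1(x, gamma), rprox(proj_C, z))
    return proj_C(z)                                 # x = Prox_{g G1}(z)

Phi = np.random.randn(100, 400)
x0 = np.zeros(400)
x0[np.random.choice(400, 17, replace=False)] = np.random.randn(17)
x_hat = dr_basis_pursuit(Phi, Phi @ x0)
```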
95-96. Auxiliary Variables with DR
$\min_x G_1(x) + G_2 \circ A(x)$, with a linear map $A : \mathcal{H} \to \mathcal{E}$ and $G_1$, $G_2$ simple.
$\iff \min_{z \in \mathcal{H} \times \mathcal{E}} G(z) + \iota_C(z)$, where
$G(x, y) = G_1(x) + G_2(y)$, $\quad C = \{(x, y) \in \mathcal{H} \times \mathcal{E} : Ax = y\}$
$\operatorname{Prox}_{\gamma G}(x, y) = (\operatorname{Prox}_{\gamma G_1}(x), \operatorname{Prox}_{\gamma G_2}(y))$
$\operatorname{Prox}_{\iota_C}(x, y) = (x - A^*\tilde y,\, y + \tilde y) = (\tilde x, A\tilde x)$,
where $\tilde y = (\operatorname{Id} + AA^*)^{-1}(Ax - y)$ and $\tilde x = (\operatorname{Id} + A^*A)^{-1}(A^*y + x)$
$\to$ efficient if $\operatorname{Id} + AA^*$ or $\operatorname{Id} + A^*A$ is easy to invert.
97-98. Example: TV Regularization
$\min_f \frac{1}{2}\|Kf - y\|^2 + \lambda\|\nabla f\|_1$, $\quad$ where $\|u\|_1 = \sum_i \|u_i\|$
$\iff \min_{(f, u),\, u = \nabla f} G_1(u) + G_2(f)$
$G_1(u) = \lambda\|u\|_1$: $\quad \operatorname{Prox}_{\gamma G_1}(u)_i = \max\left(0,\, 1 - \frac{\gamma\lambda}{\|u_i\|}\right) u_i$
$G_2(f) = \frac{1}{2}\|Kf - y\|^2$: $\quad \operatorname{Prox}_{\gamma G_2}(f) = (\operatorname{Id} + \gamma K^*K)^{-1}(f + \gamma K^* y)$
$C = \{(f, u) \in \mathbb{R}^N \times \mathbb{R}^{N \times 2} : u = \nabla f\}$: $\quad \operatorname{Prox}_{\iota_C}(f, u) = (\tilde f, \nabla\tilde f)$
Compute $\tilde f$ as the solution of $(\operatorname{Id} - \Delta)\tilde f = f - \operatorname{div}(u)$:
$O(N\log(N))$ operations using the FFT.
100-104. Alternating Direction Method of Multipliers
$\min_x F(x) + G \circ A(x)$ $(\star)$ $\iff \min_{x,\, y = Ax} F(x) + G(y)$, with $A : \mathbb{R}^N \to \mathbb{R}^P$ injective.
Lagrangian: $\min_{x,y}\max_u L(x, y, u) = F(x) + G(y) + \langle u,\, y - Ax \rangle$
Augmented: $\min_{x,y}\max_u L_\gamma(x, y, u) = L(x, y, u) + \frac{\gamma}{2}\|y - Ax\|^2$
ADMM:
$x^{(\ell+1)} = \operatorname{argmin}_x L_\gamma(x, y^{(\ell)}, u^{(\ell)})$
$y^{(\ell+1)} = \operatorname{argmin}_y L_\gamma(x^{(\ell+1)}, y, u^{(\ell)})$
$u^{(\ell+1)} = u^{(\ell)} + \gamma\,(y^{(\ell+1)} - Ax^{(\ell+1)})$
Theorem: if $\gamma > 0$, then $x^{(\ell)} \to x^\star$ a solution of $(\star)$. [Gabay, Mercier, Glowinski, Marrocco, 76]
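A minimal sketch of these three steps on a toy instance where both argmins are closed-form: the lasso $\min_x \frac{1}{2}\|\Phi x - b\|^2 + \lambda\|x\|_1$ split as $y = x$ (i.e. $A = \operatorname{Id}$). This is an illustration of the scheme, not the deck's own example:

```python
import numpy as np

def admm_lasso(Phi, b, lam, gamma=1.0, n_iter=300):
    """ADMM for min_x 1/2 ||Phi x - b||^2 + lam ||x||_1, written as
    min F(x) + G(y) s.t. y = x (toy case A = Id)."""
    n = Phi.shape[1]
    Q = np.linalg.inv(Phi.T @ Phi + gamma * np.eye(n))   # for the x-update
    Phib = Phi.T @ b
    x = np.zeros(n); y = np.zeros(n); u = np.zeros(n)
    soft = lambda v, t: np.maximum(0, 1 - t / np.maximum(np.abs(v), 1e-16)) * v
    for _ in range(n_iter):
        x = Q @ (Phib + gamma * y + u)        # argmin_x L_gamma(x, y, u)
        y = soft(x - u / gamma, lam / gamma)  # argmin_y L_gamma(x, y, u)
        u = u + gamma * (y - x)               # dual update
    return x

Phi = np.random.randn(30, 60)
x_hat = admm_lasso(Phi, Phi @ np.random.randn(60), lam=0.1)
```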
105-108. ADMM with Proximal Operators
Proximal mapping for the metric of $A$ ($A$ injective):
$\operatorname{Prox}_F^A(z) = \operatorname{argmin}_x \frac{1}{2}\|Ax - z\|^2 + F(x)$
Proposition: $\operatorname{Prox}^A_{\gamma F} = A^+ \circ \big(\operatorname{Id} - \gamma\operatorname{Prox}_{F^*\circ A^*/\gamma}(\cdot/\gamma)\big)$
ADMM:
$x^{(\ell+1)} = \operatorname{Prox}^A_{F/\gamma}(y^{(\ell)} - u^{(\ell)})$
$y^{(\ell+1)} = \operatorname{Prox}_{G/\gamma}(Ax^{(\ell+1)} + u^{(\ell)})$
$u^{(\ell+1)} = u^{(\ell)} + y^{(\ell+1)} - Ax^{(\ell+1)}$
$\to$ If $G \circ A$ is simple: use DR.
$\to$ If $F^* \circ A^*$ is simple: use ADMM.
109-112. ADMM vs. DR
Fenchel-Rockafellar duality:
$\min_x F(x) + G \circ A(x) \quad\longrightarrow\quad \min_u F^*(-A^*u) + G^*(u)$
Important: no bijection between $u$ and $x$.
Proposition: DR applied to $F^* \circ (-A^*) + G^*$ is ADMM. [Eckstein, Bertsekas 92]
DR iterations (when $\alpha = 1$):
$z^{(\ell+1)} = \frac{1}{2}z^{(\ell)} + \frac{1}{2}\operatorname{RProx}_{\gamma F^* \circ (-A^*)}\big(\operatorname{RProx}_{\gamma G^*}(z^{(\ell)})\big)$
The iterates of ADMM are recovered using:
$y^{(\ell)} = \frac{1}{\gamma}\big(z^{(\ell)} - u^{(\ell)}\big)$
$x^{(\ell+1)} = \operatorname{Prox}^A_{F/\gamma}(y^{(\ell)} - u^{(\ell)})$
$u^{(\ell)} = \operatorname{Prox}_{\gamma G^*}(z^{(\ell)})$
113-114. More than 2 Functionals
$\min_x G_1(x) + \ldots + G_k(x)$, each $G_i$ simple.
$\iff \min_{x_1, \ldots, x_k} G(x_1, \ldots, x_k) + \iota_C(x_1, \ldots, x_k)$
$G(x_1, \ldots, x_k) = G_1(x_1) + \ldots + G_k(x_k)$
$C = \{(x_1, \ldots, x_k) \in \mathcal{H}^k : x_1 = \ldots = x_k\}$
$G$ and $\iota_C$ are simple:
$\operatorname{Prox}_{\gamma G}(x_1, \ldots, x_k) = (\operatorname{Prox}_{\gamma G_i}(x_i))_i$
$\operatorname{Prox}_{\gamma\iota_C}(x_1, \ldots, x_k) = (\tilde x, \ldots, \tilde x)$ where $\tilde x = \frac{1}{k}\sum_i x_i$
115. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
116-118. GFB Splitting
$\min_{x \in \mathbb{R}^N} F(x) + \sum_{i=1}^n G_i(x)$ $(\star)$, with $F$ smooth and each $G_i$, $i = 1, \ldots, n$, simple.
For $i = 1, \ldots, n$:
$z_i^{(\ell+1)} = z_i^{(\ell)} + \operatorname{Prox}_{n\gamma G_i}\big(2x^{(\ell)} - z_i^{(\ell)} - \gamma\nabla F(x^{(\ell)})\big) - x^{(\ell)}$
$x^{(\ell+1)} = \frac{1}{n}\sum_{i=1}^n z_i^{(\ell+1)}$
Theorem: let $\nabla F$ be $L$-Lipschitz. If $\gamma < 2/L$, then $x^{(\ell)} \to x^\star$ a solution of $(\star)$. [Raguet, Fadili, Peyré 2012]
$n = 1$: Forward-Backward. $\quad F = 0$: Douglas-Rachford.
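A minimal sketch of the GFB iteration, written generically over user-supplied $\nabla F$ and prox operators (the toy usage at the end is illustrative, not from the slides):

```python
import numpy as np

def gfb(grad_F, proxs, x0, gamma, n_iter=200):
    """Generalized forward-backward for min_x F(x) + sum_i G_i(x).
    proxs[i](v, t) must return Prox_{t G_i}(v); needs gamma < 2/L."""
    n = len(proxs)
    x = x0.copy()
    z = [x0.copy() for _ in range(n)]
    for _ in range(n_iter):
        g = grad_F(x)
        for i in range(n):
            # z_i <- z_i + Prox_{n gamma G_i}(2x - z_i - gamma grad F(x)) - x
            z[i] = z[i] + proxs[i](2 * x - z[i] - gamma * g, n * gamma) - x
        x = sum(z) / n
    return x

# Toy usage: F(x) = 1/2||x - c||^2 (L = 1), G_1 = ||.||_1, G_2 = i_{x >= 0}.
c = np.array([2.0, -1.0, 0.3])
soft = lambda v, t: np.maximum(0, 1 - t / np.maximum(np.abs(v), 1e-16)) * v
pos = lambda v, t: np.maximum(v, 0.0)   # projection, independent of t
x_hat = gfb(lambda x: x - c, [soft, pos], np.zeros(3), gamma=0.5)
```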
119-122. GFB Fixed Point
$x^\star \in \operatorname{argmin}_{x \in \mathbb{R}^N} F(x) + \sum_i G_i(x) \iff 0 \in \nabla F(x^\star) + \sum_i \partial G_i(x^\star)$
$\iff \exists\, y_i \in \partial G_i(x^\star),\ \nabla F(x^\star) + \sum_i y_i = 0$
$\iff \exists\, (z_i)_{i=1}^n,\ \forall\, i,\ x^\star - z_i - \gamma\nabla F(x^\star) \in n\gamma\,\partial G_i(x^\star)$
and $x^\star = \frac{1}{n}\sum_i z_i$ (use $z_i = x^\star - \gamma\nabla F(x^\star) - n\gamma\, y_i$)
$\iff (2x^\star - z_i - \gamma\nabla F(x^\star)) - x^\star \in n\gamma\,\partial G_i(x^\star)$
$\iff x^\star = \operatorname{Prox}_{n\gamma G_i}\big(2x^\star - z_i - \gamma\nabla F(x^\star)\big)$
$\iff z_i = z_i + \operatorname{Prox}_{n\gamma G_i}\big(2x^\star - z_i - \gamma\nabla F(x^\star)\big) - x^\star$
$\Rightarrow$ fixed point equation on $(x^\star, z_1, \ldots, z_n)$.
123-125. Block Regularization
$\ell^1/\ell^2$ block sparsity: $G(x) = \sum_{b \in B} \|x_{[b]}\|$, $\quad \|x_{[b]}\|^2 = \sum_{m \in b} x_m^2$
Non-overlapping decomposition: $B = B_1 \cup \ldots \cup B_n$:
$G(x) = \sum_{i=1}^n G_i(x)$, $\quad G_i(x) = \sum_{b \in B_i} \|x_{[b]}\|$
Each $G_i$ is simple: $\forall\, b \in B_i$, $\forall\, m \in b$,
$\operatorname{Prox}_{\gamma G_i}(x)_m = \max\left(0,\, 1 - \frac{\gamma}{\|x_{[b]}\|}\right) x_m$
[Figure: image $f = \Psi x$, wavelet coefficients $x$ ($N = 256$), and overlapping block decompositions $B_1$, $B_2$]
126. Numerical Experiments
Deconvolution: $\min_x \frac{1}{2}\|Y - K x\|^2 + \lambda \sum_{k=1}^2 \|x\|_{1,2}^{(B_k)}$
Deconvolution + inpainting: $\min_x \frac{1}{2}\|Y - P\, K x\|^2 + \lambda \sum_{k=1}^4 \|x\|_{1,2}^{(B_k)}$
($x$: translation-invariant wavelet coefficients, $N = 256$; measurements $y = \Phi x_0 + w$.)
[Figures: convergence curves $\log_{10}(E(x^{(\ell)}) - E(x^\star))$ versus iteration # for EFB, PR and CP. Deconvolution ($\lambda_2 = 1.30\mathrm{e}{-3}$, noise 0.025, convol. 2, it. #50, SNR 22.49dB): tEFB 161s, tPR 173s, tCP 190s. Deconvolution + inpainting ($\lambda_4 = 1.00\mathrm{e}{-3}$, noise 0.025, degrad. 0.4, convol. 2, it. #50, SNR 21.80dB): tEFB 283s, tPR 298s, tCP 368s.]
127. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
129-131. Primal-dual Formulation
Fenchel-Rockafellar duality: $A : \mathcal{H} \to \mathcal{L}$ linear.
$\min_{x \in \mathcal{H}} G_1(x) + G_2 \circ A(x) = \min_x G_1(x) + \sup_{u \in \mathcal{L}} \langle Ax, u \rangle - G_2^*(u)$
Strong duality: $0 \in \operatorname{ri}(\operatorname{dom}(G_2)) - A\operatorname{ri}(\operatorname{dom}(G_1))$
$(\min \leftrightarrow \max)$: $\quad = \max_u -G_2^*(u) + \min_x G_1(x) + \langle x, A^*u \rangle$
$= \max_u -G_2^*(u) - G_1^*(-A^*u)$
Recovering $x^\star$ from some $u^\star$:
$x^\star = \operatorname{argmin}_x G_1(x) + \langle x, A^*u^\star \rangle$
$\iff -A^*u^\star \in \partial G_1(x^\star)$
$\iff x^\star \in (\partial G_1)^{-1}(-A^*u^\star) = \partial G_1^*(-A^*u^\star)$
132-134. Forward-Backward on the Dual
If $G_1$ is strongly convex ($\nabla^2 G_1 \succeq c\,\operatorname{Id}$):
$G_1(tx + (1-t)y) \leq t\,G_1(x) + (1-t)\,G_1(y) - \frac{c}{2}\,t(1-t)\|x - y\|^2$
$x^\star$ is uniquely defined: $x^\star = \nabla G_1^*(-A^*u^\star)$, and $G_1^*$ is of class $C^1$.
FB on the dual: $\min_{x \in \mathcal{H}} G_1(x) + G_2 \circ A(x)$ becomes $\min_{u \in \mathcal{L}} G_1^*(-A^*u) + G_2^*(u)$ (smooth + simple):
$u^{(\ell+1)} = \operatorname{Prox}_{\tau G_2^*}\Big(u^{(\ell)} + \tau A\,\nabla G_1^*\big(-A^*u^{(\ell)}\big)\Big)$
135-136. Example: TV Denoising
$\min_{f \in \mathbb{R}^N} \frac{1}{2}\|f - y\|^2 + \lambda\|\nabla f\|_1 \quad\iff\quad \min_{\|u\|_\infty \leq \lambda} \|y + \operatorname{div}(u)\|^2$
$\|u\|_1 = \sum_i \|u_i\|$, $\quad \|u\|_\infty = \max_i \|u_i\|$
Dual solution $u^\star$; primal solution $f^\star = y + \operatorname{div}(u^\star)$. [Chambolle 2004]
FB (aka projected gradient descent):
$u^{(\ell+1)} = \operatorname{Proj}_{\{\|\cdot\|_\infty \leq \lambda\}}\big(u^{(\ell)} + \tau\nabla(y + \operatorname{div}(u^{(\ell)}))\big)$
$v = \operatorname{Proj}_{\{\|\cdot\|_\infty \leq \lambda\}}(u)$: $\quad v_i = \frac{u_i}{\max(\|u_i\|/\lambda,\, 1)}$
Convergence if $\tau < \frac{2}{\|\operatorname{div} \circ \nabla\|} = \frac{1}{4}$
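A minimal sketch of this dual projected gradient scheme, assuming periodic forward differences for $\nabla$ (so that $\operatorname{div} = -\nabla^*$ and the step bound $\tau < 1/4$ applies):

```python
import numpy as np

def grad(f):
    """Forward-difference gradient (periodic), output shape (2, n1, n2)."""
    return np.stack([np.roll(f, -1, axis=0) - f,
                     np.roll(f, -1, axis=1) - f])

def div(u):
    """Discrete divergence with div = -grad^* for the periodic grad above."""
    return (u[0] - np.roll(u[0], 1, axis=0)
            + u[1] - np.roll(u[1], 1, axis=1))

def tv_denoise(y, lam, tau=0.24, n_iter=200):
    """Dual projected gradient (Chambolle):
    u <- Proj_{||.||_inf <= lam}(u + tau grad(y + div u)); f* = y + div(u*)."""
    u = np.zeros((2,) + y.shape)
    for _ in range(n_iter):
        u = u + tau * grad(y + div(u))       # gradient step on the dual
        scale = np.maximum(np.hypot(u[0], u[1]) / lam, 1.0)
        u = u / scale                        # pointwise projection ||u_i|| <= lam
    return y + div(u)                        # primal solution from the dual one

f = tv_denoise(np.random.rand(64, 64), lam=0.1)   # tau = 0.24 < 1/4
```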
137-139. Primal-Dual Algorithm
$\min_{x \in \mathcal{H}} G_1(x) + G_2 \circ A(x) \iff \min_x \max_z G_1(x) - G_2^*(z) + \langle A(x), z \rangle$
$z^{(\ell+1)} = \operatorname{Prox}_{\sigma G_2^*}\big(z^{(\ell)} + \sigma A(\tilde x^{(\ell)})\big)$
$x^{(\ell+1)} = \operatorname{Prox}_{\tau G_1}\big(x^{(\ell)} - \tau A^*(z^{(\ell+1)})\big)$
$\tilde x^{(\ell+1)} = x^{(\ell+1)} + \theta\,(x^{(\ell+1)} - x^{(\ell)})$
$\theta = 0$: Arrow-Hurwicz algorithm.
$\theta = 1$: convergence speed on the duality gap.
Theorem [Chambolle, Pock 2011]: if $0 \leq \theta \leq 1$ and $\sigma\tau\|A\|^2 < 1$, then $x^{(\ell)} \to x^\star$ a minimizer of $G_1 + G_2 \circ A$.
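A generic sketch of these iterations, with the prox operators and the linear map passed in as functions (names are illustrative): for instance $A = \nabla$ and $A^* = -\operatorname{div}$ from the TV sketch above recover a primal-dual TV solver.

```python
import numpy as np

def chambolle_pock(prox_tG1, prox_sG2s, A, At, x0, sigma, tau, theta=1.0,
                   n_iter=300):
    """Primal-dual iterations for min_x G1(x) + G2(A x).
    prox_tG1(x) = Prox_{tau G1}(x), prox_sG2s(z) = Prox_{sigma G2*}(z);
    requires sigma * tau * ||A||^2 < 1."""
    x = x0.copy()
    x_bar = x0.copy()
    z = np.zeros_like(A(x0))
    for _ in range(n_iter):
        z = prox_sG2s(z + sigma * A(x_bar))    # dual proximal ascent
        x_new = prox_tG1(x - tau * At(z))      # primal proximal descent
        x_bar = x_new + theta * (x_new - x)    # extrapolation step
        x = x_new
    return x
```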
140. Example: Optimal Transport
Staggered grid formulation:
$\min_{x \in \mathbb{R}^{G_{st}^1} \times \mathbb{R}^{G_{st}^2}} J(I(x)) + \iota_C(x)$
with the interpolation operator $I = (I^1, I^2) : \mathbb{R}^{G_{st}^1} \times \mathbb{R}^{G_{st}^2} \to \mathbb{R}^{G_c}$.
[Figure: $I$ maps the staggered grids $G_{st}^1$, $G_{st}^2$ to the centered grid $G_c$]
141-143. Conclusion
Inverse problems in imaging:
Large scale, $N \sim 10^6$.
Non-smooth (sparsity, TV, ...).
(Sometimes) convex.
Highly structured (separability, $\ell^p$ norms, ...).
Proximal splitting:
Unravels the structure of problems.
Parallelizable.
Decomposition $G = \sum_k G_k$.
Open problems:
Less structured problems without smoothness.
Non-convex optimization.