Subdifferentials and Proximal Operators
Pontus Giselsson
Learning goals
• Be able to derive subdifferential and proximal operator formulas
• Understand that subdifferentials define affine minorizers
• Know when subgradients exist for convex functions
• Understand maximal monotonicity and Minty’s theorem
• Know strong monotonicity and relation to strong convexity
• Know different characterizations of smoothness
• Understand and be able to use Fermat’s rule
• Know subdifferential calculus rules
• Understand that prox evaluates subdifferential
Subdifferentials
Gradients of convex functions
• Recall: A differentiable function f : R^n → R is convex if and only if
  f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ R^n
[Figure: graph of f(y) and the affine minorizer f(x) + ∇f(x)ᵀ(y − x), touching f at (x, f(x)) with normal (∇f(x), −1) to the epigraph]
• For every x ∈ R^n, the function f has an affine minorizer that:
• has slope ∇f(x)
• coincides with f at x
• defines the normal (∇f(x), −1) to the epigraph of f
• What if the function is nondifferentiable?
Subdifferentials and subgradients
• Subgradients s define affine minorizers of the function that:
[Figure: nondifferentiable f with several supporting affine minorizers, each with normal (s, −1) to the epigraph]
• coincide with f at x
• define normal vector (s, −1) to epigraph of f
• can be one of many affine minorizers at nondifferentiable points x
• Subdifferential of f : R^n → R at x is the set of vectors s satisfying
  f(y) ≥ f(x) + sᵀ(y − x) for all y ∈ R^n (1)
• Notation:
• subdifferential: ∂f : R^n → 2^(R^n) (power-set notation 2^(R^n))
• subdifferential at x: ∂f(x) = {s : (1) holds}
• elements s ∈ ∂f(x) are called subgradients of f at x
Relation to gradient
• If f is differentiable at x and ∂f(x) ≠ ∅, then ∂f(x) = {∇f(x)}:
[Figure: supporting lines at x1, x2, x3; at the differentiable points x2 and x3 the normals are (∇f(x2), −1) and (∇f(x3), −1)]
• i.e., the subdifferential at x (if nonempty) consists only of the gradient
Subgradient existence – Nonconvex example
• A function can be differentiable at x and still have ∂f(x) = ∅
[Figure: nonconvex function with stationary points x1, x2, x3]
• x1: ∂f(x1) = {0}, ∇f(x1) = 0
• x2: ∂f(x2) = ∅, ∇f(x2) = 0
• x3: ∂f(x3) = ∅, ∇f(x3) = 0
• The gradient is a local concept; the subdifferential is a global one
Subgradient existence – Convex example
• Consider the convex function f(x) = |x|:
[Figure: graph of |x| with points x1 < 0, x2 = 0, x3 > 0 marked]
• What are the subdifferentials at points x1, x2, x3?
• Subdifferential at x1 is {−1} (affine minorizer with slope −1)
• Subdifferential at x2 is [−1, 1] (affine minorizers with slopes in [−1, 1])
• Subdifferential at x3 is {1} (affine minorizer with slope 1)
Fact:
• For finite-valued convex functions, a subgradient exists for every x
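As a quick numerical companion to this example, the sketch below (assuming NumPy; not part of the original slides) checks the subgradient inequality (1) for f(x) = |x| at x = 0, where every slope in [−1, 1] gives an affine minorizer:

```python
import numpy as np

f = abs                      # f(x) = |x|
x = 0.0                      # nondifferentiable point, where ∂f(0) = [-1, 1]
ys = np.linspace(-2.0, 2.0, 401)

for s in np.linspace(-1.0, 1.0, 9):      # candidate subgradients at x = 0
    # subgradient inequality: f(y) >= f(x) + s*(y - x) for all y
    assert np.all(np.abs(ys) >= f(x) + s * (ys - x))
print("every s in [-1, 1] is a subgradient of |x| at 0")
```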
Existence for extended-valued convex functions
• Let f : R^n → R ∪ {∞} be convex, then:
1. Subgradients exist for all x in the relative interior of dom f
2. Subgradients sometimes exist for x on the boundary of dom f
3. No subgradient exists for x outside dom f
• Examples for the second case, boundary points of dom f:
[Figure: −√(1 − x²) + ι[−1,1](x) (left) and x² + ι[−2,2](x) (right)]
• No subgradient (affine minorizer) exists for the left function at x = 1
Monotonicity
• Subdifferential operator is monotone:
  (sx − sy)ᵀ(x − y) ≥ 0
  for all sx ∈ ∂f(x) and sy ∈ ∂f(y)
• Proof: Add the subdifferential definition
  f(y) ≥ f(x) + sxᵀ(y − x)
  to its copy with x and y swapped; the function values cancel
• ∂f : R → 2^R: Minimum slope 0 and maximum slope ∞
[Figure: graph of a monotone ∂f on R]
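Monotonicity can likewise be spot-checked numerically; a minimal sketch (NumPy assumed, with f(x) = |x| and a random selection from ∂f(0), to emphasize that the inequality holds for any choice of subgradients):

```python
import numpy as np

rng = np.random.default_rng(0)

def subgrad_abs(x):
    """Return some element of the subdifferential of |x|."""
    return np.sign(x) if x != 0 else rng.uniform(-1.0, 1.0)

for _ in range(1000):
    x, y = rng.uniform(-2, 2), rng.uniform(-2, 2)
    sx, sy = subgrad_abs(x), subgrad_abs(y)
    assert (sx - sy) * (x - y) >= 0    # monotonicity of ∂f
print("subdifferential of |x| is monotone on all sampled pairs")
```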
Monotonicity beyond subdifferentials
• Let A : R^n → 2^(R^n) be monotone, i.e.:
  (u − v)ᵀ(x − y) ≥ 0
  for all u ∈ Ax and v ∈ Ay
• If n = 1, then A = ∂f for some function f : R → R ∪ {∞}
• If n ≥ 2 there exist monotone A that are not subdifferentials
Maximal monotonicity
• Let the set gph ∂f := {(x, u) : u ∈ ∂f(x)} be the graph of ∂f
• ∂f is maximally monotone if no other function g exists with
  gph ∂f ⊂ gph ∂g (strict inclusion)
• A result (due to Rockafellar):
f is closed convex if and only if ∂f is maximally monotone
Minty’s theorem
• Let ∂f : R^n → 2^(R^n) and α > 0
• ∂f is maximally monotone if and only if range(αI + ∂f) = R^n
[Figure: ∂f1 is maximally monotone and ∂f1 + αI has full range; ∂f2 is not maximally monotone and ∂f2 + αI does not have full range]
• Interpretation: No “holes” in gph ∂f
Strong convexity
• Recall that f is σ-strongly convex if f − (σ/2)‖·‖₂² is convex
• If f is σ-strongly convex then
  f(y) ≥ f(x) + sᵀ(y − x) + (σ/2)‖x − y‖₂²
  holds for all x ∈ dom ∂f, s ∈ ∂f(x), and y ∈ R^n
• The function has convex quadratic minorizers instead of affine ones
[Figure: f with quadratic minorizers f(x1) + s1ᵀ(y − x1) + (σ/2)‖x1 − y‖₂² at x1, and f(x2) + s2,iᵀ(y − x2) + (σ/2)‖x2 − y‖₂² for subgradients s2,1, s2,2 at a nondifferentiable point x2]
• Multiple lower bounds at x2 with subgradients s2,1 and s2,2
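A numerical illustration of the quadratic minorizer, using f(x) = x², which is σ-strongly convex with σ = 2 (a sketch assuming NumPy; for this particular f the bound holds with equality):

```python
import numpy as np

f = lambda x: x ** 2
sigma = 2.0              # f - (sigma/2)*x^2 = 0 is convex, so sigma = 2 works
ys = np.linspace(-3.0, 3.0, 121)

for x in np.linspace(-2.0, 2.0, 9):
    s = 2 * x            # f differentiable, so ∂f(x) = {2x}
    lower = f(x) + s * (ys - x) + (sigma / 2) * (ys - x) ** 2
    assert np.all(f(ys) >= lower - 1e-9)   # quadratic minorizer holds
print("quadratic lower bounds hold for f(x) = x^2 with sigma = 2")
```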
Strong monotonicity
• If f is a σ-strongly convex function, then ∂f is σ-strongly monotone:
  (sx − sy)ᵀ(x − y) ≥ σ‖x − y‖₂²
  for all sx ∈ ∂f(x) and sy ∈ ∂f(y)
• Proof: Add two copies of the strong convexity inequality
  f(y) ≥ f(x) + sxᵀ(y − x) + (σ/2)‖x − y‖₂²
  with x and y swapped
• ∂f is σ-strongly monotone if and only if ∂f − σI is monotone
• ∂f : R → 2^R: Minimum slope σ and maximum slope ∞
[Figure: graph of a σ-strongly monotone ∂f on R]
Strongly convex functions – An equivalence
The following are equivalent
(i) f is closed and σ-strongly convex
(ii) ∂f is maximally monotone and σ-strongly monotone
Proof:
(i)⇒(ii): we know this from before
(ii)⇒(i): (ii) ⇒ ∂f − σI = ∂(f − (σ/2)‖·‖₂²) maximally monotone
          ⇒ f − (σ/2)‖·‖₂² closed convex
          ⇒ f closed and σ-strongly convex
Smoothness and convexity
• A differentiable function f : R^n → R is convex and β-smooth if
  f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (β/2)‖x − y‖₂²
  f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)
  hold for all x, y ∈ R^n
• f has convex quadratic majorizers and affine minorizers
[Figure: f with quadratic majorizers f(xi) + ∇f(xi)ᵀ(y − xi) + (β/2)‖xi − y‖₂² and affine minorizers at x1 and x2, with normals (∇f(xi), −1)]
• The quadratic upper bound is called the descent lemma
Gradient of smooth convex function
• Gradient of smooth convex function is monotone and Lipschitz
  (∇f(x) − ∇f(y))ᵀ(x − y) ≥ 0
  ‖∇f(y) − ∇f(x)‖₂ ≤ β‖x − y‖₂
• ∇f : R → R: Minimum slope 0 and maximum slope β
[Figure: graph of ∇f on R]
• Actually satisfies the stronger (1/β)-cocoercivity property:
  (∇f(x) − ∇f(y))ᵀ(x − y) ≥ (1/β)‖∇f(y) − ∇f(x)‖₂²
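These three properties can be verified numerically for a convex quadratic f(x) = ½xᵀHx, whose gradient ∇f(x) = Hx is β-Lipschitz with β = λmax(H); a sketch assuming NumPy (the random H is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A.T @ A                            # positive semidefinite => f convex
beta = np.linalg.eigvalsh(H).max()     # smoothness constant = lambda_max(H)
grad = lambda x: H @ x                 # gradient of f(x) = 0.5 x^T H x

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    g = grad(x) - grad(y)
    assert g @ (x - y) >= -1e-9                                      # monotone
    assert np.linalg.norm(g) <= beta * np.linalg.norm(x - y) + 1e-9  # Lipschitz
    assert g @ (x - y) >= (1 / beta) * (g @ g) - 1e-9                # cocoercive
print("monotonicity, Lipschitz continuity, and 1/beta-cocoercivity hold")
```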
Smooth convex functions – Equivalences
Let f : R^n → R be differentiable. The following are equivalent:
(i) ∇f is (1/β)-cocoercive
(ii) ∇f is maximally monotone and β-Lipschitz continuous
(iii) f is closed convex and satisfies the descent lemma (is β-smooth)
• Implication (ii)⇒(i) is called the Baillon-Haddad theorem
• Will connect smoothness and strong convexity via conjugates in next lecture
Fermat’s rule
Let f : R^n → R ∪ {∞}, then x minimizes f if and only if
0 ∈ ∂f(x)
• Proof: x minimizes f if and only if
  f(y) ≥ f(x) + 0ᵀ(y − x) for all y ∈ R^n
  which by definition of the subdifferential is equivalent to 0 ∈ ∂f(x)
• Example: several subgradients at the solution, including 0
[Figure: minimum of f with several supporting lines, including the horizontal one with normal (0, −1)]
Fermat’s rule – Nonconvex example
• Fermat’s rule also holds for nonconvex functions
• Example:
[Figure: nonconvex f with global minimum x1, where the normal is (0, −1), and local minimum x2]
• ∂f(x1) = {0} and ∇f(x1) = 0 (global minimum)
• ∂f(x2) = ∅ and ∇f(x2) = 0 (local minimum)
• For nonconvex f, we can typically only hope to find local minima
Subdifferential calculus rules
• Subdifferential of sum ∂(f1 + f2)
• Subdifferential of composition with a matrix ∂(f ◦ L)
Subdifferential of sum
If f1, f2 closed convex and relint dom f1 ∩ relint dom f2 ≠ ∅:
∂(f1 + f2) = ∂f1 + ∂f2
• One direction always holds: if x ∈ dom ∂f1 ∩ dom ∂f2:
  ∂(f1 + f2)(x) ⊇ ∂f1(x) + ∂f2(x)
Proof: let si ∈ ∂fi(x) and add the subdifferential definitions:
f1(y) + f2(y) ≥ f1(x) + f2(x) + (s1 + s2)ᵀ(y − x)
i.e., s1 + s2 ∈ ∂(f1 + f2)(x)
• If f1 and f2 are differentiable, we have (without convexity)
  ∇(f1 + f2) = ∇f1 + ∇f2
Subdifferential of composition
If f closed convex and relint dom(f ◦ L) ≠ ∅:
∂(f ◦ L)(x) = Lᵀ∂f(Lx)
• One direction always holds: if Lx ∈ dom f, then
  ∂(f ◦ L)(x) ⊇ Lᵀ∂f(Lx)
Proof: let s ∈ ∂f(Lx), then by definition of subgradient of f:
(f ◦ L)(y) ≥ (f ◦ L)(x) + sᵀ(Ly − Lx) = (f ◦ L)(x) + (Lᵀs)ᵀ(y − x)
i.e., Lᵀs ∈ ∂(f ◦ L)(x)
• If f differentiable, we have the chain rule (without convexity of f)
  ∇(f ◦ L)(x) = Lᵀ∇f(Lx)
A sufficient optimality condition
Let f : R^m → R, g : R^n → R, and L ∈ R^(m×n), then:
minimize f(Lx) + g(x) (1)
is solved by every x ∈ R^n that satisfies
0 ∈ Lᵀ∂f(Lx) + ∂g(x) (2)
• Subdifferential calculus inclusions say:
  Lᵀ∂f(Lx) + ∂g(x) ⊆ ∂((f ◦ L)(x) + g(x))
  so (2) implies 0 ∈ ∂((f ◦ L)(x) + g(x)), which by Fermat’s rule means x solves (1)
• Note: (1) can have a solution even when no x satisfies (2)
A necessary and sufficient optimality condition
Let f : R^m → R, g : R^n → R, L ∈ R^(m×n) with f, g closed convex,
and assume relint dom(f ◦ L) ∩ relint dom g ≠ ∅, then:
minimize f(Lx) + g(x) (1)
is solved by x ∈ R^n if and only if x satisfies
0 ∈ Lᵀ∂f(Lx) + ∂g(x) (2)
• Subdifferential calculus equality rules say:
  Lᵀ∂f(Lx) + ∂g(x) = ∂((f ◦ L)(x) + g(x))
  so (2) is by Fermat’s rule equivalent to x solving (1)
• Algorithms search for x that satisfy 0 ∈ Lᵀ∂f(Lx) + ∂g(x)
A comment on constraint qualification
• The condition
  relint dom(f ◦ L) ∩ relint dom g ≠ ∅
  is called a constraint qualification, referred to as CQ
• It is a mild condition that is rarely violated
[Figure: three cases for dom(f ◦ L) and dom g: disjoint domains (no solution), domains touching only at their boundaries (solution exists but no CQ), and overlapping relative interiors (solution and CQ)]
Evaluating subgradients of convex functions
• Obviously need to evaluate subdifferentials to solve
  0 ∈ Lᵀ∂f(Lx) + ∂g(x)
• Explicit evaluation:
• If function is differentiable: ∇f (unique)
• If function is nondifferentiable: compute element in ∂f
• Implicit evaluation:
• Proximal operator (specific element of subdifferential)
Proximal operators
Proximal operator
• Proximal operator of (convex) g defined as:
  proxγg(z) = argmin_x (g(x) + 1/(2γ)‖x − z‖₂²)
  where γ > 0 is a parameter
• Evaluating the prox requires solving an optimization problem
• Objective is strongly convex ⇒ solution exists and is unique
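Concretely, a prox can be evaluated by minimizing the strongly convex objective numerically; a minimal sketch assuming SciPy, using g(x) = |x| as an illustrative choice whose prox has the known soft-thresholding form (the helper name prox_numeric is ours, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_numeric(g, z, gamma):
    """Evaluate prox_{gamma g}(z) for scalar g by direct minimization."""
    return minimize_scalar(lambda x: g(x) + (x - z) ** 2 / (2 * gamma)).x

g = abs                                          # g(x) = |x|
for z in [-2.0, -0.3, 0.0, 1.5]:
    x = prox_numeric(g, z, gamma=1.0)
    soft = np.sign(z) * max(abs(z) - 1.0, 0.0)   # known prox of |.|
    assert np.isclose(x, soft, atol=1e-5)
print("numerical prox matches soft-thresholding for g = |.|")
```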
Prox evaluates the subdifferential
• Fermat’s rule on the prox definition: x = proxγg(z) if and only if
  0 ∈ ∂g(x) + γ⁻¹(x − z) ⇔ γ⁻¹(z − x) ∈ ∂g(x)
  Hence, γ⁻¹(z − x) is an element of ∂g(x)
• A subgradient in ∂g(x), where x = proxγg(z), is computed
• Often used in algorithms when g nonsmooth (no gradient exists)
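The sketch below (NumPy assumed; soft-thresholding used as the known prox of g = |·|, which is not derived on these slides) checks that γ⁻¹(z − x) indeed lands in ∂g(x):

```python
import numpy as np

def prox_abs(z, gamma):
    """prox of g = |.| (soft-thresholding)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

gamma = 0.7
for z in [-3.0, -0.5, 0.2, 2.0]:
    x = prox_abs(z, gamma)
    s = (z - x) / gamma                    # candidate subgradient at x
    if x != 0:
        assert np.isclose(s, np.sign(x))   # ∂|x| = {sign(x)} for x != 0
    else:
        assert -1.0 <= s <= 1.0            # ∂|0| = [-1, 1]
print("gamma^(-1)(z - x) lies in ∂g(x) at x = prox(z)")
```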
Prox is generalization of projection
• Recall the indicator function of a set C:
  ιC(x) := 0 if x ∈ C, ∞ otherwise
• Then
  ΠC(z) = argmin_x (‖x − z‖₂ : x ∈ C)
        = argmin_x (½‖x − z‖₂² : x ∈ C)
        = argmin_x (½‖x − z‖₂² + ιC(x))
        = proxιC(z)
• Projection onto C equals prox of indicator function of C
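For instance, with C a box [l, u] the projection, and hence the prox of ιC, is coordinatewise clipping; a minimal sketch assuming NumPy:

```python
import numpy as np

def proj_box(z, l, u):
    """Projection onto C = {x : l <= x <= u}, i.e., prox of the indicator of C."""
    return np.clip(z, l, u)

z = np.array([-2.0, 0.3, 5.0])
print(proj_box(z, l=-1.0, u=1.0))   # [-1.   0.3  1. ]
```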
Proximal operator – Example 1
Let g(x) = ½xᵀHx + hᵀx with H positive semidefinite
• Gradient satisfies ∇g(x) = Hx + h
• Fermat’s rule for x = proxγg(z):
  0 = ∇g(x) + γ⁻¹(x − z) ⇔ 0 = Hx + h + γ⁻¹(x − z)
                          ⇔ (I + γH)x = z − γh
                          ⇔ x = (I + γH)⁻¹(z − γh)
• So proxγg(z) = (I + γH)⁻¹(z − γh)
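A short numerical check of the closed form against its optimality condition; a sketch assuming NumPy, with a random H and h purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
H, h = A.T @ A, rng.standard_normal(4)      # H positive semidefinite
gamma, z = 0.5, rng.standard_normal(4)

# closed-form prox: solve (I + gamma*H) x = z - gamma*h
x = np.linalg.solve(np.eye(4) + gamma * H, z - gamma * h)

# optimality: gradient of g(x) + (1/(2*gamma))*||x - z||^2 vanishes at x
assert np.allclose(H @ x + h + (x - z) / gamma, 0.0)
print("closed-form prox satisfies Fermat's rule")
```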
Proximal operator – Example 2
• Consider the function g with subdifferential ∂g:
  g(x) = −x if x ≤ 0, 0 if x ≥ 0
  ∂g(x) = −1 if x < 0, [−1, 0] if x = 0, 0 if x > 0
• Graphical representations
[Figure: graphs of g(x) (left) and the set-valued ∂g(x) (right)]
• Fermat’s rule for x = proxγg(z):
  0 ∈ ∂g(x) + γ⁻¹(x − z)
Proximal operator – Example 2 cont’d
• Let x < 0, then Fermat’s rule reads
  0 = −1 + γ⁻¹(x − z) ⇔ x = z + γ
  which is valid (x < 0) if z < −γ
• Let x = 0, then Fermat’s rule reads
  0 ∈ [−1, 0] + γ⁻¹(0 − z)
  which is valid (x = 0) if z ∈ [−γ, 0]
• Let x > 0, then Fermat’s rule reads
  0 = 0 + γ⁻¹(x − z) ⇔ x = z
  which is valid (x > 0) if z > 0
• The prox satisfies
  proxγg(z) = z + γ if z < −γ, 0 if z ∈ [−γ, 0], z if z > 0
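The case analysis translates directly into code; a sketch assuming SciPy/NumPy that cross-checks the piecewise formula against direct numerical minimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

g = lambda x: max(-x, 0.0)        # g(x) = -x for x <= 0, 0 for x >= 0

def prox_g(z, gamma):
    """Piecewise prox from the case analysis above."""
    if z < -gamma:
        return z + gamma
    if z <= 0:
        return 0.0
    return z

gamma = 0.8
for z in np.linspace(-3.0, 3.0, 13):
    x_num = minimize_scalar(lambda x: g(x) + (x - z) ** 2 / (2 * gamma)).x
    assert np.isclose(prox_g(z, gamma), x_num, atol=1e-5)
print("piecewise prox matches numerical minimization")
```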
Computational cost
• Evaluating the prox requires solving an optimization problem
  proxγg(z) = argmin_x (g(x) + 1/(2γ)‖x − z‖₂²)
• Prox typically more expensive to evaluate than gradient
• Example: Quadratic g(x) = ½xᵀHx + hᵀx:
  proxγg(z) = (I + γH)⁻¹(z − γh), ∇g(z) = Hz + h
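A rough timing comparison for the quadratic example makes the cost gap concrete: the prox requires a linear solve (O(n³) for a dense factorization), while the gradient is one matrix-vector product (O(n²)); a sketch assuming NumPy:

```python
import time
import numpy as np

n = 1000
rng = np.random.default_rng(2)
A = rng.standard_normal((n, n))
H, h = A.T @ A, rng.standard_normal(n)
gamma, z = 0.1, rng.standard_normal(n)

t0 = time.perf_counter()
x_prox = np.linalg.solve(np.eye(n) + gamma * H, z - gamma * h)   # O(n^3) solve
t1 = time.perf_counter()
grad = H @ z + h                                                 # O(n^2) matvec
t2 = time.perf_counter()
print(f"prox: {t1 - t0:.4f} s, gradient: {t2 - t1:.4f} s")
```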