1. L. Vandenberghe ECE236C (Spring 2022)
5. Conjugate functions
• closed functions
• conjugate function
• duality
5.1
2. Closed set
a set 𝐶 is closed if it contains its boundary:
𝑥𝑘 ∈ 𝐶, 𝑥𝑘 → ¯
𝑥 =⇒ ¯
𝑥 ∈ 𝐶
Operations that preserve closedness
• the intersection of (finitely or infinitely many) closed sets is closed
• the union of a finite number of closed sets is closed
• inverse under linear mapping: {𝑥 | 𝐴𝑥 ∈ 𝐶} is closed if 𝐶 is closed
Conjugate functions 5.2
3. Image under linear mapping
the image of a closed set under a linear mapping is not necessarily closed
Example
𝐶 = {(𝑥1, 𝑥2) ∈ R2
+ | 𝑥1𝑥2 ≥ 1}, 𝐴 =
1 0
, 𝐴𝐶 = R++
Sufficient condition: 𝐴𝐶 is closed if
• 𝐶 is closed and convex
• and 𝐶 does not have a recession direction in the nullspace of 𝐴, i.e.,
𝐴𝑦 = 0, ˆ
𝑥 ∈ 𝐶, ˆ
𝑥 + 𝛼𝑦 ∈ 𝐶 for all 𝛼 ≥ 0 =⇒ 𝑦 = 0
in particular, this holds for any matrix 𝐴 if 𝐶 is bounded
Conjugate functions 5.3
4. Closed function
Definition: a function is closed if its epigraph is a closed set
Examples
• 𝑓 (𝑥) = − log(1 − 𝑥2) with dom 𝑓 = {𝑥 | |𝑥| 1}
• 𝑓 (𝑥) = 𝑥 log 𝑥 with dom 𝑓 = R+ and 𝑓 (0) = 0
• indicator function of a closed set 𝐶:
𝛿𝐶 (𝑥) =
0 𝑥 ∈ 𝐶
+∞ otherwise
Not closed
• 𝑓 (𝑥) = 𝑥 log 𝑥 with dom 𝑓 = R++, or with dom 𝑓 = R+ and 𝑓 (0) = 1
• indicator function of a set 𝐶 if 𝐶 is not closed
Conjugate functions 5.4
5. Properties
Sublevel sets: 𝑓 is closed if and only if all its sublevel sets are closed
Minimum: if 𝑓 is closed with bounded sublevel sets then it has a minimizer
Common operations on convex functions that preserve closedness
• sum: 𝑓 = 𝑓1 + 𝑓2 is closed if 𝑓1 and 𝑓2 are closed
• composition with affine mapping: 𝑓 (𝑥) = 𝑔(𝐴𝑥 + 𝑏) is closed if 𝑔 is closed
• supremum: 𝑓 (𝑥) = sup𝛼 𝑓𝛼(𝑥) is closed if each function 𝑓𝛼 is closed
in each case, we assume dom 𝑓 ≠ ∅
Conjugate functions 5.5
7. Conjugate function
the conjugate of a function 𝑓 is
𝑓 ∗
(𝑦) = sup
𝑥∈dom 𝑓
(𝑦𝑇
𝑥 − 𝑓 (𝑥))
𝑓 ∗ is closed and convex (even when 𝑓 is not)
Fenchel’s inequality: the definition implies that
𝑓 (𝑥) + 𝑓 ∗
(𝑦) ≥ 𝑥𝑇
𝑦 for all 𝑥, 𝑦
this is an extension to non-quadratic convex 𝑓 of the inequality
1
2
𝑥𝑇
𝑥 +
1
2
𝑦𝑇
𝑦 ≥ 𝑥𝑇
𝑦
Conjugate functions 5.6
10. Indicator function and norm
Indicator of convex set 𝐶: conjugate is the support function of 𝐶
𝛿𝐶 (𝑥) =
0 𝑥 ∈ 𝐶
+∞ 𝑥 ∉ 𝐶
𝛿∗
𝐶 (𝑦) = sup
𝑥∈𝐶
𝑦𝑇
𝑥
Indicator of convex cone 𝐶: conjugate is indicator of polar (negative dual) cone
𝛿∗
𝐶 (𝑦) = 𝛿−𝐶∗ (𝑦) = 𝛿𝐶∗ (−𝑦) =
0 𝑦𝑇𝑥 ≤ 0 ∀𝑥 ∈ 𝐶
+∞ otherwise
Norm: conjugate is indicator of unit ball for dual norm
𝑓 (𝑥) = k𝑥k 𝑓 ∗
(𝑦) =
0 k𝑦k∗ ≤ 1
+∞ k𝑦k∗ 1
(see next page)
Conjugate functions 5.9
11. Proof: recall the definition of dual norm
k𝑦k∗ = sup
k𝑥k≤1
𝑥𝑇
𝑦
to evaluate 𝑓 ∗(𝑦) = sup𝑥 (𝑦𝑇𝑥 − k𝑥k) we distinguish two cases
• if k𝑦k∗ ≤ 1, then (by definition of dual norm)
𝑦𝑇
𝑥 ≤ k𝑥k for all 𝑥
and equality holds if 𝑥 = 0; therefore sup𝑥 (𝑦𝑇𝑥 − k𝑥k) = 0
• if k𝑦k∗ 1, there exists an 𝑥 with k𝑥k ≤ 1, 𝑥𝑇 𝑦 1; then
𝑓 ∗
(𝑦) ≥ 𝑦𝑇
(𝑡𝑥) − k𝑡𝑥k = 𝑡(𝑦𝑇
𝑥 − k𝑥k)
and right-hand side goes to infinity if 𝑡 → ∞
Conjugate functions 5.10
12. Calculus rules
Separable sum
𝑓 (𝑥1, 𝑥2) = 𝑔(𝑥1) + ℎ(𝑥2) 𝑓 ∗
(𝑦1, 𝑦2) = 𝑔∗
(𝑦1) + ℎ∗
(𝑦2)
Scalar multiplication (𝛼 0)
𝑓 (𝑥) = 𝛼𝑔(𝑥) 𝑓 ∗(𝑦) = 𝛼𝑔∗(𝑦/𝛼)
𝑓 (𝑥) = 𝛼𝑔(𝑥/𝛼) 𝑓 ∗(𝑦) = 𝛼𝑔∗(𝑦)
• the operation 𝑓 (𝑥) = 𝛼𝑔(𝑥/𝛼) is sometimes called “right scalar multiplication”
• a convenient notation is 𝑓 = 𝑔𝛼 for the function (𝑔𝛼)(𝑥) = 𝛼𝑔(𝑥/𝛼)
• conjugates can be written concisely as (𝑔𝛼)∗ = 𝛼𝑔∗ and (𝛼𝑔)∗ = 𝑔∗𝛼
Conjugate functions 5.11
14. The second conjugate
𝑓 ∗∗
(𝑥) = sup
𝑦∈dom 𝑓 ∗
(𝑥𝑇
𝑦 − 𝑓 ∗
(𝑦))
• 𝑓 ∗∗ is closed and convex
• from Fenchel’s inequality, 𝑥𝑇 𝑦 − 𝑓 ∗(𝑦) ≤ 𝑓 (𝑥) for all 𝑦 and 𝑥; therefore
𝑓 ∗∗
(𝑥) ≤ 𝑓 (𝑥) for all 𝑥
equivalently, epi 𝑓 ⊆ epi 𝑓 ∗∗ (for any 𝑓 )
• if 𝑓 is closed and convex, then
𝑓 ∗∗
(𝑥) = 𝑓 (𝑥) for all 𝑥
equivalently, epi 𝑓 = epi 𝑓 ∗∗ (if 𝑓 is closed and convex); proof on next page
Conjugate functions 5.13
15. Proof (by contradiction): assume 𝑓 is closed and convex, and epi 𝑓 ∗∗ ≠ epi 𝑓
suppose (𝑥, 𝑓 ∗∗(𝑥)) ∉ epi 𝑓 ; then there is a strict separating hyperplane:
𝑎
𝑏
𝑇
𝑧 − 𝑥
𝑠 − 𝑓 ∗∗(𝑥)
≤ 𝑐 0 for all (𝑧, 𝑠) ∈ epi 𝑓
holds for some 𝑎, 𝑏, 𝑐 with 𝑏 ≤ 0 (𝑏 0 gives a contradiction as 𝑠 → ∞)
• if 𝑏 0, define 𝑦 = 𝑎/(−𝑏) and maximize left-hand side over (𝑧, 𝑠) ∈ epi 𝑓 :
𝑓 ∗
(𝑦) − 𝑦𝑇
𝑥 + 𝑓 ∗∗
(𝑥) ≤ 𝑐/(−𝑏) 0
this contradicts Fenchel’s inequality
• if 𝑏 = 0, choose ˆ
𝑦 ∈ dom 𝑓 ∗ and add small multiple of ( ˆ
𝑦, −1) to (𝑎, 𝑏):
𝑎 + 𝜖 ˆ
𝑦
−𝜖
𝑇
𝑧 − 𝑥
𝑠 − 𝑓 ∗∗(𝑥)
≤ 𝑐 + 𝜖
𝑓 ∗
( ˆ
𝑦) − 𝑥𝑇
ˆ
𝑦 + 𝑓 ∗∗
(𝑥)
0
now apply the argument for 𝑏 0
Conjugate functions 5.14
18. Strongly convex function
Definition (page 1.18) 𝑓 is 𝜇-strongly convex (for k · k) if dom 𝑓 is convex and
𝑓 (𝜃𝑥 + (1 − 𝜃)𝑦) ≤ 𝜃 𝑓 (𝑥) + (1 − 𝜃) 𝑓 (𝑦) −
𝜇
2
𝜃(1 − 𝜃)k𝑥 − 𝑦k2
for all 𝑥, 𝑦 ∈ dom 𝑓 and 𝜃 ∈ [0, 1]
First-order condition
• if 𝑓 is 𝜇-strongly convex, then
𝑓 (𝑦) ≥ 𝑓 (𝑥) + 𝑔𝑇
(𝑦 − 𝑥) +
𝜇
2
k𝑦 − 𝑥k2
for all 𝑥, 𝑦 ∈ dom 𝑓 , 𝑔 ∈ 𝜕 𝑓 (𝑥)
• for differentiable 𝑓 this is the inequality (4) on page 1.19
Conjugate functions 5.17
19. Proof
• recall the definition of directional derivative (page 2.28 and 2.29):
𝑓 0
(𝑥, 𝑦 − 𝑥) = inf
𝜃0
𝑓 (𝑥 + 𝜃(𝑦 − 𝑥)) − 𝑓 (𝑥)
𝜃
and the infimum is approached as 𝜃 → 0
• if 𝑓 is 𝜇-strongly convex and subdifferentiable at 𝑥, then for all 𝑦 ∈ dom 𝑓 ,
𝑓 0
(𝑥, 𝑦 − 𝑥) ≤ inf
𝜃∈(0,1]
(1 − 𝜃) 𝑓 (𝑥) + 𝜃 𝑓 (𝑦) − (𝜇/2)𝜃(1 − 𝜃)k𝑦 − 𝑥k2 − 𝑓 (𝑥)
𝜃
= 𝑓 (𝑦) − 𝑓 (𝑥) −
𝜇
2
k𝑦 − 𝑥k2
• from page 2.31, the directional derivative is the support function of 𝜕 𝑓 (𝑥):
𝑔𝑇
(𝑦 − 𝑥) ≤ sup
˜
𝑔∈𝜕 𝑓 (𝑥)
˜
𝑔𝑇
(𝑦 − 𝑥)
= 𝑓 0
(𝑥; 𝑦 − 𝑥)
≤ 𝑓 (𝑦) − 𝑓 (𝑥) −
𝜇
2
k𝑦 − 𝑥k2
Conjugate functions 5.18
20. Conjugate of strongly convex function
assume 𝑓 is closed and strongly convex with parameter 𝜇 0 for the norm k · k
• 𝑓 ∗ is defined for all 𝑦 (i.e., dom 𝑓 ∗ = R𝑛)
• 𝑓 ∗ is differentiable everywhere, with gradient
∇ 𝑓 ∗
(𝑦) = argmax
𝑥
(𝑦𝑇
𝑥 − 𝑓 (𝑥))
• ∇ 𝑓 ∗ is Lipschitz continuous with constant 1/𝜇 for the dual norm k · k∗:
k∇ 𝑓 ∗
(𝑦) − ∇ 𝑓 ∗
(𝑦0
)k ≤
1
𝜇
k𝑦 − 𝑦0
k∗ for all 𝑦 and 𝑦0
Conjugate functions 5.19
21. Proof: if 𝑓 is strongly convex and closed
• 𝑦𝑇𝑥 − 𝑓 (𝑥) has a unique maximizer 𝑥 for every 𝑦
• 𝑥 maximizes 𝑦𝑇𝑥 − 𝑓 (𝑥) if and only if 𝑦 ∈ 𝜕 𝑓 (𝑥); from page 5.15
𝑦 ∈ 𝜕 𝑓 (𝑥) ⇐⇒ 𝑥 ∈ 𝜕 𝑓 ∗
(𝑦) = {∇ 𝑓 ∗
(𝑦)}
hence ∇ 𝑓 ∗(𝑦) = argmax𝑥 (𝑦𝑇𝑥 − 𝑓 (𝑥))
• from first-order condition on page 5.17: if 𝑦 ∈ 𝜕 𝑓 (𝑥), 𝑦0 ∈ 𝜕 𝑓 (𝑥0):
𝑓 (𝑥0
) ≥ 𝑓 (𝑥) + 𝑦𝑇
(𝑥0
− 𝑥) +
𝜇
2
k𝑥0
− 𝑥k2
𝑓 (𝑥) ≥ 𝑓 (𝑥0
) + (𝑦0
)𝑇
(𝑥 − 𝑥0
) +
𝜇
2
k𝑥0
− 𝑥k2
combining these inequalities shows
𝜇k𝑥 − 𝑥0
k2
≤ (𝑦 − 𝑦0
)𝑇
(𝑥 − 𝑥0
) ≤ k𝑦 − 𝑦0
k∗k𝑥 − 𝑥0
k
• now substitute 𝑥 = ∇ 𝑓 ∗(𝑦) and 𝑥0 = ∇ 𝑓 ∗(𝑦0)
Conjugate functions 5.20
23. Duality
primal: minimize 𝑓 (𝑥) + 𝑔(𝐴𝑥)
dual: maximize −𝑔∗
(𝑧) − 𝑓 ∗
(−𝐴𝑇
𝑧)
• follows from Lagrange duality applied to reformulated primal
minimize 𝑓 (𝑥) + 𝑔(𝑦)
subject to 𝐴𝑥 = 𝑦
dual function for the formulated problem is:
inf
𝑥,𝑦
( 𝑓 (𝑥) + 𝑧𝑇
𝐴𝑥 + 𝑔(𝑦) − 𝑧𝑇
𝑦) = − 𝑓 ∗
(−𝐴𝑇
𝑧) − 𝑔∗
(𝑧)
• Slater’s condition (for convex 𝑓 , 𝑔): strong duality holds if there exists an ˆ
𝑥 with
ˆ
𝑥 ∈ int dom 𝑓 , 𝐴ˆ
𝑥 ∈ int dom 𝑔
this also guarantees that the dual optimum is attained if optimal value is finite
Conjugate functions 5.21
24. Set constraint
minimize 𝑓 (𝑥)
subject to 𝐴𝑥 − 𝑏 ∈ 𝐶
Primal and dual problem
primal: minimize 𝑓 (𝑥) + 𝛿𝐶 (𝐴𝑥 − 𝑏)
dual: maximize −𝑏𝑇
𝑧 − 𝛿∗
𝐶 (𝑧) − 𝑓 ∗
(−𝐴𝑇
𝑧)
Examples
constraint set 𝐶 support function 𝛿∗
𝐶 (𝑧)
equality 𝐴𝑥 = 𝑏 {0} 0
norm inequality k𝐴𝑥 − 𝑏k ≤ 1 unit k · k-ball k𝑧k∗
conic inequality 𝐴𝑥 𝐾 𝑏 −𝐾 𝛿𝐾∗ (𝑧)
Conjugate functions 5.22
25. Norm regularization
minimize 𝑓 (𝑥) + k𝐴𝑥 − 𝑏k
• take 𝑔(𝑦) = k𝑦 − 𝑏k in general problem
minimize 𝑓 (𝑥) + 𝑔(𝐴𝑥)
• conjugate of k · k is indicator of unit ball for dual norm
𝑔∗
(𝑧) = 𝑏𝑇
𝑧 + 𝛿𝐵(𝑧) where 𝐵 = {𝑧 | k𝑧k∗ ≤ 1}
• hence, dual problem can be written as
maximize −𝑏𝑇 𝑧 − 𝑓 ∗(−𝐴𝑇 𝑧)
subject to k𝑧k∗ ≤ 1
Conjugate functions 5.23
26. Optimality conditions
minimize 𝑓 (𝑥) + 𝑔(𝑦)
subject to 𝐴𝑥 = 𝑦
assume 𝑓 , 𝑔 are convex and Slater’s condition holds
Optimality conditions: 𝑥 is optimal if and only if there exists a 𝑧 such that
1. primal feasibility: 𝑥 ∈ dom 𝑓 and 𝑦 = 𝐴𝑥 ∈ dom 𝑔
2. 𝑥 and 𝑦 = 𝐴𝑥 are minimizers of the Lagrangian 𝑓 (𝑥) + 𝑧𝑇 𝐴𝑥 + 𝑔(𝑦) − 𝑧𝑇 𝑦:
−𝐴𝑇
𝑧 ∈ 𝜕 𝑓 (𝑥), 𝑧 ∈ 𝜕𝑔(𝐴𝑥)
if 𝑔 is closed, this can be written symmetrically as
−𝐴𝑇
𝑧 ∈ 𝜕 𝑓 (𝑥), 𝐴𝑥 ∈ 𝜕𝑔∗
(𝑧)
Conjugate functions 5.24
27. References
• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algoritms
(1993), chapter X.
• D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization
(2003), chapter 7.
• R. T. Rockafellar, Convex Analysis (1970).
Conjugate functions 5.25