Nonsmooth Optimization

AACIMP 2009 Summer School lecture by Başak Akteke-Öztürk, "Modern Operational Research and Its Mathematical Methods" course.


Nonsmooth Optimization

  1. Nonsmooth Optimization
  2. Preliminaries
     • $\mathbb{R}^n$ denotes the $n$-dimensional real Euclidean space, with $x, y \in \mathbb{R}^n$
     • Usual inner product: $(x, y) = x^T y = \sum_{i=1}^{n} x_i y_i$
     • Euclidean norm: $\|x\| = \sqrt{(x, x)} = (x^T x)^{1/2}$
     • $f : O \to \mathbb{R}$ is smooth (continuously differentiable) if the gradient $\nabla f : O \to \mathbb{R}^n$ is defined and continuous on an open set $O \subseteq \mathbb{R}^n$: $\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \dots, \frac{\partial f(x)}{\partial x_n} \right)^T$
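A quick numerical sanity check of this notation, as a sketch assuming NumPy (the vectors are arbitrary illustrative values, not from the slides):

```python
# Inner product, norm and their NumPy equivalents for two example vectors.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y                      # (x, y) = x^T y = sum_i x_i y_i
norm = np.sqrt(x @ x)              # ||x|| = (x^T x)^(1/2)

print(inner)                       # 32.0
print(norm, np.linalg.norm(x))     # both ~3.7417
```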
  3. Smooth Functions - Directional Derivative
     • Directional derivatives $f'(x; u)$, $f'(x; -u)$ of $f$ at $x \in O$ in the direction $u \in \mathbb{R}^n$: $f'(x; u) := \lim_{\alpha \to +0} \frac{f(x + \alpha u) - f(x)}{\alpha} = (\nabla f(x), u)$
     • $f'(x; e_1), f'(x; e_2), \dots, f'(x; e_n)$, where $e_i$ $(i = 1, 2, \dots, n)$ are the unit vectors
     • $(\nabla f(x), e_1) = f_{x_1}$, $(\nabla f(x), e_2) = f_{x_2}$, ..., $(\nabla f(x), e_n) = f_{x_n}$
     • Note that $f'(x; u) = -f'(x; -u)$
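The limit definition can be checked numerically against $(\nabla f(x), u)$; a minimal sketch, assuming NumPy and an illustrative smooth function $f(x) = x_1^2 + 3x_2$ that is not from the slides:

```python
# Approximate f'(x; u) by the one-sided difference quotient and compare with (grad f(x), u).
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[1]

def grad_f(x):
    return np.array([2.0 * x[0], 3.0])

def directional_derivative(f, x, u, alpha=1e-6):
    # (f(x + alpha*u) - f(x)) / alpha for small alpha > 0
    return (f(x + alpha * u) - f(x)) / alpha

x = np.array([1.0, -2.0])
u = np.array([0.6, 0.8])                 # a unit direction

print(directional_derivative(f, x, u))   # ~ 3.6
print(grad_f(x) @ u)                     # exactly 3.6 = (2*1)*0.6 + 3*0.8
```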
  4. Smooth Functions - 1st order approximation
     • A first-order approximation of $f$ near $x \in O$ by means of the Taylor series with remainder term: $f(x + \delta) = f(x) + (\nabla f(x), \delta) + o_x(\delta)$ $(x + \delta \in O)$
     • $\lim_{\alpha \to 0} \frac{o_x(\alpha \delta)}{\alpha} = 0$, where $\delta \in \mathbb{R}^n$ is small enough
     • A smooth function can be locally replaced by a "simple" linear approximation of it
  5. Smooth Functions - Optimality Conditions
     First-order necessary conditions for an extremum:
     • For $x^* \in O$ to be a local minimizer of $f$ on $\mathbb{R}^n$, it is necessary that $\nabla f(x^*) = 0_n$
     • For $x^* \in O$ to be a local maximizer of $f$ on $\mathbb{R}^n$, it is necessary that $\nabla f(x^*) = 0_n$
  6. Smooth Functions - Descent/Ascent Directions
     Directions of steepest descent and ascent, if $x$ is not a stationary point:
     • the unit steepest descent direction $u_d$ of the function $f$ at a point $x$: $u_d(x) = -\frac{\nabla f(x)}{\|\nabla f(x)\|}$
     • the unit steepest ascent direction $u_a$ of the function $f$ at a point $x$: $u_a(x) = \frac{\nabla f(x)}{\|\nabla f(x)\|}$
     • There is only one steepest descent direction and only one steepest ascent direction, and $u_d(x) = -u_a(x)$
  7. Smooth Functions - Chain Rule
     • Chain rule: let $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^n$.
     • If $f \in C^1(O)$, $g \in C^1(O)$ and $f(x) = g(h(x))$, then $\nabla f(x)^T = \nabla g(h(x))^T \, \nabla h(x)$
     • $\nabla h(x) = \left[ \frac{\partial h_i(x)}{\partial x_j} \right]_{i,j = 1, 2, \dots, n}$ is the $n \times n$ Jacobian matrix of $h$
  8. Nonsmooth Optimization
     • Deals with nondifferentiable functions
     • The problem is to find a proper replacement for the concept of the gradient
     • Different research groups work on different nonsmooth function classes; hence there are different theories to handle the different nonsmooth problems
     • Tools replacing the gradient
  9. Keywords of Nonsmooth Optimization
     • Convex Functions, Lipschitz Continuous Functions
     • Generalized Directional Derivatives, Generalized Derivatives
     • Subgradient Method, Bundle Method, Discrete Gradient Algorithm
     • Asplund Spaces
  10. Convex Functions
     • $O \subseteq \mathbb{R}^n$ is a nonempty convex set if $\alpha x + (1 - \alpha) y \in O$ for all $x, y \in O$, $\alpha \in [0, 1]$
     • $f : O \to \overline{\mathbb{R}}$, $\overline{\mathbb{R}} := [-\infty, \infty]$, is convex if $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for any $x, y \in O$, $\lambda \in [0, 1]$
  11. Convex Functions
     • Every local minimum is a global minimum
     • $\xi$ is a subgradient of $f$ at a (possibly nondifferentiable) point $x \in \operatorname{dom} f$ if it satisfies the subgradient inequality $f(y) \ge f(x) + (\xi, y - x)$
     • The set of subgradients of $f$ at $x$ is called the subdifferential $\partial f(x)$: $\partial f(x) := \{\xi \in \mathbb{R}^n \mid f(y) \ge f(x) + (\xi, y - x) \ \forall y \in \mathbb{R}^n\}$
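A small sketch of the subgradient inequality in action, assuming NumPy and the illustrative convex example $f(x) = \|x\|_1$ (not from the slides), with a subgradient built from the sign pattern:

```python
# Numerically verify f(y) >= f(x) + (xi, y - x) for f(x) = ||x||_1 and a valid subgradient xi.
import numpy as np

def f(x):
    return np.sum(np.abs(x))

def a_subgradient(x):
    # any xi with xi_i = sign(x_i) where x_i != 0 and xi_i in [-1, 1] where x_i == 0
    return np.sign(x)              # picks 0 on zero components, which is allowed

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, -2.0])
xi = a_subgradient(x)

for _ in range(1000):
    y = rng.normal(size=3) * 5.0
    assert f(y) >= f(x) + xi @ (y - x) - 1e-12
print("subgradient inequality held on all sampled points")
```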
  12. Convex Functions
     • The subgradients at a point can be characterized by the directional derivative: $f'(x; u) = \sup_{\xi \in \partial f(x)} (\xi, u)$
     • For $x$ in the interior of $\operatorname{dom} f$, the subdifferential $\partial f(x)$ is compact and the directional derivative is finite
     • Subdifferential in relation to the directional derivative: $\partial f(x) = \{\xi \in \mathbb{R}^n \mid f'(x; u) \ge (\xi, u) \ \forall u \in \mathbb{R}^n\}$
  13. Lipschitz Continuous Functions
     • $f : O \to \mathbb{R}$ is Lipschitz continuous with constant $K$ if for all $y, z$ in the open set $O$: $|f(y) - f(z)| \le K \|y - z\|$
     • Lipschitz continuous functions are differentiable almost everywhere
     • Clarke subdifferential $\partial_C f(x)$ of a Lipschitz continuous $f$ at $x$: $\partial_C f(x) = \operatorname{co}\{\xi \in \mathbb{R}^n \mid \xi = \lim_{k \to \infty} \nabla f(x_k), \ x_k \to x, \ x_k \in D\}$, where $D$ is the set where the function is differentiable
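A classical one-dimensional illustration of this construction (an assumed example, not taken from the slides) is $f(x) = |x|$ at $x = 0$:

```latex
% f(x) = |x| is Lipschitz with K = 1 and differentiable everywhere except at 0,
% so D = R \ {0}.  Along sequences x_k -> 0:
%   x_k > 0 gives f'(x_k) = +1,   x_k < 0 gives f'(x_k) = -1.
\[
  \partial_C f(0)
  = \operatorname{co}\Bigl\{ \lim_{k \to \infty} \nabla f(x_k) \;\Big|\; x_k \to 0,\; x_k \in D \Bigr\}
  = \operatorname{co}\{-1, +1\}
  = [-1, 1].
\]
```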
  14. Lipschitz Continuous Functions
     • Mean value theorem for Clarke subdifferentials: $f(b) - f(a) = (\xi, b - a)$ for some $\xi \in \partial_C f(c)$ with $c$ on the segment between $a$ and $b$
     • Nonsmooth chain rule with respect to the Clarke subdifferential: $\partial_C (g \circ F)(x) \subseteq \operatorname{co}\left\{ \sum_{i=1}^{m} \xi_i \mu_i \ \middle|\ \xi = (\xi_1, \xi_2, \dots, \xi_m) \in \partial_C g(F(x)), \ \mu_i \in \partial_C f_i(x) \ (i = 1, 2, \dots, m) \right\}$
     • $F(\cdot) = (f_1(\cdot), f_2(\cdot), \dots, f_m(\cdot))$ is a vector-valued function; $g : \mathbb{R}^m \to \mathbb{R}$ and $g \circ F : \mathbb{R}^n \to \mathbb{R}$ are Lipschitz continuous
  15. Regular Functions
     • A locally Lipschitz function is regular at $x$ when its directional derivative exists and coincides with the Clarke directional derivative: $f_C'(x; u) = f'(x; u)$
     • Example: semismooth functions. $f : \mathbb{R}^n \to \mathbb{R}$ is semismooth at $x \in \mathbb{R}^n$ if it is locally Lipschitz and for every $u \in \mathbb{R}^n$ the following limit exists: $\lim_{\substack{\xi \in \partial f(x + \alpha v) \\ v \to u, \ \alpha \to +0}} (\xi, u)$
  16. Max- and Min-type Functions
     • $f(x) = \max\{f_1(x), f_2(x), \dots, f_m(x)\}$, with $f_i : \mathbb{R}^n \to \mathbb{R}$ $(i = 1, 2, \dots, m)$
     • $\partial_C f(x) \subseteq \operatorname{co}\left\{ \bigcup_{i \in J(x)} \partial_C f_i(x) \right\}$, where $J(x) := \{i = 1, 2, \dots, m \mid f(x) = f_i(x)\}$
     • Example: $f(x) = \max\{f_1(x), f_2(x)\}$
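A hedged sketch of how this is used in practice: any gradient of an active component $f_i$, $i \in J(x)$, lies in the convex hull above and can serve as a subgradient; the two smooth component functions below are illustrative assumptions, not from the slides.

```python
# One subgradient of f(x) = max{f1(x), f2(x)}: the gradient of an active component.
import numpy as np

def f1(x):  return x[0]**2 + x[1]**2          # grad: (2 x1, 2 x2)
def f2(x):  return 2.0 * x[0] + 1.0           # grad: (2, 0)

def grad_f1(x): return np.array([2.0 * x[0], 2.0 * x[1]])
def grad_f2(x): return np.array([2.0, 0.0])

def max_subgradient(x, tol=1e-12):
    vals = np.array([f1(x), f2(x)])
    fmax = vals.max()
    active = [i for i, v in enumerate(vals) if v >= fmax - tol]   # J(x)
    grads = [grad_f1, grad_f2]
    return grads[active[0]](x)   # any element of {grad f_i(x) : i in J(x)} is in the hull

x = np.array([1.0, 0.0])         # here f1(x) = 1 and f2(x) = 3, so f2 is active
print(max_subgradient(x))        # -> [2. 0.]
```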
  17. Quasidifferentiable Functions
     • $f : \mathbb{R}^n \to \mathbb{R}$ is quasidifferentiable at $x$ if $f'(x; u)$ exists finitely for every direction $u$ and there exists a pair of sets $[\underline{\partial} f(x), \overline{\partial} f(x)]$ such that
     • $f'(x; u) = \max_{\xi \in \underline{\partial} f(x)} (\xi, u) + \min_{\phi \in \overline{\partial} f(x)} (\phi, u)$
     • $[\underline{\partial} f(x), \overline{\partial} f(x)]$ is the quasidifferential, $\underline{\partial} f(x)$ the subdifferential, $\overline{\partial} f(x)$ the superdifferential
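A standard worked example (assumed here for illustration, not from the slides) showing the two sets at work:

```latex
% Take f(x) = |x_1| - |x_2| at x = 0.  The directional derivative is
\[
  f'(0; u) = |u_1| - |u_2|
           = \max_{\xi \in \underline{\partial} f(0)} (\xi, u)
             + \min_{\phi \in \overline{\partial} f(0)} (\phi, u),
\]
% which holds with the subdifferential and superdifferential
\[
  \underline{\partial} f(0) = [-1, 1] \times \{0\},
  \qquad
  \overline{\partial} f(0) = \{0\} \times [-1, 1].
\]
```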
  18. Directional Derivatives
     For $f : O \to \mathbb{R}$, $O \subset \mathbb{R}^n$, $x \in O$, in the direction $u \in \mathbb{R}^n$:
     • Dini Directional Derivative
     • Hadamard Directional Derivative
     • Clarke Directional Derivative
     • Michel-Penot Directional Derivative
  19. Dini Directional Derivative
     • upper Dini directional derivative: $f_D^{+}(x; u) := \limsup_{\alpha \to +0} \frac{f(x + \alpha u) - f(x)}{\alpha}$
     • lower Dini directional derivative: $f_D^{-}(x; u) := \liminf_{\alpha \to +0} \frac{f(x + \alpha u) - f(x)}{\alpha}$
     • $f$ is Dini subdifferentiable at $x$ in the direction $u$ if $f_D^{+}(x; u) = f_D^{-}(x; u)$
  20. Hadamard Directional Derivative
     • upper Hadamard directional derivative: $f_H^{+}(x; u) := \limsup_{\alpha \to +0, \ v \to u} \frac{f(x + \alpha v) - f(x)}{\alpha}$
     • lower Hadamard directional derivative: $f_H^{-}(x; u) := \liminf_{\alpha \to +0, \ v \to u} \frac{f(x + \alpha v) - f(x)}{\alpha}$
     • $f$ is Hadamard subdifferentiable at $x$ in the direction $u$ if $f_H^{+}(x; u) = f_H^{-}(x; u)$
  21. Clarke Directional Derivative
     • upper Clarke directional derivative: $f_C^{+}(x; u) := \limsup_{y \to x, \ \alpha \to +0} \frac{f(y + \alpha u) - f(y)}{\alpha}$
     • lower Clarke directional derivative: $f_C^{-}(x; u) := \liminf_{y \to x, \ \alpha \to +0} \frac{f(y + \alpha u) - f(y)}{\alpha}$
     • $f$ is Clarke subdifferentiable at $x$ in the direction $u$ if $f_C^{+}(x; u) = f_C^{-}(x; u)$
  22. Michel-Penot Directional Derivative
     • upper Michel-Penot directional derivative: $f_{MP}^{+}(x; u) := \sup_{v \in \mathbb{R}^n} \limsup_{\alpha \to +0} \frac{1}{\alpha} \left[ f(x + \alpha(u + v)) - f(x + \alpha v) \right]$
     • lower Michel-Penot directional derivative: $f_{MP}^{-}(x; u) := \inf_{v \in \mathbb{R}^n} \liminf_{\alpha \to +0} \frac{1}{\alpha} \left[ f(x + \alpha(u + v)) - f(x + \alpha v) \right]$
     • $f$ is Michel-Penot subdifferentiable at $x$ in the direction $u$ if $f_{MP}^{+}(x; u) = f_{MP}^{-}(x; u)$
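To see why these notions are kept apart, here is a small assumed example (not from the slides) where the upper Dini and Clarke derivatives differ:

```latex
% Assumed example: f(x) = -|x| on R, at x = 0, in the direction u = 1.
\[
  f_D^{+}(0; 1) = \limsup_{\alpha \to +0} \frac{-|\alpha| - 0}{\alpha} = -1,
  \qquad
  f_C^{+}(0; 1) = \limsup_{y \to 0,\ \alpha \to +0} \frac{-|y + \alpha| + |y|}{\alpha} = +1
\]
% (the Clarke limsup is attained along y < 0 with y + alpha = 0).  The derivatives can
% therefore differ at a nonsmooth point, and each leads to its own subdifferential.
```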
  23. Subdifferentials and Optimality Conditions
     • $f'(x; u) = \max_{\xi \in \partial f(x)} (\xi, u)$ for all $u \in \mathbb{R}^n$
     • For a point $x^*$ to be a minimizer, it is necessary that $0_n \in \partial f(x^*)$
     • A point $x^*$ satisfying $0_n \in \partial f(x^*)$ is called a stationary point
  24. Nonsmooth Optimization Methods
     • Subgradient Algorithm (and ε-Subgradient Methods)
     • Bundle Methods
     • Discrete Gradients
  25. Descent Methods
     • $\min f(x)$ subject to $x \in \mathbb{R}^n$
     • The objective is to find $d_k$ with $f(x_k + d_k) < f(x_k)$
     • $\min f(x_k + d) - f(x_k)$ subject to $d \in \mathbb{R}^n$
     • For $f$ twice continuously differentiable, expanding $f(x_k + d)$: $f(x_k + d) - f(x_k) = f'(x_k; d) + \|d\| \, \epsilon(d)$, where $\epsilon(d) \to 0$ as $\|d\| \to 0$
  26. Descent Methods
     • We know $f'(x_k; d) = \nabla f(x_k)^T d$
     • $\min_{d \in \mathbb{R}^n} \nabla f(x_k)^T d$ subject to $\|d\| \le 1$
     • The search direction is obtained as $d_k = -\frac{\nabla f(x_k)}{\|\nabla f(x_k)\|}$
     • To find $x_{k+1}$, a line search is performed along $d_k$ to obtain a step $t$, from which the next point $x_{k+1} = x_k + t d_k$ is computed
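A minimal sketch of this scheme, assuming NumPy, an illustrative quadratic objective, and a simple backtracking (Armijo) line search for the step $t$; none of these specific choices come from the slides:

```python
# Steepest descent: d_k = -grad f(x_k)/||grad f(x_k)||, step t from backtracking line search.
import numpy as np

def f(x):       return 0.5 * x[0]**2 + 2.0 * x[1]**2
def grad_f(x):  return np.array([x[0], 4.0 * x[1]])

def backtracking(f, x, d, g, t=1.0, beta=0.5, c=1e-4):
    # shrink t until the sufficient-decrease condition f(x + t d) <= f(x) + c t (g, d) holds
    while f(x + t * d) > f(x) + c * t * (g @ d):
        t *= beta
    return t

x = np.array([4.0, -3.0])
for k in range(50):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:        # stationary: grad f(x) = 0
        break
    d = -g / np.linalg.norm(g)          # unit steepest descent direction
    t = backtracking(f, x, d, g)
    x = x + t * d                       # x_{k+1} = x_k + t d_k
print(x, f(x))                          # approaches the minimizer (0, 0)
```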
  27. Subgradient Algorithm
     • Developed for minimizing convex functions
     • $\min f(x)$ subject to $x \in \mathbb{R}^n$
     • Given $x_0$, it generates a sequence $\{x_k\}_{k=0}^{\infty}$ according to $x_{k+1} = x_k - \alpha_k v_k$, $v_k \in \partial f(x_k)$
     • A simple generalization of a descent method with line search
     • The opposite direction of a subgradient is not necessarily a descent direction, so a line search cannot be used
  28. Subgradient Algorithm
     • In general it does not converge to a stationary point
     • Special rules are needed for the computation of the step size
     • Theorem (N.Z. Shor): let $S^*$ be the set of minimum points of $f$ and let $\{x_k\}$ be generated with the step $\alpha_k := \frac{\alpha}{\|v_k\|}$. Then for any $\varepsilon > 0$ and any $x^* \in S^*$, one can find an index $k = \bar{k}$ such that the point $\bar{x} := x_{\bar{k}}$ satisfies $\|\bar{x} - x^*\| < \frac{\alpha(1 + \varepsilon)}{2}$
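A minimal sketch of the iteration with the constant-step-length rule $\alpha_k = \alpha / \|v_k\|$, assuming NumPy and the illustrative convex objective $f(x) = \|x\|_1$ (not from the slides):

```python
# Subgradient iteration x_{k+1} = x_k - alpha_k v_k with v_k a subgradient of ||.||_1.
import numpy as np

def f(x):
    return np.sum(np.abs(x))

def a_subgradient(x):
    return np.sign(x)               # a valid subgradient of ||.||_1 (0 on zero components)

alpha = 0.05
x = np.array([2.0, -1.5, 0.7])
best = f(x)
for k in range(500):
    v = a_subgradient(x)
    if np.linalg.norm(v) == 0:      # 0 in the subdifferential: x is a minimizer
        break
    x = x - (alpha / np.linalg.norm(v)) * v
    best = min(best, f(x))
print(best)                         # the best value found stays within O(alpha) of min f = 0
```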
  29. Bundle Method
     • At the current iterate $x_k$, we have trial points $y^j \in \mathbb{R}^n$ $(j \in J_k \subset \{1, 2, \dots, k\})$
     • Idea: underestimate $f$ by using piecewise-linear functions
     • Subdifferential of $f$ at $x$: $\partial f(x) = \{v \in \mathbb{R}^n \mid (v, z - x) \le f(z) - f(x) \ \forall z \in \mathbb{R}^n\}$
     • $\hat{f}_k(x) = \max_{j \in J_k} \{f(y^j) + (v^j, x - y^j)\}$, with $v^j \in \partial f(y^j)$
     • $\hat{f}_k(x) \le f(x)$ for all $x \in \mathbb{R}^n$, and $\hat{f}_k(y^j) = f(y^j)$ for $j \in J_k$
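A small sketch of the cutting-plane model, using plain Python, an illustrative smooth convex $f(x) = x^2$, and arbitrarily chosen trial points $y^j$ (all assumptions, not from the slides):

```python
# Cutting-plane model fhat_k(x) = max_{j in J_k} { f(y^j) + v^j (x - y^j) } for f(x) = x^2,
# whose only subgradient at x is the gradient 2x.
def f(x):
    return x * x

def v(x):
    return 2.0 * x

trial_points = [-2.0, 0.5, 1.5]                   # the y^j, j in J_k (arbitrary choices)
bundle = [(y, f(y), v(y)) for y in trial_points]  # store (y^j, f(y^j), v^j)

def fhat(x):
    return max(fy + vy * (x - y) for (y, fy, vy) in bundle)

for x in (-1.0, 0.0, 0.5, 3.0):
    print(x, fhat(x), f(x))   # fhat(x) <= f(x) everywhere, with equality at the trial points
```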
  30. Bundle Method
     • Serious step: $x_{k+1} := y^{k+1} := x_k + t d_k$, $t > 0$, in case a sufficient decrease is achieved at $x_{k+1}$
     • Null step: $x_{k+1} := x_k$, in case no sufficient decrease is achieved; the gradient information is enriched by adding the new subgradient $v^{k+1} \in \partial f(y^{k+1})$ to the bundle
  31. Bundle Method
     • Standard concepts: serious step and null step
     • The convergence problem is avoided by making sure that they are descent methods
     • The descent direction is found by solving a QP involving the cutting-plane approximation of the function over a bundle of subgradients
     • The information from previous iterations is utilized by storing the subgradient information in a bundle
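One common way to write the direction-finding QP mentioned above (a proximal bundle variant, stated as an assumed illustration; $t_k > 0$ is a proximity parameter not introduced in the slides):

```latex
% Proximal bundle subproblem built on the cutting-plane model fhat_k:
\[
  d_k = \operatorname*{arg\,min}_{d \in \mathbb{R}^n}
        \ \hat{f}_k(x_k + d) + \frac{1}{2 t_k} \|d\|^2,
\]
% which is equivalent to the quadratic program in the variables (d, w):
\[
  \min_{d \in \mathbb{R}^n,\; w \in \mathbb{R}} \ w + \frac{1}{2 t_k} \|d\|^2
  \quad \text{s.t.} \quad
  f(y^j) + (v^j, x_k + d - y^j) \le w, \quad j \in J_k.
\]
```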
  32. Asplund Spaces
     • "Nonsmooth" usually refers to functions, but spaces can also be called nonsmooth
     • Banach spaces: complete normed vector spaces
     • Fréchet derivative, Gâteaux derivative
     • $f$ is Fréchet differentiable on an open set $U \subset V$ if its Gâteaux derivative is linear and bounded at each point of $U$ and the Gâteaux derivative is a continuous map $U \to L(V, W)$
     • Asplund space: a Banach space in which every convex continuous function is generically Fréchet differentiable
  33. References
     Clarke, F.H., 1983. Optimization and Nonsmooth Analysis, Wiley-Interscience, New York.
     Demyanov, V.F., 2002. The Rise of Nonsmooth Analysis: Its Main Tools, Cybernetics and Systems Analysis, 38(4).
     Jongen, H.Th., Pallaschke, D., 1988. On linearization and continuous selections of functions, Optimization, 19(3), 343-353.
     Rockafellar, R.T., 1972. Convex Analysis, Princeton University Press, New Jersey.
     Schittkowski, K., 1992. Solving nonlinear programming problems with very many constraints, Optimization, 25, 179-196.
  34. Weber, G.-W., 1993. Minimization of a max-type function: Characterization of structural stability, in: Parametric Optimization and Related Topics III, J. Guddat, H.Th. Jongen, B. Kummer, and F. Nozicka, eds., Peter Lang publishing house, Frankfurt a.M., Bern, New York, pp. 519-538.
