Optimization
COS 323
Ingredients
• Objective function
• Variables
• Constraints
Find values of the variables that minimize or maximize the objective function while satisfying the constraints
Different Kinds of Optimization
Figure from: Optimization Technology Center
http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/
Different Optimization Techniques
• Algorithms have very different flavor depending on the specific problem
– Closed form vs. numerical vs. discrete
– Local vs. global minima
– Running times ranging from O(1) to NP-hard
• Today:
– Focus on continuous numerical methods
Optimization in 1-D
• Look for analogies to bracketing in root-finding
• What does it mean to bracket a minimum? Three points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) bracket a minimum when
x_left < x_mid < x_right
f(x_mid) < f(x_left)
f(x_mid) < f(x_right)
Optimization in 1-D
• Once we have these properties, there is at least one local minimum between x_left and x_right
• Establishing the bracket initially (see the sketch below):
– Given x_initial, increment
– Evaluate f(x_initial), f(x_initial + increment)
– If decreasing, step until you find an increase
– Else, step in the opposite direction until you find an increase
– Grow the increment at each step
• For maximization: substitute –f for f
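A minimal sketch of this bracketing procedure; the function name, growth factor, and other defaults are my own illustrative choices, not from the slides:

    def bracket_minimum(f, x_initial, increment=0.1, grow=2.0, max_steps=100):
        """Return (x_left, x_mid, x_right) with f(x_mid) below f at both ends."""
        a, b = x_initial, x_initial + increment
        fa, fb = f(a), f(b)
        if fb > fa:                      # stepping uphill: search the other way
            a, b, fa, fb = b, a, fb, fa
            increment = -increment
        for _ in range(max_steps):
            increment *= grow            # grow the increment at each step
            c = b + increment
            fc = f(c)
            if fc > fb:                  # found an increase: a, b, c bracket a minimum
                return (a, b, c) if a < c else (c, b, a)
            a, b, fa, fb = b, c, fb, fc
        raise RuntimeError("no bracket found")

    # Example: bracket the minimum of (x - 3)^2 starting from x = 0
    print(bracket_minimum(lambda x: (x - 3)**2, 0.0))   # e.g. (1.5, 3.1, 6.3)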
Optimization in 1-D
• Strategy: evaluate the function at some x_new
Optimization in 1-D
• Strategy: evaluate the function at some x_new
– Here, the new “bracket” points are x_new, x_mid, x_right
Optimization in 1-D
• Strategy: evaluate the function at some x_new
– Here, the new “bracket” points are x_left, x_new, x_mid
Optimization in 1-D
• Unlike with root-finding, we can’t always guarantee that the interval will be reduced by a factor of 2
• Let’s find the optimal place for x_mid, relative to left and right, that guarantees the same factor of reduction regardless of the outcome
Optimization in 1-D
if f(x_new) < f(x_mid): new interval = α
else: new interval = 1 – α²
[Figure: the current bracket with sub-interval lengths α and α² marked]
Golden Section Search
• To assure the same interval either way, want α = 1 – α²
• So, α = (√5 – 1) / 2 ≈ 0.618
• This is the “golden ratio” (more precisely, its reciprocal 1/φ) = 0.618…
• So, the interval shrinks to 0.618 of its width per iteration (roughly a 38% reduction)
– Linear convergence
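A minimal golden-section search sketch, assuming a valid bracket (a, b, c) such as the one produced by the bracketing routine above; the names and stopping tolerance are my own:

    import math

    def golden_section_search(f, a, b, c, tol=1e-8):
        """Narrow a bracket a < b < c with f(b) < f(a), f(c) down to width tol."""
        inv_phi = (math.sqrt(5.0) - 1.0) / 2.0          # ~0.618
        fb = f(b)
        while (c - a) > tol:
            # Probe inside the larger of the two sub-intervals
            if (b - a) > (c - b):
                x_new = b - (1.0 - inv_phi) * (b - a)
            else:
                x_new = b + (1.0 - inv_phi) * (c - b)
            f_new = f(x_new)
            if f_new < fb:
                # x_new becomes the new middle point
                if x_new < b:
                    c = b
                else:
                    a = b
                b, fb = x_new, f_new
            else:
                # b stays the middle point; x_new becomes an endpoint
                if x_new < b:
                    a = x_new
                else:
                    c = x_new
        return b

    print(golden_section_search(lambda x: (x - 3)**2, 1.5, 3.1, 6.3))   # ~3.0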
Error Tolerance
• Around the minimum the derivative is 0, so
f(x + Δx) = f(x) + ½ f″(x) Δx² + …
f(x + Δx) – f(x) ≈ ½ f″(x) Δx² = ε_machine  ⇒  Δx ~ √ε_machine
• Rule of thumb: pointless to ask for more accuracy than sqrt(ε)
– For example, in double precision ε ≈ 2.2 × 10⁻¹⁶, so the minimizer can be located only to about 10⁻⁸ in relative terms
– Can use double precision if you want a single-precision result (and/or have single-precision data)
Faster 1-D Optimization
• Trade off super-linear convergence for worse robustness
– Combine with Golden Section search for safety
• Usual bag of tricks:
– Fit a parabola through 3 points, find its minimum (see the sketch below)
– Compute derivatives as well as positions, fit a cubic
– Use second derivatives: Newton
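A sketch of one parabola step: fit a parabola through the three bracket points and jump to its vertex. The formula is the standard one; the function name is mine, and in practice such steps are combined with golden-section fallbacks (as in Brent's method) rather than used alone:

    def parabolic_step(xa, xb, xc, fa, fb, fc):
        """Vertex of the parabola through (xa, fa), (xb, fb), (xc, fc)."""
        num = (xb - xa)**2 * (fb - fc) - (xb - xc)**2 * (fb - fa)
        den = (xb - xa) * (fb - fc) - (xb - xc) * (fb - fa)
        return xb - 0.5 * num / den         # caller must guard against den ~ 0

    # For an exactly quadratic function the minimum is recovered in one step
    f = lambda x: (x - 3)**2 + 1
    print(parabolic_step(1.0, 2.0, 5.0, f(1.0), f(2.0), f(5.0)))   # 3.0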
Newton’s Method
• At each step:
x_{k+1} = x_k – f′(x_k) / f″(x_k)
• Requires 1st and 2nd derivatives
• Quadratic convergence
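A minimal 1-D Newton iteration, assuming f′ and f″ can be supplied by the caller; there are no safeguards here, whereas practical implementations fall back to bracketing methods when the step misbehaves:

    def newton_1d(df, d2f, x0, tol=1e-10, max_iter=50):
        """Find a stationary point by iterating x <- x - f'(x) / f''(x)."""
        x = x0
        for _ in range(max_iter):
            step = df(x) / d2f(x)
            x -= step
            if abs(step) < tol:
                break
        return x

    # Example: minimize f(x) = x^4 - 3x, so f'(x) = 4x^3 - 3 and f''(x) = 12x^2
    print(newton_1d(lambda x: 4*x**3 - 3, lambda x: 12*x**2, x0=1.0))   # ~0.9086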
Multi-Dimensional Optimization
• Important in many areas
– Fitting a model to measured data
– Finding the best design in some parameter space
• Hard in general
– Weird shapes: multiple extrema, saddles, curved or elongated valleys, etc.
– Can’t bracket
• In general, easier than root-finding
– Can always walk “downhill”
Newton’s Method in Multiple Dimensions
• Replace the 1st derivative with the gradient, the 2nd derivative with the Hessian
• For f(x, y):
∇f = [ ∂f/∂x ]
     [ ∂f/∂y ]
H = [ ∂²f/∂x²    ∂²f/∂x∂y ]
    [ ∂²f/∂y∂x   ∂²f/∂y²  ]
Newton’s Method in Multiple Dimensions
• Replace the 1st derivative with the gradient, the 2nd derivative with the Hessian
• So,
x_{k+1} = x_k – H(x_k)⁻¹ ∇f(x_k)
• Tends to be extremely fragile unless the function is very smooth and the starting point is close to the minimum
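A sketch of the multi-dimensional Newton update using numpy, with the gradient and Hessian supplied by the caller (the function names and toy quadratic are mine):

    import numpy as np

    def newton_nd(grad, hess, x0, tol=1e-10, max_iter=50):
        """Iterate x <- x - H(x)^-1 grad(x); fragile far from the minimum."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            step = np.linalg.solve(hess(x), grad(x))    # solve H * step = grad
            x = x - step
            if np.linalg.norm(step) < tol:
                break
        return x

    # Example: f(x, y) = (x - 1)^2 + 10 (y + 2)^2 -- one step reaches the minimum
    grad = lambda v: np.array([2.0*(v[0] - 1.0), 20.0*(v[1] + 2.0)])
    hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
    print(newton_nd(grad, hess, [5.0, 5.0]))            # -> [ 1. -2.]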
Important classification of methods
• Use function + gradient + Hessian (Newton)
• Use function + gradient (most descent methods)
• Use function values only (Nelder-Mead, also called the “simplex” or “amoeba” method)
Steepest Descent Methods
• What if you can’t / don’t want to use the 2nd derivative?
• “Quasi-Newton” methods estimate the Hessian
• Alternative: walk along the (negative of the) gradient…
– Perform a 1-D minimization along the line passing through the current point in the direction of the gradient
– Once done, re-compute the gradient, iterate
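A sketch of steepest descent in which the 1-D minimization along the line is delegated to scipy's scalar minimizer (assuming scipy is available; the names and test function are mine):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def steepest_descent(f, grad, x0, tol=1e-8, max_iter=1000):
        """Repeatedly minimize f along the negative-gradient direction."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            # 1-D minimization of f along the line x - t * g
            t = minimize_scalar(lambda t: f(x - t * g)).x
            x = x - t * g
        return x

    # Example: elongated quadratic f(x, y) = x^2 + 10 y^2
    f = lambda v: v[0]**2 + 10.0*v[1]**2
    grad = lambda v: np.array([2.0*v[0], 20.0*v[1]])
    print(steepest_descent(f, grad, [3.0, 1.0]))        # close to [0, 0]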
Problem With Steepest Descent
• In a long, narrow valley the path zigzags: each exact line minimization makes the next gradient direction perpendicular to the last, so each step partly undoes the previous one and convergence is slow
Conjugate Gradient Methods
• Idea: avoid “undoing” minimization that’s already been done
• Walk along the direction
d_{k+1} = –g_{k+1} + β_k d_k
• Polak and Ribière formula:
β_k = (g_{k+1} – g_k)ᵀ g_{k+1} / (g_kᵀ g_k)
Conjugate Gradient Methods
• Conjugate gradient implicitly obtains information about the Hessian
• For a quadratic function in n dimensions, gets the exact solution in n steps (ignoring roundoff error)
• Works well in practice…
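A sketch of nonlinear conjugate gradient with the Polak-Ribière update, again leaning on scipy's 1-D minimizer for the line search; practical codes add periodic restarts, which are omitted here:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def conjugate_gradient(f, grad, x0, tol=1e-8, max_iter=200):
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g                                              # start with steepest descent
        for _ in range(max_iter):
            if np.linalg.norm(g) < tol:
                break
            t = minimize_scalar(lambda t: f(x + t * d)).x   # 1-D line minimization
            x = x + t * d
            g_new = grad(x)
            beta = g_new.dot(g_new - g) / g.dot(g)          # Polak-Ribiere formula
            d = -g_new + beta * d
            g = g_new
        return x

    # The same elongated quadratic as above converges in about two line searches
    f = lambda v: v[0]**2 + 10.0*v[1]**2
    grad = lambda v: np.array([2.0*v[0], 20.0*v[1]])
    print(conjugate_gradient(f, grad, [3.0, 1.0]))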
Value-Only Methods in Multi-Dimensions
• If you can’t evaluate gradients, life is hard
• Can use approximate (numerically evaluated) gradients:
∇f(x) = [ ∂f/∂e₁ ]     [ ( f(x + δ·e₁) – f(x) ) / δ ]
        [ ∂f/∂e₂ ]  ≈  [ ( f(x + δ·e₂) – f(x) ) / δ ]
        [ ∂f/∂e₃ ]     [ ( f(x + δ·e₃) – f(x) ) / δ ]
        [   ⋮    ]     [             ⋮             ]
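A sketch of this forward-difference gradient estimate; δ is a parameter the caller must choose (on the order of √ε times the scale of x is a common rule of thumb, echoing the error-tolerance slide):

    import numpy as np

    def approx_gradient(f, x, delta=1e-6):
        """Forward-difference estimate of grad f at x (n + 1 function evaluations)."""
        x = np.asarray(x, dtype=float)
        f0 = f(x)
        g = np.empty_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = delta                 # step along the i-th coordinate direction
            g[i] = (f(x + e) - f0) / delta
        return g

    f = lambda v: v[0]**2 + 3.0*v[1]
    print(approx_gradient(f, [1.0, 2.0]))    # approximately [2, 3]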
Generic Optimization Strategies
• Uniform sampling:
– Cost rises exponentially with the # of dimensions
• Simulated annealing:
– Search in random directions
– Start with large steps, gradually decrease
– “Annealing schedule” – how fast to cool?
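A bare-bones simulated-annealing sketch in the spirit of these bullets: random steps whose size and acceptance temperature decay on a fixed schedule. All of the constants are arbitrary illustrative choices, not recommendations from the slides:

    import math
    import random

    def simulated_annealing(f, x0, step0=1.0, temp0=1.0, cool=0.995, n_iter=5000):
        x, fx = list(x0), f(x0)
        step, temp = step0, temp0
        for _ in range(n_iter):
            # Candidate point in a random direction, scaled by the current step size
            cand = [xi + random.uniform(-step, step) for xi in x]
            fc = f(cand)
            # Always accept downhill moves; accept uphill moves with Boltzmann probability
            if fc < fx or random.random() < math.exp(-(fc - fx) / temp):
                x, fx = cand, fc
            step *= cool                 # start with large steps, gradually decrease
            temp *= cool                 # the "annealing schedule": how fast to cool
        return x

    print(simulated_annealing(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2, [5.0, 5.0]))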
Downhill Simplex Method (Nelder-Mead)
• Keep track of n+1 points in n dimensions
– Vertices of a simplex (triangle in 2D, tetrahedron in 3D, etc.)
• At each iteration: the simplex can move, expand, or contract
– Sometimes known as the amoeba method: the simplex “oozes” along the function
Downhill Simplex Method (Nelder-Mead)
• Basic operation: reflection
[Figure: the worst point (highest function value) is reflected through the opposite face to the location probed by the reflection step]
Downhill Simplex Method (Nelder-Mead)
• If the reflection resulted in the best (lowest) value so far, try an expansion
• Else, if the reflection helped at all, keep it
[Figure: location probed by the expansion step]
Downhill Simplex Method (Nelder-Mead)
• If the reflection didn’t help (reflected point still worst), try a contraction
[Figure: location probed by the contraction step]
Downhill Simplex Method (Nelder-Mead)
• If all else fails, shrink the simplex around the best point
Downhill Simplex Method (Nelder-Mead)
• Method fairly efficient at each iteration (typically 1–2 function evaluations)
• Can take lots of iterations
• Somewhat flaky – sometimes needs a restart after the simplex collapses on itself, etc.
• Benefits: simple to implement, doesn’t need derivatives, doesn’t care about function smoothness, etc.
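A compact Nelder-Mead sketch combining the four moves above with the standard coefficients (reflection 1, expansion 2, contraction ½, shrink ½). It is simplified to a single contraction type, so treat it as illustrative rather than a faithful production implementation:

    import numpy as np

    def nelder_mead(f, x0, scale=1.0, tol=1e-8, max_iter=2000):
        n = len(x0)
        # Initial simplex: x0 plus one point offset along each coordinate axis
        simplex = [np.array(x0, dtype=float)]
        for i in range(n):
            v = np.array(x0, dtype=float)
            v[i] += scale
            simplex.append(v)
        fvals = [f(v) for v in simplex]

        for _ in range(max_iter):
            order = np.argsort(fvals)                    # best (lowest f) first
            simplex = [simplex[i] for i in order]
            fvals = [fvals[i] for i in order]
            if abs(fvals[-1] - fvals[0]) < tol:
                break
            centroid = np.mean(simplex[:-1], axis=0)     # centroid of all but the worst
            worst, f_worst = simplex[-1], fvals[-1]

            xr = centroid + (centroid - worst)           # reflect the worst point
            fr = f(xr)
            if fr < fvals[0]:
                xe = centroid + 2.0 * (centroid - worst) # best so far: try an expansion
                fe = f(xe)
                simplex[-1], fvals[-1] = (xe, fe) if fe < fr else (xr, fr)
            elif fr < fvals[-2]:
                simplex[-1], fvals[-1] = xr, fr          # reflection helped: keep it
            else:
                xc = centroid + 0.5 * (worst - centroid) # contract toward the centroid
                fc = f(xc)
                if fc < f_worst:
                    simplex[-1], fvals[-1] = xc, fc
                else:
                    best = simplex[0]                    # all else failed: shrink around best
                    simplex = [best] + [best + 0.5 * (v - best) for v in simplex[1:]]
                    fvals = [fvals[0]] + [f(v) for v in simplex[1:]]
        return simplex[0]

    print(nelder_mead(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2, [0.0, 0.0]))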
Rosenbrock’s Function
• Designed specifically for testing optimization techniques
• Curved, narrow valley
f(x, y) = 100 (y – x²)² + (1 – x)²
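Rosenbrock’s function makes a convenient benchmark; for instance, a sketch comparing scipy’s built-in minimizers on it (assuming scipy is available; the starting point is a common choice of mine, not from the slides):

    import numpy as np
    from scipy.optimize import minimize

    def rosenbrock(v):
        x, y = v
        return 100.0 * (y - x**2)**2 + (1.0 - x)**2

    x0 = np.array([-1.2, 1.0])
    for method in ("Nelder-Mead", "CG", "BFGS"):
        res = minimize(rosenbrock, x0, method=method)
        print(method, res.x, res.nfev)      # the minimum is at (1, 1)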
Constrained Optimization
• Equality constraints: optimize f(x) subject to gᵢ(x) = 0
• Method of Lagrange multipliers: convert to a higher-dimensional problem
• Minimize f(x) + Σᵢ λᵢ gᵢ(x) with respect to (x₁ … x_n; λ₁ … λ_k)
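A tiny worked example (mine, not from the slides): minimize f(x, y) = x² + y² subject to g(x, y) = x + y – 1 = 0. Setting the gradient of L(x, y, λ) = x² + y² + λ(x + y – 1) to zero gives 2x + λ = 0, 2y + λ = 0, and x + y = 1, so x = y = ½ with λ = –1: the point on the line x + y = 1 closest to the origin.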
Constrained Optimization
• Inequality constraints are harder…
• If the objective function and constraints are all linear, this is “linear programming”
• Observation: the minimum must lie at a corner of the region formed by the constraints
• Simplex method: move from vertex to vertex, minimizing the objective function
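A small linear-programming sketch using scipy’s linprog (the toy problem and its numbers are mine; linprog minimizes cᵀx subject to A_ub·x ≤ b_ub and simple bounds, so a maximization is written by negating c):

    from scipy.optimize import linprog

    # Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0
    res = linprog(c=[-3, -2],
                  A_ub=[[1, 1], [1, 3]],
                  b_ub=[4, 6],
                  bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # optimum at a corner of the feasible region: (4, 0), value 12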
Constrained Optimization
• General “nonlinear programming” is hard
• Algorithms exist for special cases (e.g. quadratic)
Global Optimization
• In general, can’t guarantee that you’ve found the global (rather than a local) minimum
• Some heuristics:
– Multi-start: try local optimization from several starting positions (see the sketch below)
– Very slow simulated annealing
– Use analytical methods (or graphing) to determine behavior, guide methods to correct neighborhoods
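A sketch of the multi-start heuristic: run a local optimizer from several random starting points and keep the best result. Here it uses scipy’s minimize on Rosenbrock’s function; the number of starts and the sampling box are arbitrary choices of mine:

    import numpy as np
    from scipy.optimize import minimize

    def rosenbrock(v):
        return 100.0 * (v[1] - v[0]**2)**2 + (1.0 - v[0])**2

    rng = np.random.default_rng(0)
    best = None
    for _ in range(20):                              # 20 random starts in [-2, 2]^2
        x0 = rng.uniform(-2.0, 2.0, size=2)
        res = minimize(rosenbrock, x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    print(best.x, best.fun)                          # expect roughly (1, 1) and ~0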