Optimization Methods
in Engineering Design
Day-5
Course Materials
β€’ Arora, Introduction to Optimum Design, 3e, Elsevier
(https://www.researchgate.net/publication/273120102_Introduction_to_Optimum_design)
β€’ Parkinson, Optimization Methods for Engineering Design, Brigham Young University
(http://apmonitor.com/me575/index.php/Main/BookChapters)
β€’ Iqbal, Fundamental Engineering Optimization Methods, BookBoon
(https://bookboon.com/en/fundamental-engineering-optimization-methods-ebook)
Numerical Optimization
β€’ Consider an unconstrained NLP problem: min𝒙 𝑓(𝒙)
β€’ Use an iterative method to solve the problem: π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜π’…π‘˜, where π’…π‘˜ is a search direction and π›Όπ‘˜ is the step size, chosen such that the function value decreases at each step, i.e., 𝑓(π’™π‘˜+1) < 𝑓(π’™π‘˜)
β€’ We expect that π’™π‘˜ β†’ π’™βˆ— as π‘˜ β†’ ∞
β€’ The general iterative method is a two-step process:
– Finding a suitable search direction π’…π‘˜ along which the function value locally decreases and any constraints are obeyed.
– Performing a line search along π’…π‘˜ to find π’™π‘˜+1 such that 𝑓(π’™π‘˜+1) attains its minimum value.
The Iterative Method
β€’ Iterative algorithm:
1. Initialize: choose 𝒙0
2. Check termination: 𝛻𝑓(π’™π‘˜) β‰… 𝟎
3. Find a suitable search direction π’…π‘˜ that obeys the descent condition: 𝛻𝑓(π’™π‘˜)𝑇 π’…π‘˜ < 0
4. Search along π’…π‘˜ to find where 𝑓(π’™π‘˜+1) attains its minimum value (the line search problem)
5. Return to step 2
The Line Search Problem
β€’ Assuming a suitable search direction π’…π‘˜ has been determined, we seek a step length π›Όπ‘˜ that minimizes 𝑓(π’™π‘˜+1).
β€’ Assuming π’™π‘˜ and π’…π‘˜ are known, the projected function value along π’…π‘˜ is expressed as: 𝑓(π’™π‘˜ + π›Όπ‘˜π’…π‘˜) = 𝑓(π’™π‘˜ + π›Όπ’…π‘˜) = 𝑓(𝛼)
β€’ The line search problem to choose 𝛼 to minimize 𝑓(π’™π‘˜+1) along π’…π‘˜ is defined as: min𝛼 𝑓(𝛼) = 𝑓(π’™π‘˜ + Ξ±π’…π‘˜)
β€’ Assuming that a solution exists, it is found by setting 𝑓′(𝛼) = 0.
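As an aside (not from the slides), the scalar problem min𝛼 𝑓(π’™π‘˜ + π›Όπ’…π‘˜) can be handed to a one-dimensional MATLAB solver; the objective, current point, and bracket below are assumptions made for this sketch.
f = @(x) x(1)^2 + 2*x(2)^2;        % assumed objective function
x = [1; 1];                        % assumed current iterate x_k
d = -[2*x(1); 4*x(2)];             % a descent direction (here, -grad f)
fa = @(a) f(x + a*d);              % projected one-dimensional function f(alpha)
alpha = fminbnd(fa, 0, 2);         % line search over an assumed bracket [0, 2]
x_next = x + alpha*d;              % x_{k+1}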
Example: Quadratic Function
β€’ Consider minimizing a quadratic function:
𝑓(𝒙) = Β½ 𝒙𝑇𝑨𝒙 βˆ’ 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 βˆ’ 𝒃
β€’ Given a descent direction 𝒅, the line search problem is defined as:
min𝛼 𝑓(𝛼) = Β½ (π’™π‘˜ + 𝛼𝒅)𝑇𝑨(π’™π‘˜ + 𝛼𝒅) βˆ’ 𝒃𝑇(π’™π‘˜ + 𝛼𝒅)
β€’ A solution is found by setting 𝑓′(𝛼) = 0, where
𝑓′(𝛼) = 𝒅𝑇𝑨(π’™π‘˜ + 𝛼𝒅) βˆ’ 𝒅𝑇𝒃 = 0
𝛼 = βˆ’π’…π‘‡(π‘¨π’™π‘˜ βˆ’ 𝒃)/(𝒅𝑇𝑨𝒅) = βˆ’π›»π‘“(π’™π‘˜)𝑇𝒅/(𝒅𝑇𝑨𝒅)
β€’ Finally, π’™π‘˜+1 = π’™π‘˜ + 𝛼𝒅.
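A two-line numerical check of the step-size formula above; the matrix 𝑨 (symmetric positive definite) and vector 𝒃 are arbitrary assumptions for this sketch.
A = [2 0; 0 10]; b = [1; 1];       % assumed SPD matrix and vector
xk = [0; 0];                       % assumed current point
g = A*xk - b;                      % gradient at xk
d = -g;                            % a descent direction (steepest descent)
alpha = -(g'*d)/(d'*A*d);          % exact step size along d
xk1 = xk + alpha*d;                % minimizer of f along d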
Computer Methods for Line Search Problem
β€’ Interval reduction methods
– Golden search
– Fibonacci search
β€’ Approximate search methods
– Armijo's rule
– Quadratic curve fitting
Interval Reduction Methods
β€’ The interval reduction methods find the minimum of a unimodal
function in two steps:
– Bracketing the minimum to an interval
– Reducing the interval to desired accuracy
β€’ The bracketing step aims to find a three-point pattern π‘₯1, π‘₯2, π‘₯3 such that 𝑓(π‘₯1) β‰₯ 𝑓(π‘₯2) < 𝑓(π‘₯3).
Fibonacci’s Method
β€’ The Fibonacci method uses Fibonacci numbers to achieve maximum interval reduction in a given number of steps.
β€’ The Fibonacci number sequence is generated as: 𝐹0 = 𝐹1 = 1, 𝐹𝑖 = πΉπ‘–βˆ’1 + πΉπ‘–βˆ’2, 𝑖 β‰₯ 2.
β€’ The properties of Fibonacci numbers include:
– Their ratios approach the golden ratio: 𝜏 = lim (π‘›β†’βˆž) πΉπ‘›βˆ’1/𝐹𝑛 = (√5 βˆ’ 1)/2 β‰… 0.618034
– The number of interval reductions 𝑛 required to achieve a desired accuracy πœ€ (where 1/𝐹𝑛 < πœ€) is specified in advance.
– For given 𝐼1 and 𝑛: 𝐼2 = (πΉπ‘›βˆ’1/𝐹𝑛) 𝐼1, 𝐼3 = 𝐼1 βˆ’ 𝐼2, 𝐼4 = 𝐼2 βˆ’ 𝐼3, etc.
The Golden Section Method
β€’ The golden section method uses the golden ratio: 𝜏 = 0.618034.
β€’ The golden section algorithm is given as:
1. Initialize: specify π‘₯1, π‘₯4 (𝐼1 = π‘₯4 βˆ’ π‘₯1), πœ€, and 𝑛 such that πœπ‘› < πœ€/𝐼1
2. Compute π‘₯2 = 𝜏π‘₯1 + (1 βˆ’ 𝜏)π‘₯4, evaluate 𝑓2
3. For 𝑖 = 1, … , 𝑛 βˆ’ 1: compute π‘₯3 = (1 βˆ’ 𝜏)π‘₯1 + 𝜏π‘₯4 and evaluate 𝑓3; if 𝑓2 < 𝑓3, set π‘₯4 ← π‘₯1, π‘₯1 ← π‘₯3; else set π‘₯1 ← π‘₯2, π‘₯2 ← π‘₯3, 𝑓2 ← 𝑓3
Approximate Search Methods
β€’ Consider the line search problem: min𝛼 𝑓(𝛼) = 𝑓(π’™π‘˜ + Ξ±π’…π‘˜)
β€’ Sufficient Descent Condition. The sufficient descent condition guards against π’…π‘˜ becoming too close to orthogonal to 𝛻𝑓(π’™π‘˜). The condition is stated as:
𝛻𝑓(π’™π‘˜)𝑇 π’…π‘˜ < βˆ’π‘ ‖𝛻𝑓(π’™π‘˜)β€–Β², 𝑐 > 0
β€’ Sufficient Decrease Condition. The sufficient decrease condition ensures a nontrivial reduction in the function value. The condition is stated as:
𝑓(π’™π‘˜ + π›Όπ’…π‘˜) βˆ’ 𝑓(π’™π‘˜) ≀ πœ‡ 𝛼 𝛻𝑓(π’™π‘˜)𝑇 π’…π‘˜, 0 < πœ‡ < 1
β€’ Curvature Condition. The curvature condition guards against 𝛼 becoming too small. The condition is stated as:
𝛻𝑓(π’™π‘˜ + π›Όπ’…π‘˜)𝑇 π’…π‘˜ β‰₯ πœ‚ 𝛻𝑓(π’™π‘˜)𝑇 π’…π‘˜, 0 < πœ‡ < πœ‚ < 1
Approximate Line Search
β€’ Strong Wolfe Conditions. The strong Wolfe conditions commonly used by line search algorithms include:
1. The sufficient decrease condition (Armijo's rule): 𝑓(𝛼) ≀ 𝑓(0) + πœ‡π›Όπ‘“β€²(0), 0 < πœ‡ < 1
2. The strong curvature condition: |𝑓′(𝛼)| ≀ πœ‚ |𝑓′(0)|, 0 < πœ‡ ≀ πœ‚ < 1
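A minimal backtracking sketch that halves 𝛼 until the sufficient decrease (Armijo) condition holds; the function, starting point, and constants below are assumptions for illustration.
f = @(x) x(1)^2 + 3*x(2)^2;              % assumed objective
gf = @(x) [2*x(1); 6*x(2)];              % its gradient
x = [1; 1]; d = -gf(x);                  % current point and descent direction
mu = 0.2; alpha = 1;                     % 0 < mu < 1; start from a full step
while f(x + alpha*d) > f(x) + mu*alpha*gf(x)'*d
alpha = alpha/2;                         % backtrack until f(alpha) <= f(0) + mu*alpha*f'(0)
end
x_next = x + alpha*d;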
Approximate Line Search
β€’ The approximate line search includes two steps:
– Bracketing the minimum
– Estimating the minimum
β€’ Bracketing the Minimum. In the bracketing step we seek an interval [𝛼𝑙, 𝛼𝑒] such that 𝑓′(𝛼𝑙) < 0 and 𝑓′(𝛼𝑒) > 0.
– Since 𝑓′(0) < 0 for any descent direction, 𝛼 = 0 serves as a lower bound on 𝛼. To find an upper bound, gradually increase 𝛼, e.g., 𝛼 = 1, 2, …
– Assume that for some 𝛼𝑖 > 0 we get 𝑓′(𝛼𝑖) < 0 and 𝑓′(𝛼𝑖+1) > 0; then 𝛼𝑖+1 serves as an upper bound.
Approximate Line Search
β€’ Estimating the Minimum. Once the minimum has been bracketed in a small interval, a quadratic or cubic polynomial approximation is used to find the minimizer.
β€’ If the polynomial minimizer 𝛼 satisfies the strong Wolfe conditions for the desired πœ‡ and πœ‚ values (say πœ‡ = 0.2, πœ‚ = 0.5), it is taken as the function minimizer.
β€’ Otherwise, 𝛼 replaces one of 𝛼𝑙 or 𝛼𝑒, and the polynomial approximation step is repeated.
Quadratic Curve Fitting
β€’ Assuming that the interval [𝛼𝑙, 𝛼𝑒] contains the minimum of a unimodal function 𝑓(𝛼), its quadratic approximation, given as π‘ž(𝛼) = π‘Ž0 + π‘Ž1𝛼 + π‘Ž2𝛼², is obtained using three points {𝛼𝑙, π›Όπ‘š, 𝛼𝑒}, where the mid-point of the interval may be used for π›Όπ‘š.
The quadratic coefficients {π‘Ž0, π‘Ž1, π‘Ž2} are solved as:
π‘Ž2 = [ (𝑓(𝛼𝑒) βˆ’ 𝑓(𝛼𝑙))/(𝛼𝑒 βˆ’ 𝛼𝑙) βˆ’ (𝑓(π›Όπ‘š) βˆ’ 𝑓(𝛼𝑙))/(π›Όπ‘š βˆ’ 𝛼𝑙) ] / (𝛼𝑒 βˆ’ π›Όπ‘š)
π‘Ž1 = (𝑓(π›Όπ‘š) βˆ’ 𝑓(𝛼𝑙))/(π›Όπ‘š βˆ’ 𝛼𝑙) βˆ’ π‘Ž2(𝛼𝑙 + π›Όπ‘š)
π‘Ž0 = 𝑓(𝛼𝑙) βˆ’ π‘Ž1𝛼𝑙 βˆ’ π‘Ž2𝛼𝑙²
Then, the minimum is given as: π›Όπ‘šπ‘–π‘› = βˆ’π‘Ž1/(2π‘Ž2)
Example: Approximate Search
β€’ Let 𝑓(𝛼) = π‘’βˆ’π›Ό + 𝛼², 𝑓′(𝛼) = 2𝛼 βˆ’ π‘’βˆ’π›Ό, 𝑓(0) = 1, 𝑓′(0) = βˆ’1. Let πœ‡ = 0.2, and try 𝛼 = 0.1, 0.2, … to bracket the minimum.
β€’ From the sufficient decrease condition, the minimum is bracketed in the interval [0, 0.5].
β€’ Using quadratic approximation, the minimum is found as π›Όπ‘šπ‘–π‘› = 0.3531; the exact solution is π›Όπ‘šπ‘–π‘› = 0.3517.
β€’ The MATLAB commands are:
Define the function:
f=@(x) x.*x+exp(-x);
mu=0.2; al=0:.1:1;
Example: Approximate Search
β€’ Bracketing the minimum:
f1=feval(f,al)
1.0000 0.9148 0.8587 0.8308 0.8303 0.8565 0.9088 0.9866
1.0893 1.2166 1.3679
>> f2=f(0)-mu*al
1.0000 0.9800 0.9600 0.9400 0.9200 0.9000 0.8800 0.8600
0.8400 0.8200 0.8000
>> idx=find(f1<=f2)
β€’ Quadratic approximation to find the minimum:
al=0; am=0.25; au=0.5;
a2 = ((f(au)-f(al))/(au-al)-(f(am)-f(al))/(am-al))/(au-am);
a1 = (f(am)-f(al))/(am-al)-a2*(al+am);
xmin = -a1/a2/2 % 0.3531
Computer Methods for Finding the Search Direction
β€’ Gradient based methods
– Steepest descent method
– Conjugate gradient method
– Quasi Newton methods
β€’ Hessian based methods
– Newton’s method
– Trust region methods
Steepest Descent Method
β€’ The steepest descent method determines the search direction as: π’…π‘˜ = βˆ’π›»π‘“(π’™π‘˜)
β€’ The update rule is given as: π’™π‘˜+1 = π’™π‘˜ βˆ’ π›Όπ‘˜ 𝛻𝑓(π’™π‘˜), where π›Όπ‘˜ is determined by minimizing 𝑓(π’™π‘˜+1) along π’…π‘˜
β€’ Example: quadratic function
𝑓(𝒙) = Β½ 𝒙𝑇𝑨𝒙 βˆ’ 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 βˆ’ 𝒃
Then, π’™π‘˜+1 = π’™π‘˜ βˆ’ 𝛼 𝛻𝑓(π’™π‘˜); 𝛼 = 𝛻𝑓(π’™π‘˜)𝑇𝛻𝑓(π’™π‘˜) / (𝛻𝑓(π’™π‘˜)𝑇𝑨𝛻𝑓(π’™π‘˜))
Define π’“π‘˜ = 𝒃 βˆ’ π‘¨π’™π‘˜; then, π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜π’“π‘˜; π›Όπ‘˜ = π’“π‘˜π‘‡π’“π‘˜ / (π’“π‘˜π‘‡π‘¨π’“π‘˜)
Steepest Descent Algorithm
β€’ Initialize: choose 𝒙0
β€’ For π‘˜ = 0, 1, 2, …
– Compute 𝛻𝑓(π’™π‘˜)
– Check convergence: if ‖𝛻𝑓(π’™π‘˜)β€– < πœ–, stop.
– Set π’…π‘˜ = βˆ’π›»π‘“(π’™π‘˜)
– Line search problem: find min𝛼β‰₯0 𝑓(π’™π‘˜ + π›Όπ’…π‘˜)
– Set π’™π‘˜+1 = π’™π‘˜ + π›Όπ’…π‘˜.
Example: Steepest Descent
β€’ Consider min𝒙 𝑓(𝒙) = 0.1π‘₯1Β² + π‘₯2Β²,
𝛻𝑓(𝒙) = [0.2π‘₯1; 2π‘₯2], 𝛻²𝑓(𝒙) = [0.2 0; 0 2]; let 𝒙0 = [5; 1], then 𝑓(𝒙0) = 3.5,
𝑑1 = βˆ’π›»π‘“(𝒙0) = [βˆ’1; βˆ’2], 𝛼 = 0.61
𝒙1 = [4.39; βˆ’0.22], 𝑓(𝒙1) = 1.98
Continuing…
Example: Steepest Descent
β€’ MATLAB code:
H=[.2 0;0 2];                      % Hessian of f
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1];                           % starting point
xall=x';                           % store iterates for plotting
for i=1:10
d=-df(x);                          % steepest-descent direction
a=d'*d/(d'*H*d);                   % exact step size for the quadratic
x=x+a*d;
xall=[xall;x'];
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Steepest Descent Method
β€’ The steepest descent method becomes slow close to the optimum
β€’ The method progresses in a zigzag fashion, since
𝑑/𝑑𝛼 𝑓(π’™π‘˜ + π›Όπ’…π‘˜) = 𝛻𝑓(π’™π‘˜+1)𝑇 π’…π‘˜ = βˆ’π›»π‘“(π’™π‘˜+1)𝑇 𝛻𝑓(π’™π‘˜) = 0
β€’ The method has linear convergence with rate constant
𝐢 = (𝑓(π’™π‘˜+1) βˆ’ 𝑓(π’™βˆ—)) / (𝑓(π’™π‘˜) βˆ’ 𝑓(π’™βˆ—)) ≀ [ (π‘π‘œπ‘›π‘‘(𝑨) βˆ’ 1) / (π‘π‘œπ‘›π‘‘(𝑨) + 1) ]Β²
Preconditioning
β€’ Preconditioning (scaling) can be used to reduce the condition number of the Hessian matrix and hence aid convergence
β€’ Consider 𝑓(𝒙) = 0.1π‘₯1Β² + π‘₯2Β² = 𝒙𝑇𝑨𝒙, where 𝑨 = π‘‘π‘–π‘Žπ‘”(0.1, 1)
β€’ Define a linear transformation 𝒙 = π‘·π’š, where 𝑷 = π‘‘π‘–π‘Žπ‘”(√10, 1); then 𝑓(𝒙) = π’šπ‘‡π‘·π‘‡π‘¨π‘·π’š = π’šπ‘‡π’š
β€’ Since π‘π‘œπ‘›π‘‘(𝑰) = 1, the steepest descent method for the transformed quadratic function converges in a single iteration
Conjugate Gradient Method
β€’ For any square matrix 𝑨, the set of 𝑨-conjugate vectors is defined by: 𝒅𝑖𝑇𝑨𝒅𝑗 = 0, 𝑖 β‰  𝑗
β€’ Let π’ˆπ‘˜ = 𝛻𝑓(π’™π‘˜) denote the gradient; then, starting from 𝒅0 = βˆ’π’ˆ0, a set of 𝑨-conjugate directions is generated as:
𝒅0 = βˆ’π’ˆ0; π’…π‘˜+1 = βˆ’π’ˆπ‘˜+1 + π›½π‘˜π’…π‘˜, π‘˜ β‰₯ 0, where π›½π‘˜ = π’ˆπ‘˜+1π‘‡π‘¨π’…π‘˜ / (π’…π‘˜π‘‡π‘¨π’…π‘˜)
There are multiple ways to generate conjugate directions.
β€’ Using {𝒅0, 𝒅1, … , π’…π‘›βˆ’1} as search directions, a quadratic function is minimized in 𝑛 steps.
Conjugate Directions Method
β€’ The parameter π›½π‘˜ can be computed in different ways:
– By substituting π‘¨π’…π‘˜
=
1
π›Όπ‘˜
(π’ˆπ‘˜+1 βˆ’ π’ˆπ‘˜), we obtain:
π›½π‘˜ =
π’ˆπ‘˜+1
𝑇
(π’ˆπ‘˜+1βˆ’π’ˆπ‘˜)
π’…π‘˜π‘‡
(π’ˆπ‘˜+1βˆ’π’ˆπ‘˜)
(the Hestenes-Stiefel formula)
– In the case of exact line search, π‘”π‘˜+1
𝑇
π’…π‘˜
= 0; then
π›½π‘˜ =
π’ˆπ‘˜+1
𝑇
(π’ˆπ‘˜+1βˆ’π’ˆπ‘˜)
π’ˆπ‘˜
π‘‡π’ˆπ‘˜
(the Polak-Ribiere formula)
– Also, for exact line search π’ˆπ‘˜+1
𝑇
π’ˆπ‘˜ = π›½π‘˜βˆ’1(π’ˆπ‘˜ + π›Όπ‘˜π‘¨π’…π‘˜
)𝑇
π’…π‘˜βˆ’1
= 0,
resulting in π›½π‘˜ =
π’ˆπ‘˜+1
𝑇
π’ˆπ‘˜+1
π’ˆπ‘˜
π‘‡π’ˆπ‘˜
(the Fletcher-Reeves formula)
Other versions of π›½π‘˜ have also been proposed.
Example: Conjugate Gradient Method
β€’ Consider min𝒙 𝑓(𝒙) = 0.1π‘₯1Β² + π‘₯2Β²,
𝛻𝑓(𝒙) = [0.2π‘₯1; 2π‘₯2], 𝛻²𝑓(𝒙) = [0.2 0; 0 2]; let 𝒙0 = [5; 1], then 𝑓(𝒙0) = 3.5,
𝑑0 = βˆ’π›»π‘“(𝒙0) = [βˆ’1; βˆ’2], 𝛼 = 0.61
𝒙1 = [4.39; βˆ’0.22], 𝑓(𝒙1) = 1.98
𝛽0 = 0.19
𝑑1 = [βˆ’0.535; 0.027], 𝛼 = 8.2
𝒙2 = [0; 0]
Example: Conjugate Gradient Method
β€’ MATLAB code
H=[.2 0;0 2];                          % Hessian of f
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1]; n=2;
xall=zeros(n+1,n); xall(1,:)=x';       % store iterates
d=-df(x); a=d'*d/(d'*H*d);             % first step: steepest descent
x=x+a*d; xall(2,:)=x';
for i=1:size(x,1)-1
b=df(x)'*H*d/(d'*H*d);                 % beta from the conjugacy condition
d=-df(x)+b*d;                          % new conjugate direction
r=-df(x);
a=r'*r/(d'*H*d);                       % exact step size
x=x+a*d;
xall(i+2,:)=x';
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Conjugate Gradient Algorithm
β€’ Conjugate-Gradient Algorithm (Griva, Nash & Sofer, p. 454):
β€’ Initialize: choose 𝒙0 = 𝟎, 𝒓0 = 𝒃, π’…βˆ’1 = 𝟎, 𝛽0 = 0.
β€’ For 𝑖 = 0, 1, …
– Check convergence: if ‖𝒓𝑖‖ < πœ–, stop.
– If 𝑖 > 0, set 𝛽𝑖 = 𝒓𝑖𝑇𝒓𝑖 / (π’“π‘–βˆ’1π‘‡π’“π‘–βˆ’1)
– Set 𝒅𝑖 = 𝒓𝑖 + π›½π‘–π’…π‘–βˆ’1; 𝛼𝑖 = 𝒓𝑖𝑇𝒓𝑖 / (𝒅𝑖𝑇𝑨𝒅𝑖); 𝒙𝑖+1 = 𝒙𝑖 + 𝛼𝑖𝒅𝑖; 𝒓𝑖+1 = 𝒓𝑖 βˆ’ 𝛼𝑖𝑨𝒅𝑖.
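The residual-based algorithm above translates almost line-for-line into MATLAB; the SPD matrix 𝑨 and vector 𝒃 below are assumptions for the sketch (they define the quadratic Β½ 𝒙𝑇𝑨𝒙 βˆ’ 𝒃𝑇𝒙).
A = [4 1; 1 3]; b = [1; 2];              % assumed SPD system
x = [0; 0]; r = b; d = zeros(2,1);       % x0 = 0, r0 = b, d(-1) = 0
beta = 0; rr = r'*r;
for i = 1:10
if sqrt(rr) < 1e-10, break, end          % convergence check on ||r||
if i > 1, beta = rr/rr_old; end          % beta_i = (r_i'r_i)/(r_{i-1}'r_{i-1})
d = r + beta*d;                          % new conjugate direction
alpha = rr/(d'*A*d);                     % exact step size
x = x + alpha*d;
r = r - alpha*A*d;                       % residual update
rr_old = rr; rr = r'*r;
end
For this 2-by-2 example the exact solution A\b is reached in two iterations, as expected for a quadratic.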
Conjugate Gradient Method
β€’ Assume that an update that includes steps 𝛼𝑖 along 𝑛 conjugate vectors 𝒅𝑖 is assembled as: π’š = Σ𝑖=1..𝑛 𝛼𝑖𝒅𝑖.
β€’ Then, for a quadratic function, the minimization problem decomposes into a set of one-dimensional problems, i.e.,
minπ’š 𝑓(π’š) ≑ Σ𝑖=1..𝑛 min𝛼𝑖 [ Β½ 𝛼𝑖² 𝒅𝑖𝑇𝑨𝒅𝑖 βˆ’ 𝛼𝑖𝒃𝑇𝒅𝑖 ]
β€’ By setting the derivative with respect to 𝛼𝑖 equal to zero, i.e., 𝛼𝑖𝒅𝑖𝑇𝑨𝒅𝑖 βˆ’ 𝒃𝑇𝒅𝑖 = 0, we obtain: 𝛼𝑖 = 𝒃𝑇𝒅𝑖 / (𝒅𝑖𝑇𝑨𝒅𝑖).
β€’ This shows that the CG algorithm iteratively determines the conjugate directions 𝒅𝑖 and their coefficients 𝛼𝑖.
CG Rate of Convergence
β€’ Conjugate gradient methods achieve superlinear convergence:
– In the case of quadratic functions, the minimum is reached exactly
in 𝑛 iterations.
– For general nonlinear functions, convergence in 2𝑛 iterations is to
be expected.
β€’ Nonlinear CG methods typically have the lowest per iteration
computational costs of all gradient methods.
Newton’s Method
β€’ Consider minimizing the second-order approximation of 𝑓(𝒙):
minΔ𝒙 𝑓(π’™π‘˜ + Δ𝒙) = 𝑓(π’™π‘˜) + 𝛻𝑓(π’™π‘˜)𝑇Δ𝒙 + Β½ Ξ”π’™π‘‡π‘―π‘˜Ξ”π’™
β€’ Apply FONC: π‘―π‘˜π’… + π’ˆπ‘˜ = 𝟎, where π’ˆπ‘˜ = 𝛻𝑓(π’™π‘˜).
Then, assuming that π‘―π‘˜ = 𝛻²𝑓(π’™π‘˜) stays positive definite, the Newton update rule is derived as: π’™π‘˜+1 = π’™π‘˜ βˆ’ π‘―π‘˜β»ΒΉπ’ˆπ‘˜
β€’ Note:
– The convergence of Newton's method depends on π‘―π‘˜ staying positive definite.
– A step size may be included in Newton's method, i.e., π’™π‘˜+1 = π’™π‘˜ βˆ’ π›Όπ‘˜π‘―π‘˜β»ΒΉπ’ˆπ‘˜
Marquardt Modification to Newton’s Method
β€’ To ensure the positive definite condition on π‘―π‘˜, Marquardt proposed the following modification to Newton's method:
(π‘―π‘˜ + πœ†π‘°)𝒅 = βˆ’π’ˆπ‘˜
where πœ† is selected to ensure that the modified Hessian is positive definite.
β€’ Since π‘―π‘˜ + πœ†π‘° is also symmetric, the resulting system of linear equations can be solved for 𝒅 via the factorization: 𝑳𝑫𝑳𝑇𝒅 = βˆ’π›»π‘“(π’™π‘˜)
Newton’s Algorithm
Newton's Method (Griva, Nash & Sofer, p. 373):
1. Initialize: choose 𝒙0, specify πœ–
2. For π‘˜ = 0, 1, …
3. Check convergence: if ‖𝛻𝑓(π’™π‘˜)β€– < πœ–, stop
4. Factorize the modified Hessian as 𝛻²𝑓(π’™π‘˜) + 𝑬 = 𝑳𝑫𝑳𝑇 and solve 𝑳𝑫𝑳𝑇𝒅 = βˆ’π›»π‘“(π’™π‘˜) for 𝒅
5. Perform line search to determine π›Όπ‘˜ and update the solution estimate as π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜π’…π‘˜
Rate of Convergence
β€’ Newton's method achieves a quadratic rate of convergence in the close neighborhood of the optimal point, and superlinear convergence otherwise.
β€’ The main drawback of Newton's method is its computational cost: the Hessian matrix needs to be computed at every step, and a linear system of equations needs to be solved to obtain the update.
β€’ Due to the high computational and storage costs, the classic Newton's method is rarely used in practice.
Quasi Newton’s Methods
β€’ The quasi-Newton methods derive from a generalization of the secant method, which approximates the second derivative as:
𝑓′′(π‘₯π‘˜) β‰… (𝑓′(π‘₯π‘˜) βˆ’ 𝑓′(π‘₯π‘˜βˆ’1)) / (π‘₯π‘˜ βˆ’ π‘₯π‘˜βˆ’1)
β€’ In the multi-dimensional case, the secant condition is generalized as: π‘―π‘˜(π’™π‘˜ βˆ’ π’™π‘˜βˆ’1) = 𝛻𝑓(π’™π‘˜) βˆ’ 𝛻𝑓(π’™π‘˜βˆ’1)
β€’ Define π‘­π‘˜ = π‘―π‘˜β»ΒΉ; then π’™π‘˜ βˆ’ π’™π‘˜βˆ’1 = π‘­π‘˜ (𝛻𝑓(π’™π‘˜) βˆ’ 𝛻𝑓(π’™π‘˜βˆ’1))
β€’ The quasi-Newton methods iteratively update π‘―π‘˜ or π‘­π‘˜ as:
– Direct update: π‘―π‘˜+1 = π‘―π‘˜ + βˆ†π‘―π‘˜, 𝑯0 = 𝑰
– Inverse update: π‘­π‘˜+1 = π‘­π‘˜ + βˆ†π‘­π‘˜, 𝑭 = π‘―β»ΒΉ, 𝑭0 = 𝑰
Quasi-Newton Methods
β€’ Quasi-Newton update:
Let π’”π‘˜ = π’™π‘˜+1 βˆ’ π’™π‘˜, π’šπ‘˜ = 𝛻𝑓(π’™π‘˜+1) βˆ’ 𝛻𝑓(π’™π‘˜); then,
– The DFP (Davidon-Fletcher-Powell) formula for the inverse Hessian update is given as:
π‘­π‘˜+1 = π‘­π‘˜ βˆ’ (π‘­π‘˜π’šπ‘˜)(π‘­π‘˜π’šπ‘˜)𝑇 / (π’šπ‘˜π‘‡π‘­π‘˜π’šπ‘˜) + π’”π‘˜π’”π‘˜π‘‡ / (π’šπ‘˜π‘‡π’”π‘˜)
– The BFGS (Broyden, Fletcher, Goldfarb, Shanno) formula for the direct Hessian update is given as:
π‘―π‘˜+1 = π‘―π‘˜ βˆ’ (π‘―π‘˜π’”π‘˜)(π‘―π‘˜π’”π‘˜)𝑇 / (π’”π‘˜π‘‡π‘―π‘˜π’”π‘˜) + π’šπ‘˜π’šπ‘˜π‘‡ / (π’šπ‘˜π‘‡π’”π‘˜)
Quasi-Newton Algorithm
The Quasi-Newton Algorithm (Griva, Nash & Sofer, p. 415):
β€’ Initialize: choose 𝒙0, 𝑯0 (e.g., 𝑯0 = 𝑰), specify πœ€
β€’ For π‘˜ = 0, 1, …
– Check convergence: if ‖𝛻𝑓(π’™π‘˜)β€– < πœ€, stop
– Solve π‘―π‘˜π’… = βˆ’π›»π‘“(π’™π‘˜) for π’…π‘˜ (alternatively, π’…π‘˜ = βˆ’π‘­π‘˜π›»π‘“(π’™π‘˜))
– Solve min𝛼 𝑓(π’™π‘˜ + π›Όπ’…π‘˜) for π›Όπ‘˜, and update the current estimate: π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜π’…π‘˜
– Compute π’”π‘˜, π’šπ‘˜, and update π‘―π‘˜ (or π‘­π‘˜ as applicable)
Example: Quasi-Newton Method
β€’ Consider the problem: minπ‘₯1,π‘₯2 𝑓(π‘₯1, π‘₯2) = 2π‘₯1Β² βˆ’ π‘₯1π‘₯2 + π‘₯2Β², where
𝑯 = [4 βˆ’1; βˆ’1 2], 𝛻𝑓 = 𝑯[π‘₯1; π‘₯2]. Let 𝒙0 = [1; 1], 𝑓0 = 4, 𝑯0 = 𝑰, 𝑭0 = 𝑰;
choose 𝒅0 = βˆ’π›»π‘“(𝒙0) = [βˆ’3; βˆ’1];
then 𝑓(𝛼) = 2(1 βˆ’ 3𝛼)Β² + (1 βˆ’ 𝛼)Β² βˆ’ (1 βˆ’ 3𝛼)(1 βˆ’ 𝛼).
Using 𝑓′(𝛼) = 0 β†’ 𝛼 = 5/16 β†’ 𝒙1 = [0.0625; 0.688], 𝑓1 = 0.875;
then π’š1 = [βˆ’3.44; 0.313], 𝑭1 = [1.193 0.065; 0.065 1.022], 𝑯1 = [0.381 βˆ’0.206; βˆ’0.206 0.9313],
and using either update formula, 𝒅1 = [0.4375; βˆ’1.313]; for the next step,
𝑓(𝛼) = 5.36𝛼² βˆ’ 3.83𝛼 + 0.875 β†’ 𝛼 = 0.3572, 𝒙2 = [0.2188; 0.2188].
Example: Quasi-Newton Method
β€’ For a quadratic function, convergence is achieved in two iterations.
Trust-Region Methods
β€’ The trust-region methods locally employ a quadratic approximation π‘žπ‘˜ at π’™π‘˜ to the nonlinear objective function.
β€’ The approximation is valid in the neighborhood of π’™π‘˜ defined by Ξ©π‘˜ = {𝒙: β€–πšͺ(𝒙 βˆ’ π’™π‘˜)β€– ≀ βˆ†π‘˜}, where πšͺ is a scaling parameter.
β€’ The method aims to find an π’™π‘˜+1 ∈ Ξ©π‘˜ that satisfies the sufficient decrease condition in 𝑓(𝒙).
β€’ The quality of the quadratic approximation is estimated by the reliability index: π›Ύπ‘˜ = (𝑓(π’™π‘˜) βˆ’ 𝑓(π’™π‘˜+1)) / (π‘žπ‘˜(π’™π‘˜) βˆ’ π‘žπ‘˜(π’™π‘˜+1)). If this ratio is close to unity, the trust region may be expanded in the next iteration.
Trust-Region Methods
β€’ At each iteration π‘˜, the trust-region algorithm solves a constrained optimization subproblem involving the quadratic approximation:
min𝒅 π‘žπ‘˜(𝒅) = 𝑓(π’™π‘˜) + 𝛻𝑓(π’™π‘˜)𝑇𝒅 + Β½ 𝒅𝑇𝛻²𝑓(π’™π‘˜)𝒅
Subject to: ‖𝒅‖ ≀ βˆ†π‘˜
Lagrangian function: β„’(𝒅, πœ†) = π‘žπ‘˜(𝒅) + πœ†(‖𝒅‖ βˆ’ βˆ†π‘˜)
FONC: (𝛻²𝑓(π’™π‘˜) + πœ†π‘°)π’…π‘˜ = βˆ’π›»π‘“(π’™π‘˜), πœ†(β€–π’…π‘˜β€– βˆ’ βˆ†π‘˜) = 0
β€’ The resulting search direction is a function of the multiplier: π’…π‘˜ = π’…π‘˜(πœ†).
– For large βˆ†π‘˜ and a positive-definite 𝛻²𝑓(π’™π‘˜), the Lagrange multiplier πœ† β†’ 0, and π’…π‘˜(πœ†) reduces to the Newton direction.
– For βˆ†π‘˜ β†’ 0, πœ† β†’ ∞, and π’…π‘˜(πœ†) aligns with the steepest-descent direction.
Trust-Region Algorithm
β€’ Trust-Region Algorithm (Griva, Nash & Sofer, p. 392):
β€’ Initialize: choose 𝒙0, βˆ†0; specify πœ€, 0 < πœ‡ < πœ‚ < 1 (e.g., πœ‡ = 1/4, πœ‚ = 3/4)
β€’ For π‘˜ = 0, 1, …
– Check convergence: if ‖𝛻𝑓(π’™π‘˜)β€– < πœ€, stop
– Solve the subproblem: min𝒅 π‘žπ‘˜(𝒅) subject to ‖𝒅‖ ≀ βˆ†π‘˜
– Compute π›Ύπ‘˜:
β€’ if π›Ύπ‘˜ < πœ‡, set π’™π‘˜+1 = π’™π‘˜, βˆ†π‘˜+1 = Β½ βˆ†π‘˜
β€’ else if π›Ύπ‘˜ < πœ‚, set π’™π‘˜+1 = π’™π‘˜ + π’…π‘˜, βˆ†π‘˜+1 = βˆ†π‘˜
β€’ else set π’™π‘˜+1 = π’™π‘˜ + π’…π‘˜, βˆ†π‘˜+1 = 2βˆ†π‘˜
Computer Methods for Constrained Problems
β€’ Penalty and Barrier methods
β€’ Augmented Lagrangian method (AL)
β€’ Sequential linear programming (SLP)
β€’ Sequential quadratic programming (SQP)
Penalty and Barrier Methods
β€’ Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to:
β„Žπ‘–(𝒙) = 0, 𝑖 = 1, … , 𝑝;
𝑔𝑗(𝒙) ≀ 0, 𝑗 = 1, … , π‘š;
π‘₯𝑖𝐿 ≀ π‘₯𝑖 ≀ π‘₯π‘–π‘ˆ, 𝑖 = 1, … , 𝑛.
β€’ Define a composite function to be used for constraint compliance:
Ξ¦(𝒙, 𝒓) = 𝑓(𝒙) + 𝑃(𝑔(𝒙), β„Ž(𝒙), 𝒓)
where 𝑃 defines a loss function, and 𝒓 is a vector of weights (penalty parameters)
Penalty and Barrier Methods
β€’ Penalty Function Method. A penalty function method employs a quadratic loss function and iterates through the infeasible region:
𝑃(𝑔(𝒙), β„Ž(𝒙), 𝒓) = π‘Ÿ [ Σ𝑖 (𝑔𝑖+(𝒙))Β² + Σ𝑖 β„Žπ‘–(𝒙)Β² ], where 𝑔𝑖+(𝒙) = max(0, 𝑔𝑖(𝒙)), π‘Ÿ > 0
β€’ Barrier Function Method. A barrier method employs a log barrier function and iterates through the feasible region:
𝑃(𝑔(𝒙), β„Ž(𝒙), 𝒓) = βˆ’(1/π‘Ÿ) Σ𝑖 log(βˆ’π‘”π‘–(𝒙))
β€’ For both penalty and barrier methods, as π‘Ÿ β†’ ∞, 𝒙(π‘Ÿ) β†’ π’™βˆ—
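A small sketch of the exterior (quadratic) penalty iteration for an assumed equality-constrained problem; the problem data and the schedule of π‘Ÿ values are assumptions, and fminsearch handles the unconstrained subproblems.
f = @(x) (x(1)-1)^2 + (x(2)-2)^2;            % assumed objective
h = @(x) x(1) + x(2) - 2;                    % assumed equality constraint
x = [0; 0];
for r = [1 10 100 1000]                      % increase the penalty parameter r
Phi = @(x) f(x) + r*h(x)^2;                  % composite function Phi(x, r)
x = fminsearch(Phi, x);                      % unconstrained minimization
end
% x approaches the constrained minimizer (0.5, 1.5) as r grows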
The Augmented Lagrangian Method
β€’ Consider an equality-constrained problem: min𝒙 𝑓(𝒙)
Subject to: β„Žπ‘–(𝒙) = 0, 𝑖 = 1, … , 𝑙
β€’ Define the augmented Lagrangian (AL) as:
𝒫(𝒙, 𝒗, π‘Ÿ) = 𝑓(𝒙) + Σ𝑗 [ π‘£π‘—β„Žπ‘—(𝒙) + Β½ π‘Ÿ β„Žπ‘—Β²(𝒙) ]
where the additional term defines an exterior penalty function with π‘Ÿ as the penalty parameter.
β€’ For inequality-constrained problems, the AL may be defined as:
𝒫(𝒙, 𝒖, π‘Ÿ) = 𝑓(𝒙) + Σ𝑖 { 𝑒𝑖𝑔𝑖(𝒙) + Β½ π‘Ÿ 𝑔𝑖²(𝒙), if 𝑔𝑖 + 𝑒𝑖/π‘Ÿ β‰₯ 0; βˆ’π‘’π‘–Β²/(2π‘Ÿ), if 𝑔𝑖 + 𝑒𝑖/π‘Ÿ < 0 }
where a large π‘Ÿ makes the Hessian of the AL positive definite at 𝒙.
The Augmented Lagrangian Method
β€’ The dual function for the AL is defined as:
πœ“(𝒗) = min𝒙 𝒫(𝒙, 𝒗, π‘Ÿ) = min𝒙 { 𝑓(𝒙) + Σ𝑗 [ π‘£π‘—β„Žπ‘—(𝒙) + Β½ π‘Ÿ β„Žπ‘—(𝒙)Β² ] }
β€’ The resulting dual optimization problem is: max𝒗 πœ“(𝒗)
β€’ The dual problem may be solved via Newton's method as:
π’—π‘˜+1 = π’—π‘˜ βˆ’ [dΒ²πœ“/𝑑𝑣𝑖𝑑𝑣𝑗]⁻¹ 𝒉, where dΒ²πœ“/𝑑𝑣𝑖𝑑𝑣𝑗 = βˆ’π›»β„Žπ‘–π‘‡ (𝛻²𝒫)⁻¹ π›»β„Žπ‘—
β€’ For large π‘Ÿ, the Newton update may be approximated as: π‘£π‘—π‘˜+1 = π‘£π‘—π‘˜ + π‘Ÿβ„Žπ‘—, 𝑗 = 1, … , 𝑙
Example: Augmented Lagrangian
β€’ Maximize the volume of a cylindrical tank subject to a surface area constraint:
max𝑑,𝑙 𝑓(𝑑, 𝑙) = πœ‹π‘‘Β²π‘™/4, subject to β„Ž: πœ‹π‘‘Β²/4 + πœ‹π‘‘π‘™ βˆ’ 𝐴0 = 0
β€’ We can normalize the problem as:
min𝑑,𝑙 𝑓(𝑑, 𝑙) = βˆ’π‘‘Β²π‘™, subject to β„Ž: 𝑑² + 4𝑑𝑙 βˆ’ 1 = 0
β€’ The solution to the primal problem is obtained as:
Lagrangian function: β„’(𝑑, 𝑙, πœ†) = βˆ’π‘‘Β²π‘™ + πœ†(𝑑² + 4𝑑𝑙 βˆ’ 1)
FONC: πœ†(𝑑 + 2𝑙) βˆ’ 𝑑𝑙 = 0, 4πœ†π‘‘ βˆ’ 𝑑² = 0, 𝑑² + 4𝑑𝑙 βˆ’ 1 = 0
Optimal solution: π‘‘βˆ— = 2π‘™βˆ— = 4πœ†βˆ— = 1/√3.
Example: Augmented Lagrangian
β€’ Alternatively, define the augmented Lagrangian function as:
𝒫(𝑑, 𝑙, πœ†, π‘Ÿ) = βˆ’π‘‘Β²π‘™ + πœ†(𝑑² + 4𝑑𝑙 βˆ’ 1) + Β½ π‘Ÿ (𝑑² + 4𝑑𝑙 βˆ’ 1)Β²
β€’ Define the dual function: πœ“(πœ†) = min𝑑,𝑙 𝒫(𝑑, 𝑙, πœ†, π‘Ÿ)
β€’ Define the dual optimization problem: maxπœ† πœ“(πœ†)
β€’ Solution to the dual problem: πœ†βˆ— = πœ†π‘šπ‘Žπ‘₯ = 0.144
β€’ Solution for the design variables: π‘‘βˆ— = 2π‘™βˆ— = 0.577
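A rough sketch of the multiplier iteration 𝑣 ← 𝑣 + π‘Ÿβ„Ž(𝒙) applied to the normalized tank problem above; the penalty parameter, starting point, and iteration count are assumptions, and fminsearch is used for the inner minimization.
f = @(x) -x(1)^2*x(2);                       % x = [d; l], normalized objective
h = @(x) x(1)^2 + 4*x(1)*x(2) - 1;           % normalized area constraint
r = 10; v = 0; x = [0.5; 0.25];              % assumed r, initial multiplier and point
for k = 1:20
P = @(x) f(x) + v*h(x) + 0.5*r*h(x)^2;       % augmented Lagrangian
x = fminsearch(P, x);                        % inner (unconstrained) minimization
v = v + r*h(x);                              % dual (multiplier) update
end
% x approaches (d*, l*) = (0.577, 0.289) and v approaches lambda* = 0.144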
Sequential Linear Programming
β€’ Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to:
β„Žπ‘–(𝒙) = 0, 𝑖 = 1, … , 𝑝;
𝑔𝑗(𝒙) ≀ 0, 𝑗 = 1, … , π‘š;
π‘₯𝑖𝐿 ≀ π‘₯𝑖 ≀ π‘₯π‘–π‘ˆ, 𝑖 = 1, … , 𝑛.
β€’ Let π’™π‘˜ denote the current estimate of the design variables, and let 𝒅 denote the change in variables; define the first-order expansion of the objective and constraint functions in the neighborhood of π’™π‘˜:
𝑓(π’™π‘˜ + 𝒅) = 𝑓(π’™π‘˜) + 𝛻𝑓(π’™π‘˜)𝑇𝒅
𝑔𝑖(π’™π‘˜ + 𝒅) = 𝑔𝑖(π’™π‘˜) + 𝛻𝑔𝑖(π’™π‘˜)𝑇𝒅, 𝑖 = 1, … , π‘š
β„Žπ‘—(π’™π‘˜ + 𝒅) = β„Žπ‘—(π’™π‘˜) + π›»β„Žπ‘—(π’™π‘˜)𝑇𝒅, 𝑗 = 1, … , 𝑙
Sequential Linear Programming
β€’ Let π‘“π‘˜ = 𝑓 π’™π‘˜ , 𝑔𝑖
π‘˜
= 𝑔𝑖 π’™π‘˜ , β„Žπ‘—
π‘˜
= β„Žπ‘— π’™π‘˜ ; 𝑏𝑖 = βˆ’π‘”π‘–
π‘˜
, 𝑒𝑗 = βˆ’β„Žπ‘—
π‘˜
,
𝒄 = 𝛻𝑓 π’™π‘˜ , 𝒂𝑖 = 𝛻𝑔𝑖 π’™π‘˜ , 𝒏𝑗 = π›»β„Žπ‘— π’™π‘˜ ,
𝑨 = 𝒂1, 𝒂2, … , π’‚π‘š , 𝑡 = 𝒏1, 𝒏2, … , 𝒏𝑙 .
β€’ Using first order expansion, define an LP subprogram for the
current iteration of the NLP problem:
min
𝒅
𝑓 = 𝒄𝑇𝒅
Subject to: 𝑨𝑇
𝒅 ≀ 𝒃,
𝑡𝑇𝒅 = 𝒆
where 𝑓 represents first-order change in the cost function, and the
columns of 𝑨 and 𝑡 matrices represent, respectively, the gradients
of inequality and equality constraints.
β€’ The resulting LP problem can be solved via the Simplex method.
Sequential Linear Programming
β€’ We may note that:
– Since both positive and negative changes to the design variables π’™π‘˜ are allowed, the variables 𝑑𝑖 are unrestricted in sign.
– The SLP method requires additional constraints of the form βˆ’βˆ†π‘–π‘™π‘˜ ≀ π‘‘π‘–π‘˜ ≀ βˆ†π‘–π‘’π‘˜ (termed move limits) to bind the LP solution. These limits represent the maximum allowable change in 𝑑𝑖 in the current iteration and are selected as a percentage of the current value.
– Move limits serve the dual purpose of binding the solution and obviating the need for line search.
– Overly restrictive move limits tend to make the SLP problem infeasible.
SLP Example
β€’ Consider the convex NLP problem:
minπ‘₯1,π‘₯2 𝑓(π‘₯1, π‘₯2) = π‘₯1Β² βˆ’ π‘₯1π‘₯2 + π‘₯2Β²
Subject to: 1 βˆ’ π‘₯1Β² βˆ’ π‘₯2Β² ≀ 0; βˆ’π‘₯1 ≀ 0, βˆ’π‘₯2 ≀ 0
The problem has a single minimum at π’™βˆ— = (1/√2, 1/√2)
β€’ The objective and constraint gradients are:
𝛻𝑓𝑇 = [2π‘₯1 βˆ’ π‘₯2, 2π‘₯2 βˆ’ π‘₯1], 𝛻𝑔1𝑇 = [βˆ’2π‘₯1, βˆ’2π‘₯2], 𝛻𝑔2𝑇 = [βˆ’1, 0], 𝛻𝑔3𝑇 = [0, βˆ’1].
β€’ Let 𝒙0 = (1, 1); then 𝑓0 = 1, 𝒄𝑇 = [1, 1], 𝑏1 = 𝑏2 = 𝑏3 = 1;
𝒂1𝑇 = [βˆ’2, βˆ’2], 𝒂2𝑇 = [βˆ’1, 0], 𝒂3𝑇 = [0, βˆ’1]
SLP Example
β€’ Define the LP subproblem at the current step as:
min𝑑1,𝑑2 𝑓̅ = 𝑑1 + 𝑑2
Subject to: [βˆ’2 βˆ’2; βˆ’1 0; 0 βˆ’1] [𝑑1; 𝑑2] ≀ [1; 1; 1]
β€’ In the absence of move limits, the LP problem is unbounded; using 50% move limits, the SLP update is given as: π’…βˆ— = [βˆ’1/2, βˆ’1/2]𝑇, 𝒙1 = [1/2, 1/2]𝑇, with resulting constraint violation 𝑔𝑖 = (1/2, 0, 0); smaller move limits may be used to reduce the constraint violation.
Sequential Linear Programming
SLP Algorithm (Arora, p. 508):
β€’ Initialize: choose 𝒙0, πœ€1 > 0, πœ€2 > 0.
β€’ For π‘˜ = 0,1,2, …
– Choose move limits βˆ†π‘–π‘™
π‘˜
, βˆ†π‘–π‘’
π‘˜
as some fraction of current design π’™π‘˜
– Compute π‘“π‘˜
, 𝒄, 𝑔𝑖
π‘˜
, β„Žπ‘—
π‘˜
, 𝑏𝑖, 𝑒𝑗
– Formulate and solve the LP subproblem for π’…π‘˜
– If 𝑔𝑖 ≀ πœ€1; 𝑖 = 1, … , π‘š; β„Žπ‘— ≀ πœ€1; 𝑖 = 1, … , 𝑝; and π’…π‘˜ ≀ πœ€2, stop
– Substitute π’™π‘˜+1 ← π’™π‘˜ + π›Όπ’…π‘˜, π‘˜ ← π‘˜ + 1.
Sequential Quadratic Programming
β€’ Sequential quadratic programming (SQP) uses a quadratic approximation to the objective function at every iteration.
β€’ The SQP subproblem is defined as:
min𝒅 𝑓̅ = 𝒄𝑇𝒅 + Β½ 𝒅𝑇𝒅
Subject to: 𝑨𝑇𝒅 ≀ 𝒃, 𝑡𝑇𝒅 = 𝒆
β€’ SQP does not require move limits, alleviating that shortcoming of the SLP method.
β€’ The SQP subproblem is convex; hence, it has a single global minimum.
β€’ The SQP subproblem can be solved via the Simplex-based linear complementarity problem (LCP) framework.
Sequential Quadratic Programming
β€’ The Lagrangian function for the SQP subproblem is defined as:
β„’(𝒅, 𝒖, 𝒗) = 𝒄𝑇𝒅 + Β½ 𝒅𝑇𝒅 + 𝒖𝑇(𝑨𝑇𝒅 βˆ’ 𝒃 + 𝒔) + 𝒗𝑇(𝑡𝑇𝒅 βˆ’ 𝒆)
β€’ Then the KKT conditions are:
Optimality: 𝛁ℒ = 𝒄 + 𝒅 + 𝑨𝒖 + 𝑡𝒗 = 𝟎
Feasibility: 𝑨𝑇𝒅 + 𝒔 = 𝒃, 𝑡𝑇𝒅 = 𝒆
Complementarity: 𝒖𝑇𝒔 = 0
Non-negativity: 𝒖 β‰₯ 𝟎, 𝒔 β‰₯ 𝟎
Sequential Quadratic Programming
β€’ Since 𝒗 is unrestricted in sign, let 𝒗 = π’š βˆ’ 𝒛, π’š β‰₯ 𝟎, 𝒛 β‰₯ 𝟎; the KKT conditions are then compactly written as:
[𝑰 𝑨 𝟎 𝑡 βˆ’π‘΅; 𝑨𝑇 𝟎 𝑰 𝟎 𝟎; 𝑡𝑇 𝟎 𝟎 𝟎 𝟎] [𝒅; 𝒖; 𝒔; π’š; 𝒛] = [βˆ’π’„; 𝒃; 𝒆], or 𝑷𝑿 = 𝑸
β€’ The complementary slackness conditions, 𝒖𝑇𝒔 = 0, translate as: 𝑿𝑖𝑿𝑖+π‘š = 0, 𝑖 = 𝑛 + 1, β‹― , 𝑛 + π‘š.
β€’ The resulting problem can be solved via the Simplex method using the LCP framework.
Descent Function Approach
β€’ In SQP methods, the line search step is based on minimization of a descent function that penalizes constraint violations, i.e.,
Ξ¦(𝒙) = 𝑓(𝒙) + 𝑅𝑉(𝒙)
where 𝑓(𝒙) is the cost function, 𝑉(𝒙) represents the current maximum constraint violation, and 𝑅 > 0 is a penalty parameter.
β€’ The descent function value at the current iteration is computed as:
Ξ¦π‘˜ = π‘“π‘˜ + π‘…π‘‰π‘˜, 𝑅 = max(π‘…π‘˜, π‘Ÿπ‘˜), where π‘Ÿπ‘˜ = Σ𝑖=1..π‘š |π‘’π‘–π‘˜| + Σ𝑗=1..𝑝 |π‘£π‘—π‘˜|
π‘‰π‘˜ = max{0; 𝑔𝑖, 𝑖 = 1, … , π‘š; |β„Žπ‘—|, 𝑗 = 1, … , 𝑝}
β€’ The line search subproblem is defined as: min𝛼 Ξ¦(𝛼) = Ξ¦(π’™π‘˜ + π›Όπ’…π‘˜)
SQP Algorithm
SQP Algorithm (Arora, p. 526):
β€’ Initialize: choose 𝒙0, 𝑅0 = 1, πœ€1 > 0, πœ€2 > 0.
β€’ For π‘˜ = 0,1,2, …
– Compute π‘“π‘˜
, 𝑔𝑖
π‘˜
, β„Žπ‘—
π‘˜
, 𝒄, 𝑏𝑖, 𝑒𝑗; compute π‘‰π‘˜.
– Formulate and solve the QP subproblem to obtain π’…π‘˜ and the
Lagrange multipliers π’–π‘˜
and π’—π‘˜
.
– If π‘‰π‘˜ ≀ πœ€1 and π’…π‘˜
≀ πœ€2, stop.
– Compute 𝑅; formulate and solve line search subproblem for 𝛼
– Set π’™π‘˜+1
← π’™π‘˜
+ π›Όπ’…π‘˜
, π‘…π‘˜+1 ← 𝑅, π‘˜ ← π‘˜ + 1
β€’ The above algorithm is convergent, i.e., Ξ¦ π’™π‘˜
≀ Ξ¦ 𝒙0
; π’™π‘˜
converges to the KKT point π’™βˆ—
SQP with Approximate Line Search
β€’ The SQP algorithm can be used with approximate line search as follows:
Let 𝑑𝑗, 𝑗 = 0, 1, … denote a trial step size, π’™π‘˜+1,𝑗 the trial design point, π‘“π‘˜+1,𝑗 = 𝑓(π’™π‘˜+1,𝑗) the function value at the trial solution, and Ξ¦π‘˜+1,𝑗 = π‘“π‘˜+1,𝑗 + π‘…π‘‰π‘˜+1,𝑗 the descent function at the trial solution.
β€’ The trial solution is required to satisfy the descent condition:
Ξ¦π‘˜+1,𝑗 + 𝑑𝑗𝛾 β€–π’…π‘˜β€–Β² ≀ Ξ¦π‘˜,𝑗, 0 < 𝛾 < 1
where a common choice is 𝛾 = 1/2, πœ‡ = 1/2, 𝑑𝑗 = πœ‡π‘—, 𝑗 = 0, 1, 2, …
β€’ The above descent condition ensures that the descent function decreases at each step of the method.
SQP Example
β€’ Consider the NLP problem: minπ‘₯1,π‘₯2 𝑓(π‘₯1, π‘₯2) = π‘₯1Β² βˆ’ π‘₯1π‘₯2 + π‘₯2Β²
subject to 𝑔1: 1 βˆ’ π‘₯1Β² βˆ’ π‘₯2Β² ≀ 0, 𝑔2: βˆ’π‘₯1 ≀ 0, 𝑔3: βˆ’π‘₯2 ≀ 0
Then 𝛻𝑓𝑇 = [2π‘₯1 βˆ’ π‘₯2, 2π‘₯2 βˆ’ π‘₯1], 𝛻𝑔1𝑇 = [βˆ’2π‘₯1, βˆ’2π‘₯2], 𝛻𝑔2𝑇 = [βˆ’1, 0], 𝛻𝑔3𝑇 = [0, βˆ’1]. Let 𝒙0 = (1, 1); then 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = βˆ’1.
β€’ Since all constraints are initially inactive, 𝑉0 = 0, and 𝒅 = βˆ’π’„ = [βˆ’1, βˆ’1]𝑇; the line search problem is: min𝛼 Ξ¦(𝛼) = (1 βˆ’ 𝛼)Β²
β€’ By setting Ξ¦β€²(𝛼) = 0, we get the analytical solution 𝛼 = 1; thus 𝒙1 = (0, 0), which results in a large constraint violation
SQP Example
β€’ Alternatively, we may use approximate line search as follows:
– Let 𝑅0 = 10, 𝛾 = πœ‡ = 1/2; let 𝑑0 = 1, then 𝒙1,0 = (0, 0), 𝑓1,0 = 0, 𝑉1,0 = 1, Ξ¦1,0 = 10; ‖𝒅0β€–Β² = 2, and the descent condition Ξ¦1,0 + 1/2 ‖𝒅0β€–Β² ≀ Ξ¦0 = 1 is not met at the trial point.
– Next, for 𝑑1 = 1/2, we get 𝒙1,1 = (1/2, 1/2), 𝑓1,1 = 1/4, 𝑉1,1 = 1/2, Ξ¦1,1 = 5 1/4, and the descent condition fails again.
– Next, for 𝑑2 = 1/4, we get 𝒙1,2 = (3/4, 3/4), 𝑉1,2 = 0, 𝑓1,2 = Ξ¦1,2 = 9/16, and the descent condition checks as: Ξ¦1,2 + 1/8 ‖𝒅0β€–Β² ≀ Ξ¦0 = 1.
– Therefore, we set 𝛼 = 𝑑2 = 1/4, 𝒙1 = 𝒙1,2 = (3/4, 3/4), with no constraint violation.
The Active Set Strategy
β€’ To reduce the computational cost of solving the QP subproblem, we may include only the active constraints in the problem.
β€’ For π’™π‘˜ ∈ Ξ©, the set of potentially active constraints is defined as:
β„π‘˜ = {𝑖: π‘”π‘–π‘˜ > βˆ’πœ€; 𝑖 = 1, … , π‘š} ⋃ {𝑗: 𝑗 = 1, … , 𝑝} for some πœ€.
β€’ For π’™π‘˜ βˆ‰ Ξ©, let π‘‰π‘˜ = max{0; π‘”π‘–π‘˜, 𝑖 = 1, … , π‘š; |β„Žπ‘—π‘˜|, 𝑗 = 1, … , 𝑝}; then the active constraint set is defined as:
β„π‘˜ = {𝑖: π‘”π‘–π‘˜ > π‘‰π‘˜ βˆ’ πœ€; 𝑖 = 1, … , π‘š} ⋃ {𝑗: |β„Žπ‘—π‘˜| > π‘‰π‘˜ βˆ’ πœ€; 𝑗 = 1, … , 𝑝}
β€’ The gradients of inactive constraints, i.e., those not in β„π‘˜, do not need to be computed.
SQP via Newton’s Method
β€’ Consider the following equality-constrained problem:
min𝒙 𝑓(𝒙), subject to β„Žπ‘–(𝒙) = 0, 𝑖 = 1, … , 𝑙
β€’ The Lagrangian function is given as: β„’(𝒙, 𝒗) = 𝑓(𝒙) + 𝒗𝑇𝒉(𝒙)
β€’ The KKT conditions are: 𝛻ℒ(𝒙, 𝒗) = 𝛻𝑓(𝒙) + 𝑡𝒗 = 𝟎, 𝒉(𝒙) = 𝟎, where 𝑡 = 𝛁𝒉(𝒙) is the Jacobian matrix whose 𝑖th column is π›»β„Žπ‘–(𝒙)
β€’ Using a first-order Taylor series expansion (with shorthand notation):
π›»β„’π‘˜+1 = π›»β„’π‘˜ + π›»Β²β„’π‘˜Ξ”π’™ + 𝑡Δ𝒗, π’‰π‘˜+1 = π’‰π‘˜ + 𝑡𝑇Δ𝒙
β€’ By expanding Δ𝒗 = π’—π‘˜+1 βˆ’ π’—π‘˜, π›»β„’π‘˜ = π›»π‘“π‘˜ + π‘΅π’—π‘˜, and assuming π’—π‘˜ β‰… π’—π‘˜+1, we obtain:
[π›»Β²β„’π‘˜ 𝑡; 𝑡𝑇 𝟎] [Ξ”π’™π‘˜; π’—π‘˜+1] = βˆ’[π›»π‘“π‘˜; π’‰π‘˜]
which is similar to the Newton-Raphson update, but uses the Hessian of the Lagrangian
SQP via Newton’s Method
β€’ Alternately, we may consider minimizing the quadratic approximation:
minΔ𝒙 Β½ Δ𝒙𝑇𝛻²ℒΔ𝒙 + 𝛻𝑓𝑇Δ𝒙
Subject to: β„Žπ‘–(𝒙) + 𝒏𝑖𝑇Δ𝒙 = 0, 𝑖 = 1, … , 𝑙
β€’ The KKT conditions are: 𝛻𝑓 + 𝛻²ℒΔ𝒙 + 𝑡𝒗 = 𝟎, 𝒉 + 𝑡𝑇Δ𝒙 = 𝟎
β€’ Thus the QP subproblem can be solved via Newton's method:
[π›»Β²β„’π‘˜ 𝑡; 𝑡𝑇 𝟎] [Ξ”π’™π‘˜; π’—π‘˜+1] = βˆ’[π›»π‘“π‘˜; π’‰π‘˜]
β€’ The Hessian of the Lagrangian can be updated via the BFGS method as: π‘―π‘˜+1 = π‘―π‘˜ + π‘«π‘˜ βˆ’ π‘¬π‘˜,
where π‘«π‘˜ = π’šπ‘˜π’šπ‘˜π‘‡/(π’šπ‘˜π‘‡Ξ”π’™π‘˜), π‘¬π‘˜ = π’„π‘˜π’„π‘˜π‘‡/(π’„π‘˜π‘‡Ξ”π’™π‘˜), π’„π‘˜ = π‘―π‘˜Ξ”π’™π‘˜, π’šπ‘˜ = π›»β„’π‘˜+1 βˆ’ π›»β„’π‘˜
Example: SQP with Hessian Update
β€’ Consider the NLP problem: minπ‘₯1,π‘₯2 𝑓(π‘₯1, π‘₯2) = π‘₯1Β² βˆ’ π‘₯1π‘₯2 + π‘₯2Β²
subject to 𝑔1: 1 βˆ’ π‘₯1Β² βˆ’ π‘₯2Β² ≀ 0, 𝑔2: βˆ’π‘₯1 ≀ 0, 𝑔3: βˆ’π‘₯2 ≀ 0
Let 𝒙0 = (1, 1); then 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = βˆ’1; 𝛻𝑔1𝑇 = [βˆ’2, βˆ’2], 𝛻𝑔2𝑇 = [βˆ’1, 0], 𝛻𝑔3𝑇 = [0, βˆ’1].
β€’ Using approximate line search, 𝛼 = 1/4, 𝒙1 = (3/4, 3/4).
β€’ For the Hessian update, we have:
𝑓1 = 0.5625, 𝑔1 = βˆ’0.125, 𝑔2 = 𝑔3 = βˆ’0.75; 𝒄1 = [0.75, 0.75];
𝛻𝑔1𝑇 = [βˆ’3/2, βˆ’3/2], 𝛻𝑔2𝑇 = [βˆ’1, 0], 𝛻𝑔3𝑇 = [0, βˆ’1]; Δ𝒙0 = (βˆ’0.25, βˆ’0.25);
then 𝑫0 = 𝑬0 = Β½ [1 1; 1 1], so 𝑯1 = 𝑯0
SQP with Hessian Update
β€’ For the next step, the QP subproblem is defined as:
min𝑑1,𝑑2 𝑓̅ = 3/4 (𝑑1 + 𝑑2) + Β½ (𝑑1Β² + 𝑑2Β²)
Subject to: βˆ’3/2 (𝑑1 + 𝑑2) ≀ 0, βˆ’π‘‘1 ≀ 0, βˆ’π‘‘2 ≀ 0
β€’ The application of the KKT conditions results in a linear system of equations, which is solved to obtain:
𝒙𝑇 = [𝑑1, 𝑑2, 𝑒1, 𝑒2, 𝑒3, 𝑠1, 𝑠2, 𝑠3] = [0.188, 0.188, 0, 0, 0, 0.125, 0.75, 0.75]
Modified SQP Algorithm
Modified SQP Algorithm (Arora, p. 558):
β€’ Initialize: choose 𝒙0, 𝑅0 = 1, 𝑯0 = 𝑰; πœ€1, πœ€2 > 0.
β€’ For π‘˜ = 0, 1, 2, …
– Compute π‘“π‘˜, π‘”π‘–π‘˜, β„Žπ‘—π‘˜, 𝒄, 𝑏𝑖, 𝑒𝑗, and π‘‰π‘˜. If π‘˜ > 0, compute π‘―π‘˜
– Formulate and solve the modified QP subproblem for the search direction π’…π‘˜ and the Lagrange multipliers π’–π‘˜ and π’—π‘˜.
– If π‘‰π‘˜ ≀ πœ€1 and β€–π’…π‘˜β€– ≀ πœ€2, stop.
– Compute 𝑅; formulate and solve the line search subproblem for 𝛼
– Set π’™π‘˜+1 ← π’™π‘˜ + π›Όπ’…π‘˜, π‘…π‘˜+1 ← 𝑅, π‘˜ ← π‘˜ + 1.
SQP Algorithm
%SQP subproblem via Hessian update
% input: xk (current design); Lk (Hessian of Lagrangian estimate)
%initialize
n=size(xk,1);
if ~exist('Lk','var'), Lk=diag(xk+(~xk)); end
tol=1e-7;
%function and constraint values
fk=f(xk);
dfk=df(xk);
gk=g(xk);
dgk=dg(xk);
%N-R update: solve the KKT system for the step and multipliers
A=[Lk dgk; dgk' 0*dgk'*dgk];
b=[-dfk;-gk];
dx=A\b;
dxk=dx(1:n);
lam=dx(n+1:end);
SQP Algorithm
%inactive constraints
idx1=find(lam<0);
if idx1
[dxk,lam]=inactive(lam,A,b,n);
end
%check termination
if abs(dxk)<tol, return, end
%adjust increment for constraint compliance
P=@(xk) f(xk)+lam'*abs(g(xk));
while P(xk+dxk)>P(xk),
dxk=dxk/2;
if abs(dxk)<tol, break, end
end
%Hessian update
dL=@(x) df(x)+dg(x)*lam;
Lk=update(Lk, xk, dxk, dL);
xk=xk+dxk;
disp([xk' f(xk) P(xk)])
SQP Algorithm
%function definitions
function [dxk,lam]=inactive(lam,A,b,n)
idx1=find(lam<0);
lam(idx1)=0;
idx2=find(lam);
v=[1:n,n+idx2];
A=A(v,v); b=b(v);
dx=A\b;
dxk=dx(1:n);
lam(idx2)=dx(n+1:end);
end
function Lk=update(Lk, xk, dxk, dL)
ga=dL(xk+dxk)-dL(xk);
Hx=Lk*dxk;
Dk=ga*ga'/(ga'*dxk);
Ek=Hx*Hx'/(Hx'*dxk);
Lk=Lk+Dk-Ek;
end
Generalized Reduced Gradient
β€’ The GRG method finds the search direction by projecting the objective function gradient onto the constraint hyperplane.
β€’ The GRG search direction is tangent to the constraint hyperplane, so that iterative steps try to conform to the constraints.
β€’ The constraints are effectively used to implicitly eliminate variables and reduce the problem dimension.
Implicit Elimination
β€’ Consider an equality-constrained problem in two variables:
Objective: min 𝑓(𝒙), 𝒙𝑇 = [π‘₯1, π‘₯2]
Subject to: 𝑔(𝒙) = 0
β€’ The variations in the objective and constraint functions are:
𝑑𝑓 = (𝛻𝑓)𝑇𝑑𝒙 = (πœ•π‘“/πœ•π‘₯1) 𝑑π‘₯1 + (πœ•π‘“/πœ•π‘₯2) 𝑑π‘₯2
𝑑𝑔 = (𝛻𝑔)𝑇𝑑𝒙 = (πœ•π‘”/πœ•π‘₯1) 𝑑π‘₯1 + (πœ•π‘”/πœ•π‘₯2) 𝑑π‘₯2 = 0
β€’ Solve for 𝑑π‘₯2 = βˆ’[(πœ•π‘”/πœ•π‘₯1)/(πœ•π‘”/πœ•π‘₯2)] 𝑑π‘₯1 and substitute in the objective function:
𝑑𝑓 = [ πœ•π‘“/πœ•π‘₯1 βˆ’ (πœ•π‘“/πœ•π‘₯2)(πœ•π‘”/πœ•π‘₯1)/(πœ•π‘”/πœ•π‘₯2) ] 𝑑π‘₯1
β€’ Then the reduced gradient of 𝑓 along π‘₯1 is given as:
𝛻𝑓𝑅 = πœ•π‘“/πœ•π‘₯1 βˆ’ (πœ•π‘“/πœ•π‘₯2)(πœ•π‘”/πœ•π‘₯1)/(πœ•π‘”/πœ•π‘₯2)
Implicit Elimination
β€’ Consider a problem in 𝑛 variables with π‘š equality constraints:
Objective: min 𝑓(𝒙), 𝒙𝑇 = [π‘₯1, π‘₯2, … , π‘₯𝑛]
Subject to: 𝑔𝑖(𝒙) = 0, 𝑖 = 1, … , π‘š
β€’ We define π‘š basic variables in terms of 𝑛 βˆ’ π‘š nonbasic variables; let 𝒙𝑇 = [π’šπ‘‡, 𝒛𝑇], where π’š are basic and 𝒛 are nonbasic.
β€’ The gradient vector is partitioned as: 𝛻𝑓𝑇 = [𝛻𝑓(π’š)𝑇, 𝛻𝑓(𝒛)𝑇].
β€’ The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝑓(π’š)𝑇 π‘‘π’š + 𝛻𝑓(𝒛)𝑇 𝑑𝒛
π‘‘π’ˆ = (πœ•πœ“/πœ•π’š) π‘‘π’š + (πœ•πœ“/πœ•π’›) 𝑑𝒛 = 𝟎
where the matrices of partial derivatives are defined as: [πœ•πœ“/πœ•π’š]𝑖𝑗 = πœ•π‘”π‘–/πœ•π‘¦π‘—; [πœ•πœ“/πœ•π’›]𝑖𝑗 = πœ•π‘”π‘–/πœ•π‘§π‘—
Generalized Reduced Gradient
β€’ Since πœ•πœ“/πœ•π’š is a square π‘š Γ— π‘š matrix, we may solve for π‘‘π’š as:
π‘‘π’š = βˆ’(πœ•πœ“/πœ•π’š)⁻¹(πœ•πœ“/πœ•π’›) 𝑑𝒛, and substitute in 𝑑𝑓 to obtain:
𝑑𝑓 = 𝛻𝑓(𝒛)𝑇 𝑑𝒛 βˆ’ 𝛻𝑓(π’š)𝑇 (πœ•πœ“/πœ•π’š)⁻¹(πœ•πœ“/πœ•π’›) 𝑑𝒛
β€’ Then the reduced gradient 𝛻𝑓𝑅 is defined as:
𝛻𝑓𝑅𝑇 = 𝛻𝑓(𝒛)𝑇 βˆ’ 𝛻𝑓(π’š)𝑇 (πœ•πœ“/πœ•π’š)⁻¹(πœ•πœ“/πœ•π’›)
β€’ Next, we choose the negative of 𝛻𝑓𝑅 as the search direction and perform a line search to determine the step size; then Δ𝒛 = βˆ’π›Όπ›»π‘“π‘…, Ξ”π’š = βˆ’(πœ•πœ“/πœ•π’š)⁻¹(πœ•πœ“/πœ•π’›) Δ𝒛
GRG Algorithm
β€’ Initialize: choose 𝒙0; evaluate the objective function and constraints; convert binding inequality constraints to equality constraints.
β€’ Partition the variables into π‘š basic and 𝑛 βˆ’ π‘š nonbasic ones, e.g., choose the first π‘š values, or the π‘š largest values, as basic variables.
β€’ Compute the reduced gradient 𝛻𝑓𝑅 along the nonbasic variables. If 𝛻𝑓𝑅 = 0, exit.
β€’ Set Δ𝒛 = βˆ’π›»π‘“π‘…/‖𝛻𝑓𝑅‖, Ξ”π’š = βˆ’(πœ•πœ“/πœ•π’š)⁻¹(πœ•πœ“/πœ•π’›) Δ𝒛.
β€’ Do a line search along Δ𝒙 to obtain 𝛼.
β€’ Check feasibility at π’™π‘˜ + 𝛼Δ𝒙. If necessary, use Newton-Raphson iterations to adjust Ξ”π’š as: Ξ”π’šπ‘˜+1 = Ξ”π’šπ‘˜ βˆ’ (πœ•πœ“/πœ•π’š)⁻¹ π’ˆπ‘˜
β€’ Update: π’™π‘˜+1 = π’™π‘˜ + 𝛼Δ𝒙
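A single reduced-gradient step written out in MATLAB for an assumed problem (min π‘₯1Β² + π‘₯2Β² subject to π‘₯1 + π‘₯2 βˆ’ 2 = 0), with π’š = π‘₯2 basic and 𝒛 = π‘₯1 nonbasic; the problem, starting point, and line-search bracket are assumptions. Because the assumed constraint is linear, the step stays feasible and no Newton-Raphson correction is needed.
f  = @(x) x(1)^2 + x(2)^2;                   % assumed objective
df = @(x) [2*x(1); 2*x(2)];
dgdz = 1; dgdy = 1;                          % for g = x1 + x2 - 2: dg/dx1, dg/dx2
x = [0; 2];                                  % feasible starting point (g(x) = 0)
g = df(x);
dfR = g(1) - g(2)*dgdz/dgdy;                 % reduced gradient along z = x1
dz = -dfR;                                   % move opposite the reduced gradient
dy = -dgdz*dz/dgdy;                          % keep dg = 0
dx = [dz; dy];
fa = @(a) f(x + a*dx);                       % line search along the reduced direction
a = fminbnd(fa, 0, 2);
xnew = x + a*dx;                             % here xnew = (1, 1), the constrained optimum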
Generalized Reduced Gradient
β€’ Consider an equality-constrained problem:
Objective: min 𝑓(𝒙) = 3π‘₯1 + 2π‘₯2 + 2π‘₯1Β² βˆ’ π‘₯1π‘₯2 + 1.5π‘₯2Β²
Subject to: 𝑔(𝒙) = π‘₯1Β² βˆ’ π‘₯2 βˆ’ 1 = 0
β€’ Let 𝒙0 = (βˆ’1, 0); then 𝑓0 = βˆ’1, 𝛻𝑓0 = [βˆ’1; 3], 𝑔0 = 0, 𝛻𝑔0 = [βˆ’2; βˆ’1].
β€’ Let π’š = π‘₯2 on the first iteration; then 𝛻𝑓𝑅 = βˆ’1 βˆ’ 3(βˆ’2)/(βˆ’1) = βˆ’7.
β€’ Let Δ𝒛 = 1; then Ξ”π’š = βˆ’(βˆ’2)/(βˆ’1) Γ— 1 = βˆ’2. By doing a line search along Δ𝒙 = [0.333; βˆ’0.667], we obtain 𝒙1 = [βˆ’0.350; βˆ’0.577], 𝑓1 = βˆ’2.13.
β€’ The optimum is reached in three iterations: π’™βˆ— = [βˆ’0.634; βˆ’0.598], 𝑓(π’™βˆ—) = βˆ’2.137.
Generalized Reduced Gradient
β€’ Consider an inequality-constrained problem:
Objective: min 𝑓(𝒙) = π‘₯1Β² + π‘₯2
Subject to: 𝑔1(𝒙) = π‘₯1Β² + π‘₯2Β² βˆ’ 9 ≀ 0, 𝑔2(𝒙) = π‘₯1 + π‘₯2 βˆ’ 1 ≀ 0
β€’ Add slack variables to the inequality constraints:
𝑔1(𝒙) = π‘₯1Β² + π‘₯2Β² βˆ’ 9 + 𝑠1 = 0, 𝑔2(𝒙) = π‘₯1 + π‘₯2 βˆ’ 1 + 𝑠2 = 0
Then 𝛻𝑓(𝒙) = [2π‘₯1; 1], 𝛻𝑔1(𝒙) = [2π‘₯1; 2π‘₯2], 𝛻𝑔2(𝒙) = [1; 1]
β€’ Let 𝒙0 = [2.56; βˆ’1.56]; then 𝑓0 = 4.99, 𝛻𝑓0 = [5.12; 1], π’ˆ0 = [βˆ’0.013; 0]
β€’ Since 𝑔2 is binding, add 𝑠2 to the variables: 𝛻𝑓0 = [5.12; 1; 0], 𝛻𝑔20 = [1; 1; 1]
Generalized Reduced Gradient
β€’ Let 𝑦 = π‘₯1, 𝒛 =
π‘₯2
𝑠2
; then 𝛻𝑓 𝑦 = 5.12, 𝛻𝑓 𝒛 =
1
0
, 𝛻𝑔2 𝑦 = 1,
𝛻𝑔2 𝒛 =
1
1
, therefore 𝛻𝑓𝑅 𝒛 =
1
0
βˆ’
1
1
5.12 =
βˆ’4.12
βˆ’5.12
β€’ Let Δ𝒛 = βˆ’π›»π‘“π‘… 𝒛 , Δ𝑦 = βˆ’[1 1]Δ𝒛 = βˆ’9.24; then, Δ𝒙 =
βˆ’9.24
4.12
and
𝒔0 = Δ𝒙/ Δ𝒙 . Suppose we limit the maximum step size to 𝛼 ≀ 0.5,
then 𝒙1 = 𝒙0 + 0.5𝒔0 =
2.103
βˆ’1.356
with 𝑓 π‘₯1 = 𝑓1 = 3.068. There are
no constraint violations, hence first iteration is completed.
β€’ After seven iterations: 𝒙7 =
0.003
βˆ’3.0
with 𝑓7 = βˆ’3.0
β€’ The optimum is at: π’™βˆ— =
0.0
βˆ’3.0
with π‘“βˆ— = βˆ’3.0
GRG for LP Problems
β€’ Consider an LP problem: min 𝑓(𝒙) = 𝒄𝑇𝒙
Subject to: 𝑨𝒙 = 𝒃, 𝒙 β‰₯ 𝟎
β€’ Let 𝒙 be partitioned into π‘š basic variables and 𝑛 βˆ’ π‘š nonbasic variables: 𝒙𝑇 = [π’šπ‘‡, 𝒛𝑇].
β€’ The objective function is partitioned as: 𝑓(𝒙) = π’„π‘¦π‘‡π’š + 𝒄𝑧𝑇𝒛
β€’ The constraints are partitioned as: π‘©π’š + 𝑡𝒛 = 𝒃, π’š β‰₯ 𝟎, 𝒛 β‰₯ 𝟎. Then π’š = 𝑩⁻¹𝒃 βˆ’ 𝑩⁻¹𝑡𝒛
β€’ The objective function in terms of the independent variables is:
𝑓(𝒛) = 𝒄𝑦𝑇𝑩⁻¹𝒃 + (𝒄𝑧𝑇 βˆ’ 𝒄𝑦𝑇𝑩⁻¹𝑡)𝒛
β€’ The reduced costs for the nonbasic variables are given as: 𝒓𝑐𝑇 = 𝒄𝑧𝑇 βˆ’ 𝒄𝑦𝑇𝑩⁻¹𝑡, or 𝒓𝑐𝑇 = 𝒄𝑧𝑇 βˆ’ 𝝀𝑇𝑡
GRG for LP Problems
β€’ Using tableau notation, the reduced costs are computed as:
[𝑩 𝑡 𝒃; π’„π’šπ‘‡ 𝒄𝑧𝑇 0] β†’ [𝑰 𝑩⁻¹𝑡 𝑩⁻¹𝒃; 𝟎 𝒓𝑐𝑇 βˆ’π’„π’šπ‘‡π‘©β»ΒΉπ’ƒ]
β€’ The objective function variation is given as: 𝑑𝑓 = π›»π‘“π’šπ‘‡ π‘‘π’š + 𝛻𝑓𝒛𝑇 𝑑𝒛
β€’ The reduced gradient along the constraint surface is given as:
𝛻𝑓𝑅𝑇 = 𝛻𝒛𝑓𝑇 βˆ’ π›»π’šπ‘“π‘‡π‘©β»ΒΉπ‘΅ = 𝒓𝑐𝑇
GRG Algorithm for LP Problems
1. Choose the largest π‘š components of 𝒙 as the basic variables
2. Compute the reduced gradient 𝛻𝑓𝑅𝑇 = 𝒓𝑐𝑇
3. Let Δ𝑧𝑖 = βˆ’π‘Ÿπ‘– if π‘Ÿπ‘– ≀ 0; Δ𝑧𝑖 = βˆ’π‘₯π‘–π‘Ÿπ‘– if π‘Ÿπ‘– > 0
4. If Δ𝒛 = 𝟎, stop; otherwise set Ξ”π’š = βˆ’π‘©β»ΒΉπ‘΅Ξ”π’›
5. Compute the step size: let 𝛼1 = max{𝛼: π’š + π›ΌΞ”π’š β‰₯ 𝟎, 𝒛 + 𝛼Δ𝒛 β‰₯ 𝟎}, 𝛼2 = arg min𝛼 𝑓(𝒙 + 𝛼Δ𝒙), 𝛼 = min{𝛼1, 𝛼2}
6. Update: π’™π‘˜+1 = π’™π‘˜ + 𝛼Δ𝒙
7. If 𝛼2 β‰₯ 𝛼1, update 𝑩, 𝑡 (use pivoting)
8. Return to step 1
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
Β 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
Β 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
Β 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
Β 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
Β 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
Β 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Β 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Β 

Optimum engineering design - Day 5. Classical optimization methods

  • 2. Course Materials β€’ Arora, Introduction to Optimum Design, 3e, Elsevier, (https://www.researchgate.net/publication/273120102_Introductio n_to_Optimum_design) β€’ Parkinson, Optimization Methods for Engineering Design, Brigham Young University (http://apmonitor.com/me575/index.php/Main/BookChapters) β€’ Iqbal, Fundamental Engineering Optimization Methods, BookBoon (https://bookboon.com/en/fundamental-engineering-optimization- methods-ebook)
  • 3. Numerical Optimization β€’ Consider an unconstrained NP problem: min 𝒙 𝑓 𝒙 β€’ Use an iterative method to solve the problem: π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜π’…π‘˜, where π’…π‘˜ is a search direction and π›Όπ‘˜ is the step size, such that the function value decreases at each step, i.e., 𝑓 π’™π‘˜+1 < 𝑓 π’™π‘˜ β€’ We expect lim π‘˜β†’βˆž π’™π‘˜ = π’™βˆ— β€’ The general iterative method is a two-step process: – Finding a suitable search direction π’…π‘˜ along which the function value locally decreases and any constraints are obeyed. – Performing line search along π’…π‘˜ to find π’™π‘˜+1 such that 𝑓 π’™π‘˜+1 attains its minimum value.
  • 4. The Iterative Method β€’ Iterative algorithm: 1. Initialize: chose 𝒙0 2. Check termination: 𝛻𝑓 π’™π‘˜ β‰… 0 3. Find a suitable search direction π’…π‘˜ , that obeys the descent condition: 𝛻𝑓 π’™π‘˜ 𝑇 π’…π‘˜ < 0 4. Search along π’…π‘˜ to find where 𝑓 π’™π‘˜+1 attains minimum value (line search problem) 5. Return to step 2
  • 5. The Line Search Problem β€’ Assuming a suitable search direction π’…π‘˜ has been determined, we seek to determine a step length π›Όπ‘˜, that minimizes 𝑓 π’™π‘˜+1 . β€’ Assuming π’™π‘˜ and π’…π‘˜ are known, the projected function value along π’…π‘˜ is expressed as: 𝑓 π’™π‘˜ + π›Όπ‘˜π’…π‘˜ = 𝑓 π’™π‘˜ + π›Όπ’…π‘˜ = 𝑓(𝛼) β€’ The line search problem to choose 𝛼 to minimize 𝑓 π’™π‘˜+1 along π’…π‘˜ is defined as: min 𝛼 𝑓(𝛼) = 𝑓 π’™π‘˜ + Ξ±π’…π‘˜ β€’ Assuming that a solution exists, it is found by setting 𝑓′ 𝛼 = 0.
  • 6. Example: Quadratic Function β€’ Consider minimizing a quadratic function: 𝑓 𝒙 = 1 2 𝒙𝑇𝑨𝒙 βˆ’ 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 βˆ’ 𝒃 β€’ Given a descent direction 𝒅, the line search problem is defined as: min 𝛼 𝑓(𝛼) = π’™π‘˜ + 𝛼𝒅 𝑇 𝑨 π’™π‘˜ + 𝛼𝒅 βˆ’ 𝒃𝑇 π’™π‘˜ + 𝛼𝒅 β€’ A solution is found by setting 𝑓′ 𝛼 = 0, where 𝑓′ 𝛼 = 𝒅𝑇𝑨 π’™π‘˜ + 𝛼𝒅 βˆ’ 𝒅𝑇𝒃 = 0 𝛼 = βˆ’ 𝒅𝑇 π‘¨π’™π‘˜ βˆ’ 𝒃 𝒅𝑇𝑨𝒅 = βˆ’ 𝛻𝑓(π’™π‘˜ )𝑇 𝒅 𝒅𝑇𝑨𝒅 β€’ Finally, π’™π‘˜+1 = π’™π‘˜ + 𝛼𝒅.
  • 7. Computer Methods for Line Search Problem β€’ Interval reduction methods – Golden search – Fibonacci search β€’ Approximate search methods – Arjimo’s rule – Quadrature curve fitting
  • 8. Interval Reduction Methods β€’ The interval reduction methods find the minimum of a unimodal function in two steps: – Bracketing the minimum to an interval – Reducing the interval to desired accuracy β€’ The bracketing step aims to find a three-point pattern, such that for π‘₯1, π‘₯2, π‘₯3, 𝑓 π‘₯1 β‰₯ 𝑓 π‘₯2 < 𝑓 π‘₯3 .
  • 9. Fibonacci’s Method β€’ The Fibonacci’s method uses Fibonacci numbers to achieve maximum interval reduction in a given number of steps. β€’ The Fibonacci number sequence is generated as: 𝐹0 = 𝐹1 = 1, 𝐹𝑖 = πΉπ‘–βˆ’1 + πΉπ‘–βˆ’2, 𝑖 β‰₯ 2. β€’ The properties of Fibonacci numbers include: – They achieve the golden ratio 𝜏 = lim π‘›β†’βˆž πΉπ‘›βˆ’1 𝐹𝑛 = 5βˆ’1 2 β‰… 0.618034 – The number of interval reductions 𝑛 required to achieve a desired accuracy πœ€ (where 1/𝐹𝑛 < πœ€) is specified in advance. – For given 𝐼1 and 𝑛, 𝐼2 = πΉπ‘›βˆ’1 𝐹𝑛 𝐼1, 𝐼3 = 𝐼1 βˆ’ 𝐼2, 𝐼4 = 𝐼2 βˆ’ 𝐼3, etc.
  • 10. The Golden Section Method • The golden section method uses the golden ratio: τ = 0.618034. • The golden section algorithm is given as: 1. Initialize: specify x1, x4 (I1 = x4 − x1), ε, and n such that τ^n < ε/I1 2. Compute x2 = τx1 + (1 − τ)x4, evaluate f2 3. For i = 1, …, n − 1: compute x3 = (1 − τ)x1 + τx4, evaluate f3; if f2 < f3, set x4 ← x1, x1 ← x3; else set x1 ← x2, x2 ← x3, f2 ← f3
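A minimal MATLAB sketch of this algorithm, assuming f is a handle to a unimodal function on [x1, x4]; the function name goldsec and the tolerance argument tol are illustrative additions, not part of the slides.

% Golden-section search (sketch of the algorithm above)
function xmin = goldsec(f, x1, x4, tol)
tau = 0.618034;                           % golden ratio
n = ceil(log(tol/abs(x4-x1))/log(tau));   % interval reductions: tau^n < tol/I1
x2 = tau*x1 + (1-tau)*x4; f2 = f(x2);
for i = 1:n-1
    x3 = (1-tau)*x1 + tau*x4; f3 = f(x3);
    if f2 < f3
        x4 = x1; x1 = x3;                 % keep [x1, x3]; swap ends to reuse x2
    else
        x1 = x2; x2 = x3; f2 = f3;        % keep [x2, x4]
    end
end
xmin = x2;
end

For instance, goldsec(@(a) exp(-a)+a.^2, 0, 1, 1e-4) should return a value close to 0.3517, the minimizer used in the approximate-search example later in the deck.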
  • 11. Approximate Search Methods • Consider the line search problem: min α f(α) = f(xk + αdk) • Sufficient Descent Condition. The sufficient descent condition guards against dk becoming nearly orthogonal to ∇f(xk). The condition is stated as: ∇f(xk)T dk < −c ‖∇f(xk)‖2, c > 0 • Sufficient Decrease Condition. The sufficient decrease condition ensures a nontrivial reduction in the function value. The condition is stated as: f(xk + αdk) − f(xk) ≤ μ α ∇f(xk)T dk, 0 < μ < 1 • Curvature Condition. The curvature condition guards against α becoming too small. The condition is stated as: ∇f(xk + αdk)T dk ≥ η ∇f(xk)T dk, 0 < μ < η < 1
  • 12. Approximate Line Search • Strong Wolfe Conditions. The strong Wolfe conditions commonly used by line search algorithms include: 1. The sufficient decrease condition (Armijo's rule): f(α) ≤ f(0) + μ α f′(0), 0 < μ < 1 2. Strong curvature condition: |f′(α)| ≤ η |f′(0)|, 0 < μ ≤ η < 1
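As an illustration of the sufficient decrease condition, a minimal backtracking sketch (not from the slides); f and df are assumed handles for the objective and its gradient, xk the current point, dk a descent direction, and all names are illustrative.

% Backtracking line search enforcing the Armijo condition (sketch)
function a = armijo_step(f, df, xk, dk, mu)
a = 1;                                   % initial trial step
f0 = f(xk); slope = df(xk)'*dk;          % f(0) and f'(0) along dk (slope < 0)
while f(xk + a*dk) > f0 + mu*a*slope
    a = a/2;                             % halve the step until the condition holds
    if a < 1e-12, break; end             % safeguard against a stalled search
end
end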
  • 13. Approximate Line Search • The approximate line search includes two steps: – Bracketing the minimum – Estimating the minimum • Bracketing the Minimum. In the bracketing step we seek an interval [αl, αu] such that f′(αl) < 0 and f′(αu) > 0. – Since for any descent direction f′(0) < 0, αl = 0 serves as an initial lower bound. To find an upper bound, gradually increase α, e.g., α = 1, 2, … – Assume that for some αi > 0 we get f′(αi) < 0 and f′(αi+1) > 0; then αi refines the lower bound and αi+1 serves as the upper bound.
  • 14. Approximate Line Search • Estimating the Minimum. Once the minimum has been bracketed in a small interval, a quadratic or cubic polynomial approximation is used to find the minimizer. • If the polynomial minimizer α satisfies the strong Wolfe conditions for the desired μ and η values (say μ = 0.2, η = 0.5), it is taken as the function minimizer. • Otherwise, α replaces αl or αu (whichever preserves the bracket), and the polynomial approximation step is repeated.
  • 15. Quadratic Curve Fitting β€’ Assuming that the interval 𝛼𝑙, 𝛼𝑒 contains the minimum of a unimodal function, 𝑓 𝛼 , its quadratic approximation, given as: π‘ž 𝛼 = π‘Ž0 + π‘Ž1𝛼 + π‘Ž2𝛼2 , is obtained using three points 𝛼𝑙, π›Όπ‘š, 𝛼𝑒 , where the mid-point may be used for π›Όπ‘š The quadratic coefficients {π‘Ž0, π‘Ž1, π‘Ž2} are solved as: π‘Ž2 = 1 π›Όπ‘’βˆ’π›Όπ‘š 𝑓 𝛼𝑒 βˆ’π‘“ 𝛼𝑙 π›Όπ‘’βˆ’π›Όπ‘™ βˆ’ 𝑓 π›Όπ‘š βˆ’π‘“ 𝛼𝑙 π›Όπ‘šβˆ’π›Όπ‘™ π‘Ž1 = 1 π›Όπ‘šβˆ’π›Όπ‘™ 𝑓 π›Όπ‘š βˆ’ 𝑓 𝛼𝑙 βˆ’ π‘Ž2(𝛼𝑙 + π›Όπ‘š) π‘Ž0 = 𝑓(𝛼𝑙) βˆ’ π‘Ž1𝛼𝑙 βˆ’ π‘Ž2𝛼𝑙 2 Then, the minimum is given as: π›Όπ‘šπ‘–π‘› = βˆ’ π‘Ž1 2π‘Ž2
  • 16. Example: Approximate Search β€’ Let 𝑓 𝛼 = π‘’βˆ’π›Ό + 𝛼2, 𝑓′ 𝛼 = 2𝛼 βˆ’ π‘’βˆ’π›Ό, 𝑓 0 = 1, 𝑓′ 0 = βˆ’1. Let πœ‡ = 0.2, and try 𝛼 = 0.1, 0.2, …, to bracket the minimum. β€’ From the sufficient decrease condition, the minimum is bracketed in the interval: [0, 0.5] β€’ Using quadratic approximation, the minimum is found as: π‘₯βˆ— = 0.3531 The exact solution is given as: π›Όπ‘šπ‘–π‘› = 0.3517 β€’ The Matlab commands are: Define the function: f=@(x) x.*x+exp(-x); mu=0.2; al=0:.1:1;
  • 17. Example: Approximate Search β€’ Bracketing the minimum: f1=feval(f,al) 1.0000 0.9148 0.8587 0.8308 0.8303 0.8565 0.9088 0.9866 1.0893 1.2166 1.3679 >> f2=f(0)-mu*al 1.0000 0.9800 0.9600 0.9400 0.9200 0.9000 0.8800 0.8600 0.8400 0.8200 0.8000 >> idx=find(f1<=f2) β€’ Quadratic approximation to find the minimum: al=0; am=0.25; au=0.5; a2 = ((f(au)-f(al))/(au-al)-(f(am)-f(al))/(am-al))/(au-am); a1 = (f(am)-f(al))/(am-al)-a2*(al+am); xmin = -a1/a2/2 % 0.3531
  • 18. Computer Methods for Finding the Search Direction β€’ Gradient based methods – Steepest descent method – Conjugate gradient method – Quasi Newton methods β€’ Hessian based methods – Newton’s method – Trust region methods
  • 19. Steepest Descent Method β€’ The steepest descent method determines the search direction as: π’…π‘˜ = βˆ’π›»π‘“(π’™π‘˜), β€’ The update rule is given as: π’™π‘˜+1 = π’™π‘˜ βˆ’ π›Όπ‘˜ βˆ™ 𝛻𝑓(π’™π‘˜ ) where π›Όπ‘˜ is determined by minimizing 𝑓(π’™π‘˜+1) along π’…π‘˜ β€’ Example: quadratic function 𝑓 𝒙 = 1 2 𝒙𝑇 𝑨𝒙 βˆ’ 𝒃𝑇 𝒙, 𝛻𝑓 = 𝑨𝒙 βˆ’ 𝒃 Then, π’™π‘˜+1 = π’™π‘˜ βˆ’ 𝛼 βˆ™ 𝛻𝑓 π’™π‘˜ ; 𝛼 = 𝛻 𝑓 π’™π‘˜ 𝑇 𝛻 𝑓 π’™π‘˜ 𝛻 𝑓 π’™π‘˜ 𝑇 𝐀𝛻 𝑓 π’™π‘˜ Define π’“π‘˜ = 𝒃 βˆ’ π‘¨π’™π‘˜ Then, π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜π’“π‘˜; π›Όπ‘˜ = π’“π‘˜ 𝑇 π’“π‘˜ π’“π‘˜ π‘‡π΄π’“π‘˜
  • 20. Steepest Descent Algorithm β€’ Initialize: choose 𝒙0 β€’ For π‘˜ = 0,1,2, … – Compute 𝛻𝑓(π’™π‘˜ ) – Check convergence: if 𝛻𝑓(π’™π‘˜ ) < πœ–, stop. – Set π’…π‘˜ = βˆ’π›»π‘“(π’™π‘˜) – Line search problem: Find min 𝛼β‰₯0 𝑓 π’™π‘˜ + π›Όπ’…π‘˜ – Set π’™π‘˜+1 = π’™π‘˜ + π›Όπ’…π‘˜ .
  • 21. Example: Steepest Descent • Consider min x f(x) = 0.1x1² + x2², ∇f(x) = [0.2x1; 2x2], ∇²f(x) = [0.2 0; 0 2]; let x0 = [5; 1]; then f(x0) = 3.5, d0 = −∇f(x0) = [−1; −2], α = 0.61, x1 = [4.39; −0.22], f(x1) = 1.98. Continuing…
  • 22. Example: Steepest Descent β€’ MATLAB code: H=[.2 0;0 2]; f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H; x=[5;1]; xall=x'; for i=1:10 d=-df(x); a=d'*d/(d'*H*d); x=x+a*d; xall=[xall;x']; end plot(xall(:,1),xall(:,2)), grid axis([-1 5 -1 5]), axis equal
  • 23. Steepest Descent Method β€’ The steepest descent method becomes slow close to the optimum β€’ The method progresses in a zigzag fashion, since 𝑑 𝑑𝛼 𝑓 π’™π‘˜ + π›Όπ’…π‘˜ = 𝛻 𝑓 π’™π‘˜+1 𝑇 π’…π‘˜ = βˆ’π›» 𝑓 π’™π‘˜+1 𝑇 𝛻 𝑓 π’™π‘˜ = 0 β€’ The method has linear convergence with rate constant 𝐢 = 𝑓 π’™π‘˜+1 βˆ’π‘“ π’™βˆ— 𝑓 π’™π‘˜ βˆ’π‘“ π’™βˆ— ≀ π‘π‘œπ‘›π‘‘ 𝑨 βˆ’1 π‘π‘œπ‘›π‘‘ 𝑨 +1 2
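As a quick numerical check of this bound on the quadratic example used above (a sketch; A = diag(0.2, 2) is the Hessian of f(x) = 0.1x1² + x2²):

A = diag([0.2 2]);              % Hessian of f(x) = 0.1*x1^2 + x2^2
kappa = cond(A);                % condition number = 10
C = ((kappa-1)/(kappa+1))^2     % rate-constant bound, approximately 0.669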
  • 24. Preconditioning • Preconditioning (scaling) can be used to reduce the condition number of the Hessian matrix and hence aid convergence • Consider f(x) = 0.1x1² + x2² = xᵀAx, where A = diag(0.1, 1) • Define a linear transformation: x = Py, where P = diag(√10, 1); then, f(x) = yᵀPᵀAPy = yᵀy • Since cond(I) = 1, the steepest descent method in the case of a quadratic function converges in a single iteration
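A small MATLAB sketch of the scaling idea with the example values above: in the transformed variables the Hessian is the identity, and a single exact steepest-descent step reaches the minimizer.

A = diag([0.1 1]); P = diag([1/sqrt(0.1) 1]);  % P'*A*P = eye(2)
y = P \ [5; 1];                 % starting point x0 = [5; 1] in y-coordinates
g = 2*y;                        % gradient of f(y) = y'*y
a = (g'*g) / (g'*2*g);          % exact line-search step = 1/2
y1 = y - a*g                    % = [0; 0], the minimum, in a single iteration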
  • 25. Conjugate Gradient Method β€’ For any square matrix 𝑨, the set of 𝑨-conjugate vectors is defined by: 𝒅𝑖𝑇 𝑨𝒅𝑗 = 0, 𝑖 β‰  𝑗 β€’ Let π’ˆπ‘˜ = 𝛻 𝑓 π’™π‘˜ denote the gradient; then, starting from 𝒅0 = βˆ’π’ˆ0, a set of 𝑨-conjugate directions is generated as: 𝒅0 = βˆ’π’ˆ0; π’…π‘˜+1 = βˆ’π’ˆπ‘˜+1 + π›½π‘˜π’…π‘˜ π‘˜ β‰₯ 0, … where π›½π‘˜ = π’ˆπ‘˜+1 𝑇 π‘¨π’…π‘˜ π’…π‘˜π‘‡ π‘¨π’…π‘˜ There are multiple ways to generate conjugate directions β€’ Using {𝒅0 , 𝒅2 , … , π’…π‘›βˆ’1 } as search directions, a quadratic function is minimized in 𝑛 steps.
  • 26. Conjugate Directions Method β€’ The parameter π›½π‘˜ can be computed in different ways: – By substituting π‘¨π’…π‘˜ = 1 π›Όπ‘˜ (π’ˆπ‘˜+1 βˆ’ π’ˆπ‘˜), we obtain: π›½π‘˜ = π’ˆπ‘˜+1 𝑇 (π’ˆπ‘˜+1βˆ’π’ˆπ‘˜) π’…π‘˜π‘‡ (π’ˆπ‘˜+1βˆ’π’ˆπ‘˜) (the Hestenes-Stiefel formula) – In the case of exact line search, π‘”π‘˜+1 𝑇 π’…π‘˜ = 0; then π›½π‘˜ = π’ˆπ‘˜+1 𝑇 (π’ˆπ‘˜+1βˆ’π’ˆπ‘˜) π’ˆπ‘˜ π‘‡π’ˆπ‘˜ (the Polak-Ribiere formula) – Also, for exact line search π’ˆπ‘˜+1 𝑇 π’ˆπ‘˜ = π›½π‘˜βˆ’1(π’ˆπ‘˜ + π›Όπ‘˜π‘¨π’…π‘˜ )𝑇 π’…π‘˜βˆ’1 = 0, resulting in π›½π‘˜ = π’ˆπ‘˜+1 𝑇 π’ˆπ‘˜+1 π’ˆπ‘˜ π‘‡π’ˆπ‘˜ (the Fletcher-Reeves formula) Other versions of π›½π‘˜ have also been proposed.
  • 27. Example: Conjugate Gradient Method • Consider min x f(x) = 0.1x1² + x2², ∇f(x) = [0.2x1; 2x2], ∇²f(x) = [0.2 0; 0 2]; let x0 = [5; 1], then f(x0) = 3.5, d0 = −∇f(x0) = [−1; −2], α = 0.61, x1 = [4.39; −0.22], f(x1) = 1.98, β0 = 0.19, d1 = [−0.535; 0.027], α = 8.2, x2 = [0; 0]
  • 28. Example: Conjugate Gradient Method β€’ MATLAB code H=[.2 0;0 2]; f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H; x=[5;1]; n=2; xall=zeros(n+1,n); xall(1,:)=x'; d=-df(x); a=d'*d/(d'*H*d); x=x+a*d; xall(2,:)=x'; for i=1:size(x,1)-1 b=df(x)'*H*d/(d'*H*d); d=-df(x)+b*d; r=-df(x); a=r'*r/(d'*H*d); x=x+a*d; xall(i+2,:)=x'; end plot(xall(:,1),xall(:,2)), grid axis([-1 5 -1 5]), axis equal
  • 29. Conjugate Gradient Algorithm β€’ Conjugate-Gradient Algorithm (Griva, Nash & Sofer, p454): β€’ Initialize: Choose 𝒙0 = 𝟎, 𝒓0 = 𝒃, 𝒅(βˆ’1) = 0, 𝛽0 = 0. β€’ For 𝑖 = 0,1, … – Check convergence: if 𝒓𝑖 < πœ–, stop. – If 𝑖 > 0, set 𝛽𝑖 = 𝒓𝑖 𝑇 𝒓𝑖 π’“π‘–βˆ’1 𝑇 π’“π‘–βˆ’1 – Set 𝒅𝑖 = 𝒓𝑖 + π›½π‘–π’…π‘–βˆ’1 ; 𝛼𝑖 = 𝒓𝑖 𝑇 𝒓𝑖 𝒅𝑖𝑇 𝑨𝒅𝑖 ; 𝒙𝑖+1 = 𝒙𝑖 + 𝛼𝑖𝒅𝑖 ; 𝒓𝑖+1 = 𝒓𝑖 βˆ’ 𝛼𝑖𝑨𝒅𝑖.
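A MATLAB transcription of this algorithm for min f = ½ xᵀAx − bᵀx (equivalently Ax = b), assuming A is symmetric positive definite; the function name cgsolve is illustrative, and the loop is capped at n passes since the method terminates in at most n steps.

function x = cgsolve(A, b, tol)
x = zeros(size(b)); r = b; d = zeros(size(b)); rr_old = 1;
for i = 0:numel(b)-1
    rr = r'*r;
    if sqrt(rr) < tol, break; end        % convergence check on the residual
    if i > 0, beta = rr/rr_old; else, beta = 0; end
    d = r + beta*d;                      % new conjugate direction
    a = rr/(d'*A*d);                     % exact step along d
    x = x + a*d;
    r = r - a*A*d;                       % residual update
    rr_old = rr;
end
end

For the earlier quadratic example, cgsolve(diag([0.2 2]), [1; 2], 1e-8) reaches the solution [5; 1] in two steps.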
  • 30. Conjugate Gradient Method β€’ Assume that an update that includes steps 𝛼𝑖 along 𝑛 conjugate vectors 𝒅𝑖 is assembled as: 𝑦 = 𝛼𝑖𝒅𝑖 𝑛 𝑖=1 . β€’ Then, for a quadratic function, the minimization problem is decomposed into a set of one-dimensional problems, i.e., min 𝑦 𝑓(π’š) ≑ min 𝛼𝑖 1 2 𝛼𝑖 2 𝒅𝑖𝑇 𝑨𝒅𝑖 βˆ’ 𝛼𝑖𝒃𝑇 𝒅𝑖 𝑛 𝑖=1 β€’ By setting the derivative with respect to 𝛼𝑖 equal to zero, i.e., 𝛼𝑖𝒅𝑖𝑇 𝑨𝒅𝑖 βˆ’ 𝒃𝑇 𝒅𝑖 = 0, we obtain: 𝛼𝑖 = 𝒃𝑇𝒅𝑖 𝒅𝑖𝑇 𝑨𝒅𝑖 . β€’ This shows that the CG algorithm iteratively determines the conjugate directions 𝒅𝑖 and their coefficients 𝛼𝑖.
  • 31. CG Rate of Convergence β€’ Conjugate gradient methods achieve superlinear convergence: – In the case of quadratic functions, the minimum is reached exactly in 𝑛 iterations. – For general nonlinear functions, convergence in 2𝑛 iterations is to be expected. β€’ Nonlinear CG methods typically have the lowest per iteration computational costs of all gradient methods.
  • 32. Newton’s Method β€’ Consider minimizing the second order approximation of 𝑓 𝒙 : min 𝒅 𝑓 π’™π‘˜ + Δ𝒙 = 𝑓 π’™π‘˜ + 𝛻𝑓 π’™π‘˜ 𝑇Δ𝒙 + 1 2 Ξ”π’™π‘‡π‘―π‘˜Ξ”π’™ β€’ Apply FONC: π‘―π‘˜π’… + π’ˆπ‘˜ = 𝟎, where π’ˆπ‘˜ = 𝛻𝑓 π’™π‘˜ Then, assuming that π‘―π‘˜ = 𝛻2 𝑓 π’™π‘˜ stays positive definite, the Newton’s update rule is derived as: π’™π‘˜+1 = π’™π‘˜ βˆ’ π‘―π‘˜ βˆ’1 π’ˆπ‘˜ β€’ Note: – The convergence of the Newton’s method is dependent on π‘―π‘˜ staying positive definite. – A step size may be included in the Newton’s method, i.e., π’™π‘˜+1 = π’™π‘˜ βˆ’ π›Όπ‘˜π‘―π‘˜ βˆ’1 π’ˆπ‘˜
  • 33. Marquardt Modification to Newton’s Method β€’ To ensure the positive definite condition on π‘―π‘˜, Marquardt proposed the following modification to Newton’s method: π‘―π‘˜ + πœ†π‘° 𝒅 = βˆ’π’ˆπ‘˜ where πœ† is selected to ensure that the Hessian is positive definite. β€’ Since π‘―π‘˜ + πœ†π‘° is also symmetric, the resulting system of linear equations can be solved for 𝒅 as: 𝑳𝑫𝑳𝑇 𝒅 = βˆ’π›»π‘“ π’™π‘˜
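A minimal sketch of a Marquardt-type step in MATLAB, assuming Hk and gk hold the current Hessian and gradient; for brevity the positive-definiteness test uses chol and the solve uses the backslash operator, though the LDLᵀ factorization mentioned above (MATLAB's ldl) could be used equivalently.

lambda = 0; n = size(Hk, 1);
[~, p] = chol(Hk + lambda*eye(n));   % p > 0 means Hk + lambda*I is not positive definite
while p > 0
    lambda = max(2*lambda, 1e-3);    % increase the shift and retest
    [~, p] = chol(Hk + lambda*eye(n));
end
d = -(Hk + lambda*eye(n)) \ gk;      % modified Newton direction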
  • 34. Newton’s Algorithm Newton’s Method (Griva, Nash, & Sofer, p. 373): 1. Initialize: Choose 𝒙0, specify πœ– 2. For π‘˜ = 0,1, … 3. Check convergence: If 𝛻𝑓 π’™π‘˜ < πœ–, stop 4. Factorize modified Hessian as 𝛻2 𝑓 π’™π‘˜ + 𝑬 = 𝑳𝑫𝑳𝑇 and solve 𝑳𝑫𝑳𝑇 𝒅 = βˆ’π›»π‘“ π’™π‘˜ for 𝒅 5. Perform line search to determine π›Όπ‘˜ and update the solution estimate as π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜ π’…π‘˜
  • 35. Rate of Convergence β€’ Newton’s method achieves quadratic rate of convergence in the close neighborhood of the optimal point, and superlinear convergence otherwise. β€’ The main drawback of the Newton’s method is its computational cost: the Hessian matrix needs to be computed at every step, and a linear system of equations needs to be solved to obtain the update. β€’ Due to the high computational and storage costs, classic Newton’s method is rarely used in practice.
  • 36. Quasi Newton’s Methods β€’ The quasi-Newton methods derive from a generalization of secant method, that approximates the second derivative as: 𝑓′′ (π‘₯π‘˜) β‰… 𝑓′ π‘₯π‘˜ βˆ’π‘“β€²(π‘₯π‘˜βˆ’1) π‘₯π‘˜βˆ’π‘₯π‘˜βˆ’1 β€’ In the multi-dimensional case, the secant condition is generalized as: π‘―π‘˜ π’™π‘˜ βˆ’ π’™π‘˜βˆ’1 = 𝛻𝑓 π’™π‘˜ βˆ’ 𝛻𝑓 π’™π‘˜βˆ’1 β€’ Define π‘­π‘˜ = π‘―π‘˜ βˆ’1 , then π’™π‘˜ βˆ’ π’™π‘˜βˆ’1 = π‘­π‘˜ 𝛻𝑓 π’™π‘˜ βˆ’ 𝛻𝑓 π’™π‘˜βˆ’1 β€’ The quasi-Newton methods iteratively update π‘―π‘˜ or π‘­π‘˜ as: – Direct update: π‘―π‘˜+1 = π‘―π‘˜ + βˆ†π‘―π‘˜, 𝑯0 = 𝑰 – Inverse update: π‘­π‘˜+1 = π‘­π‘˜ + βˆ†π‘­π‘˜, 𝑭 = π‘―βˆ’1 , 𝑭0 = 𝑰
  • 37. Quasi-Newton Methods • Quasi-Newton update: Let sk = xk+1 − xk, yk = ∇f(xk+1) − ∇f(xk); then, – The DFP (Davidon-Fletcher-Powell) formula for the inverse Hessian update is given as: Fk+1 = Fk − (Fk yk)(Fk yk)ᵀ/(ykᵀFk yk) + sk skᵀ/(ykᵀsk) – The BFGS (Broyden-Fletcher-Goldfarb-Shanno) formula for the direct Hessian update is given as: Hk+1 = Hk − (Hk sk)(Hk sk)ᵀ/(skᵀHk sk) + yk ykᵀ/(ykᵀsk)
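The two formulas transcribe directly into MATLAB one-liners (a sketch; here s = xk+1 − xk and y = ∇f(xk+1) − ∇f(xk)):

% DFP update of the inverse Hessian F, and BFGS update of the Hessian H
dfp  = @(F, s, y) F - (F*y)*(F*y)'/(y'*F*y) + (s*s')/(y'*s);
bfgs = @(H, s, y) H - (H*s)*(H*s)'/(s'*H*s) + (y*y')/(y'*s);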
  • 38. Quasi-Newton Algorithm The Quasi-Newton Algorithm (Griva, Nash & Sofer, p.415): β€’ Initialize: Choose 𝒙0, 𝑯0 (e.g., 𝑯0 = 𝑰), specify πœ€ β€’ For π‘˜ = 0,1, … – Check convergence: If 𝛻𝑓 π’™π‘˜ < πœ€, stop – Solve π‘―π‘˜π’… = βˆ’π›»π‘“ π’™π‘˜ for π’…π‘˜ (alternatively, 𝒅 = βˆ’π‘­π‘˜π›»π‘“ π’™π‘˜ ) – Solve min 𝛼 𝑓 π’™π‘˜ + π›Όπ’…π‘˜ for π›Όπ‘˜, and update the current estimate: π’™π‘˜+1 = π’™π‘˜ + π›Όπ‘˜ π’…π‘˜ – Compute π’”π‘˜, π’šπ‘˜, and update π‘―π‘˜ (or π‘­π‘˜ as applicable)
  • 39. Example: Quasi-Newton Method β€’ Consider the problem: min π‘₯1,π‘₯2 𝑓(π‘₯1, π‘₯2) = 2π‘₯1 2 βˆ’ π‘₯1π‘₯2 + π‘₯2 2 , where 𝑯 = 4 βˆ’ 1 βˆ’1 2 , 𝛻𝑓 = 𝑯 π‘₯1 π‘₯2 . Let 𝒙0 = 1 1 , 𝑓0 = 4, 𝑯0 = 𝑰, 𝑭0 = 𝑰; Choose 𝒅0 = βˆ’π›»π‘“ π‘₯0 = βˆ’3 βˆ’1 ; then 𝑓 𝛼 = 2 1 βˆ’ 3𝛼 2 + 1 βˆ’ 𝛼 2 βˆ’ (1 βˆ’ 3𝛼)(1 βˆ’ 𝛼), Using 𝑓′ 𝛼 = 0 β†’ 𝛼 = 5 16 β†’ 𝒙1 = 0.625 0.688 , 𝑓1 = 0.875; then π’š1 = βˆ’3.44 0.313 , 𝑭1 = 1.193 0.065 0.065 1.022 , 𝑯1 = 0.381 βˆ’0.206 βˆ’0.206 0.9313 , and using either update formula 𝒅1 = 0.4375 βˆ’1.313 ; for the next step, 𝑓 𝛼 = 5.36𝛼2 βˆ’ 3.83𝛼 + 0.875 β†’ 𝛼 = βˆ’0.3572, 𝒙2 = 0.2188 0.2188 .
  • 40. Example: Quasi-Newton Method β€’ For quadratic function, convergence is achieved in two iterations.
  • 41. Trust-Region Methods β€’ The trust-region methods locally employ a quadratic approximation π‘žπ‘˜ π’™π‘˜ to the nonlinear objective function. β€’ The approximation is valid in the neighborhood of π’™π‘˜ defined by Ξ©π‘˜ = 𝒙: πšͺ(𝒙 βˆ’ π’™π‘˜) ≀ βˆ†π‘˜ , where πšͺ is a scaling parameter. β€’ The method aims to find a π’™π‘˜+1 ∈ Ξ©π‘˜, that satisfies the sufficient decrease condition in 𝑓(𝒙). β€’ The quality of the quadratic approximation is estimated by the reliability index: π›Ύπ‘˜ = 𝑓(π’™π‘˜)βˆ’π‘“(π’™π‘˜+1) π‘žπ‘˜ π’™π‘˜ βˆ’π‘žπ‘˜ π’™π‘˜+1 . If this ratio is close to unity, the trust region may be expanded in the next iteration.
  • 42. Trust-Region Methods β€’ At each iteration π‘˜, trust-region algorithm solves a constrained optimization sub-problem involving quadratic approximation: min 𝒅 π‘žπ‘˜ 𝒅 = 𝑓 π’™π‘˜ + 𝛻𝑓 π’™π‘˜ 𝑇 𝒅 + 1 2 𝒅𝑇 𝛻2 𝑓 π’™π‘˜ 𝒅 Subject to: 𝒅 ≀ βˆ†π‘˜ Lagrangian function: β„’ π‘₯, πœ† = 𝑓 π’™π‘˜ + πœ† 𝒅 βˆ’ βˆ†π‘˜ FONC: 𝛻2𝑓 π’™π‘˜ + πœ†π‘° π’…π‘˜ = βˆ’π›»π‘“ π’™π‘˜ , πœ† 𝒅 βˆ’ βˆ†π‘˜ = 0 β€’ The resulting search direction π’…π‘˜ is given as: π’…π‘˜ = π’…π‘˜(πœ†). – For large βˆ†π‘˜ and a positive-definite 𝛻2𝑓 π’™π‘˜ , the Lagrange multiplier πœ† β†’ 0, and π’…π‘˜ (πœ†) reduces to the Newton’s direction. – For βˆ†π‘˜β†’ 0, πœ† β†’ ∞, and π’…π‘˜ (πœ†) aligns with the steepest-descent direction.
  • 43. Trust-Region Algorithm β€’ Trust-Region Algorithm (Griva, Nash & Sofer, p.392): β€’ Initialize: choose 𝒙0, βˆ†0; specify πœ€, 0 < πœ‡ < πœ‚ < 1 (e.g., πœ‡ = 1 4 ; πœ‚ = 3 4 ) β€’ For π‘˜ = 0,1, … – Check convergence: If 𝛻𝑓 π’™π‘˜ < πœ€, stop – Solve the subproblem: min 𝒅 π‘žπ‘˜ 𝒅 subject to 𝒅 ≀ βˆ†π‘˜ – Compute π›Ύπ‘˜, β€’ if π›Ύπ‘˜ < πœ‡, set π’™π‘˜+1 = π’™π‘˜, βˆ†π‘˜+1= 1 2 βˆ†π‘˜ β€’ else if π›Ύπ‘˜ < πœ‚, set π’™π‘˜+1 = π’™π‘˜ + π’…π‘˜ , βˆ†π‘˜+1= βˆ†π‘˜ β€’ else set π’™π‘˜+1 = π’™π‘˜ + π’…π‘˜ , βˆ†π‘˜+1= 2βˆ†π‘˜
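A simplified MATLAB sketch of one trust-region step, where the shift λ is simply increased until the step fits in the trust region (this approximates, rather than exactly solves, the subproblem); f, g, H, x and Delta are assumed to hold the objective handle, gradient, Hessian, current point and radius.

q = @(d) f(x) + g'*d + 0.5*d'*H*d;          % local quadratic model
lambda = 0;
d = -(H + lambda*eye(size(H,1))) \ g;        % Newton step when lambda = 0
while norm(d) > Delta
    lambda = max(2*lambda, 1e-3);            % a larger shift gives a shorter step
    d = -(H + lambda*eye(size(H,1))) \ g;
end
gamma = (f(x) - f(x+d)) / (f(x) - q(d));     % reliability index used to update Delta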
  • 44. Computer Methods for Constrained Problems β€’ Penalty and Barrier methods β€’ Augmented Lagrangian method (AL) β€’ Sequential linear programming (SLP) β€’ Sequential quadratic programming (SQP)
  • 45. Penalty and Barrier Methods • Consider the general optimization problem: min x f(x) Subject to: hi(x) = 0, i = 1, …, p; gj(x) ≤ 0, j = 1, …, m; xiL ≤ xi ≤ xiU, i = 1, …, n. • Define a composite function to be used for constraint compliance: Φ(x, r) = f(x) + P(g(x), h(x), r), where P defines a loss function and r is a vector of weights (penalty parameters)
  • 46. Penalty and Barrier Methods • Penalty Function Method. A penalty function method employs a quadratic loss function and iterates through the infeasible region: P(g(x), h(x), r) = r [ Σi (gi+(x))² + Σi (hi(x))² ], gi+(x) = max(0, gi(x)), r > 0 • Barrier Function Method. A barrier method employs a log barrier function and iterates through the feasible region: P(g(x), h(x), r) = −(1/r) Σi log(−gi(x)) • For both penalty and barrier methods, as r → ∞, x(r) → x*
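A minimal sketch of the penalty-function iteration using MATLAB's fminsearch for the inner unconstrained minimization; f, g, h are assumed handles returning the cost, the vector of inequality constraints and the vector of equality constraints, and x0 an initial guess (all illustrative).

r = 1; x = x0;
for k = 1:10
    Phi = @(x) f(x) + r*( sum(max(0, g(x)).^2) + sum(h(x).^2) );  % composite function
    x = fminsearch(Phi, x);      % unconstrained minimization for the current r
    r = 10*r;                    % increase the penalty weight and repeat
end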
  • 47. The Augmented Lagrangian Method β€’ Consider an equality-constrained problem: min 𝒙 𝑓 𝒙 Subject to: β„Žπ‘– 𝒙 = 0, 𝑖 = 1, … , 𝑙 β€’ Define the augmented Lagrangian (AL) as: 𝒫 𝒙, 𝒗, π‘Ÿ = 𝑓 𝒙 + π‘£π‘—β„Žπ‘— 𝒙 + 1 2 π‘Ÿβ„Žπ‘— 2 𝒙 𝑗 where the additional term defines an exterior penalty function with π‘Ÿ as the penalty parameter. β€’ For inequality constrained problems, the AL may be defined as: 𝒫 𝒙, 𝒖, π‘Ÿ = 𝑓 𝒙 + 𝑒𝑖𝑔𝑖 𝒙 + 1 2 π‘Ÿπ‘”π‘– 2 𝒙 , if 𝑔𝑗 + 𝑒𝑗 π‘Ÿ β‰₯ 0 βˆ’ 1 2π‘Ÿ 𝑒𝑖 2 , if 𝑔𝑗 + 𝑒𝑗 π‘Ÿ < 0 𝑖 where a large π‘Ÿ makes the Hessian of AL positive definite at 𝒙.
  • 48. The Augmented Lagrangian Method β€’ The dual function for the AL is defined as: πœ“ 𝒗 = min 𝒙 𝒫 𝒙, 𝒗, π‘Ÿ = 𝑓 𝒙 + π‘£π‘—β„Žπ‘— 𝒙 + 1 2 π‘Ÿ β„Žπ‘— 𝒙 2 𝑗 β€’ The resulting dual optimization problem is: max 𝒗 πœ“ 𝒗 β€’ The dual problem may be solved via Newton’s method as: π’—π‘˜+1 = π’—π‘˜ βˆ’ 𝑑2πœ“ 𝑑𝑣𝑖𝑑𝑣𝑗 βˆ’1 𝒉 where 𝑑2πœ“ 𝑑𝑣𝑖𝑑𝑣𝑗 = βˆ’π›»β„Žπ‘– 𝑇 𝛻2𝒫 βˆ’1π›»β„Žπ‘— β€’ For large 𝒓, the Newton’s update may be approximated as: 𝑣𝑗 π‘˜+1 = 𝑣𝑗 π‘˜ + π‘Ÿ π‘—β„Žπ‘—, 𝑗 = 1, … , 𝑙
  • 49. Example: Augmented Lagrangian • Maximize the volume of a cylindrical tank subject to a surface area constraint: max d,l f(d, l) = πd²l/4, subject to h: πd²/4 + πdl − A0 = 0 • We can normalize the problem as: min d,l f(d, l) = −d²l, subject to h: d² + 4dl − 1 = 0 • The solution to the primal problem is obtained as: Lagrangian function: ℒ(d, l, λ) = −d²l + λ(d² + 4dl − 1) FONC: λ(d + 2l) − dl = 0, 4λd − d² = 0, d² + 4dl − 1 = 0 Optimal solution: d* = 2l* = 4λ* = 1/√3 ≈ 0.577.
  • 50. Example: Augmented Lagrangian • Alternatively, define the Augmented Lagrangian function as: 𝒫(d, l, λ, r) = −d²l + λ(d² + 4dl − 1) + (1/2) r (d² + 4dl − 1)² • Define the dual function: ψ(λ) = min d,l 𝒫(d, l, λ, r) • Define the dual optimization problem: max λ ψ(λ) • Solution to the dual problem: λ* = λmax = 0.144 • Solution for the design variables: d* = 2l* = 0.577
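A minimal sketch of the augmented-Lagrangian iteration for the normalized tank problem above, again using fminsearch for the inner minimization (the penalty value r = 10 and the iteration count are illustrative, and the starting guess assumes the physical case d, l > 0); the iterates should approach d* ≈ 0.577, l* ≈ 0.289 and λ* ≈ 0.144.

f = @(x) -x(1)^2*x(2);                      % x = [d; l]
h = @(x) x(1)^2 + 4*x(1)*x(2) - 1;          % equality constraint
r = 10; v = 0; x = [1; 1];
for k = 1:20
    P = @(x) f(x) + v*h(x) + 0.5*r*h(x)^2;  % augmented Lagrangian
    x = fminsearch(P, x);                   % inner minimization over (d, l)
    v = v + r*h(x);                         % multiplier update
end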
  • 51. Sequential Linear Programming β€’ Consider the general optimization problem: min 𝒙 𝑓 𝒙 Subject to β„Žπ‘– 𝒙 = 0, 𝑖 = 1, … , 𝑝; 𝑔𝑗 𝒙 ≀ 0, 𝑗 = 𝑖, … , π‘š; π‘₯𝑖𝐿 ≀ π‘₯𝑖 ≀ π‘₯π‘–π‘ˆ, 𝑖 = 1, … , 𝑛. β€’ Let π’™π‘˜ denote the current estimate of the design variables, and let 𝒅 denote the change in variables; define the first order expansion of the objective and constraint functions in the neighborhood of π’™π‘˜ 𝑓 π’™π‘˜ + 𝒅 = 𝑓 π’™π‘˜ + 𝛻𝑓 π’™π‘˜ 𝑇 𝒅 𝑔𝑖 π’™π‘˜ + 𝒅 = 𝑔𝑖 π’™π‘˜ + 𝛻𝑔𝑖 π’™π‘˜ 𝑇 𝒅, 𝑖 = 1, … , π‘š β„Žπ‘— π’™π‘˜ + 𝒅 = β„Žπ‘— π’™π‘˜ + π›»β„Žπ‘— π’™π‘˜ 𝑇 𝒅, 𝑗 = 1, … , 𝑙
  • 52. Sequential Linear Programming β€’ Let π‘“π‘˜ = 𝑓 π’™π‘˜ , 𝑔𝑖 π‘˜ = 𝑔𝑖 π’™π‘˜ , β„Žπ‘— π‘˜ = β„Žπ‘— π’™π‘˜ ; 𝑏𝑖 = βˆ’π‘”π‘– π‘˜ , 𝑒𝑗 = βˆ’β„Žπ‘— π‘˜ , 𝒄 = 𝛻𝑓 π’™π‘˜ , 𝒂𝑖 = 𝛻𝑔𝑖 π’™π‘˜ , 𝒏𝑗 = π›»β„Žπ‘— π’™π‘˜ , 𝑨 = 𝒂1, 𝒂2, … , π’‚π‘š , 𝑡 = 𝒏1, 𝒏2, … , 𝒏𝑙 . β€’ Using first order expansion, define an LP subprogram for the current iteration of the NLP problem: min 𝒅 𝑓 = 𝒄𝑇𝒅 Subject to: 𝑨𝑇 𝒅 ≀ 𝒃, 𝑡𝑇𝒅 = 𝒆 where 𝑓 represents first-order change in the cost function, and the columns of 𝑨 and 𝑡 matrices represent, respectively, the gradients of inequality and equality constraints. β€’ The resulting LP problem can be solved via the Simplex method.
  • 53. Sequential Linear Programming β€’ We may note that: – Since both positive and negative changes to design variables π’™π‘˜ are allowed, the variables 𝑑𝑖 are unrestricted in sign – The SLP method requires additional constraints of the form: βˆ’ βˆ†π‘–π‘™ π‘˜ ≀ 𝑑𝑖 π‘˜ ≀ βˆ†π‘–π‘’ π‘˜ (termed move limits) to bind the LP solution. These limits represent maximum allowable change in 𝑑𝑖 in the current iteration and are selected as percentage of current value. – Move limits serve dual purpose of binding the solution and obviating the need for line search. – Overly restrictive move limits tend to make the SLP problem infeasible.
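If the Optimization Toolbox is available, the LP subproblem with move limits can be passed directly to linprog (a sketch; c, A, b, N, e are assumed to be formed as on the previous slides, and dl, du hold the move limits; pass [] in place of N' and e if there are no equality constraints).

% LP subproblem: min c'*d subject to A'*d <= b, N'*d = e, dl <= d <= du
d = linprog(c, A', b, N', e, dl, du);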
  • 54. SLP Example • Consider the convex NLP problem: min x1,x2 f(x1, x2) = x1² − x1x2 + x2² Subject to: 1 − x1² − x2² ≤ 0; −x1 ≤ 0, −x2 ≤ 0. The problem has a single minimum at: x* = (1/√2, 1/√2) • The objective and constraint gradients are: ∇fᵀ = [2x1 − x2, 2x2 − x1], ∇g1ᵀ = [−2x1, −2x2], ∇g2ᵀ = [−1, 0], ∇g3ᵀ = [0, −1]. • Let x0 = (1, 1); then f0 = 1, cᵀ = [1, 1], b1 = b2 = b3 = 1; a1ᵀ = [−2, −2], a2ᵀ = [−1, 0], a3ᵀ = [0, −1]
  • 55. SLP Example β€’ Define the LP subproblem at the current step as: min 𝑑1,𝑑2 𝑓 π‘₯1, π‘₯2 = 𝑑1 + 𝑑2 Subject to: βˆ’2 βˆ’2 βˆ’1 0 0 βˆ’1 𝑑1 𝑑2 ≀ 1 1 1 β€’ In the absence of move limits, the LP problem is unbounded; using 50% move limits, the SLP update is given as: π’…βˆ— = βˆ’ 1 2 , βˆ’ 1 2 𝑇 , 𝒙1 = 1 2 , 1 2 𝑇 , with resulting constraint violation: 𝑔𝑖 = 1 2 , 0, 0 ; smaller move limits may be used to reduce the constraint violation.
  • 56. Sequential Linear Programming SLP Algorithm (Arora, p. 508): β€’ Initialize: choose 𝒙0, πœ€1 > 0, πœ€2 > 0. β€’ For π‘˜ = 0,1,2, … – Choose move limits βˆ†π‘–π‘™ π‘˜ , βˆ†π‘–π‘’ π‘˜ as some fraction of current design π’™π‘˜ – Compute π‘“π‘˜ , 𝒄, 𝑔𝑖 π‘˜ , β„Žπ‘— π‘˜ , 𝑏𝑖, 𝑒𝑗 – Formulate and solve the LP subproblem for π’…π‘˜ – If 𝑔𝑖 ≀ πœ€1; 𝑖 = 1, … , π‘š; β„Žπ‘— ≀ πœ€1; 𝑖 = 1, … , 𝑝; and π’…π‘˜ ≀ πœ€2, stop – Substitute π’™π‘˜+1 ← π’™π‘˜ + π›Όπ’…π‘˜, π‘˜ ← π‘˜ + 1.
  • 57. Sequential Quadratic Programming β€’ Sequential quadratic programming (SQP) uses a quadratic approximation to the objective function at every step of iteration. β€’ The SQP problem is defined as: min 𝒅 𝑓 = 𝒄𝑇 𝒅 + 1 2 𝒅𝑇 𝒅 Subject to, 𝑨𝑇𝒅 ≀ 𝒃, 𝑡𝑇𝒅 = 𝒆 β€’ SQP does not require move limits, alleviating the shortcomings of the SLP method. β€’ The SQP problem is convex; hence, it has a single global minimum. β€’ SQP can be solved via Simplex based linear complementarity problem (LCP) framework.
  • 58. Sequential Quadratic Programming β€’ The Lagrangian function for the SQP problem is defined as: β„’ 𝒅, 𝒖, 𝒗 = 𝒄𝑇𝒅 + 1 2 𝒅𝑇𝒅 + 𝒖𝑇 𝑨𝑇𝒅 βˆ’ 𝒃 + 𝒔 + 𝒗𝑇(𝑡𝑇𝒅 βˆ’ 𝒆) β€’ Then the KKT conditions are: Optimality: 𝛁ℒ = 𝒄 + 𝒅 + 𝑨𝒖 + 𝑡𝒗 = 𝟎, Feasibility: 𝑨𝑇 𝒅 + 𝒔 = 𝒃, 𝑡𝑇 𝒅 = 𝒆 , Complementarity: 𝒖𝑇 𝒔 = 𝟎, Non-negativity: 𝒖 β‰₯ 𝟎, 𝒔 β‰₯ 𝟎
  • 59. Sequential Quadratic Programming • Since v is unrestricted in sign, let v = y − z, y ≥ 0, z ≥ 0; then the KKT conditions are compactly written as: [I A 0 N −N; Aᵀ 0 I 0 0; Nᵀ 0 0 0 0][d; u; s; y; z] = [−c; b; e], or PX = Q • The complementary slackness conditions, uᵀs = 0, translate as: Xi Xi+m = 0, i = n + 1, …, n + m. • The resulting problem can be solved via the Simplex method using the LCP framework.
  • 60. Descent Function Approach β€’ In SQP methods, the line search step is based on minimization of a descent function that penalizes constraint violations, i.e., Ξ¦ 𝒙 = 𝑓 𝒙 + 𝑅𝑉 𝒙 where 𝑓 𝒙 is the cost function, 𝑉 𝒙 represents current maximum constraint violation, and 𝑅 > 0 is a penalty parameter. β€’ The descent function value at the current iteration is computed as: Ξ¦π‘˜ = π‘“π‘˜ + π‘…π‘‰π‘˜, 𝑅 = max π‘…π‘˜, π‘Ÿπ‘˜ where π‘Ÿπ‘˜ = 𝑒𝑖 π‘˜ π‘š 𝑖=1 + 𝑣𝑗 π‘˜ 𝑝 𝑗=1 π‘‰π‘˜ = max {0; 𝑔𝑖, 𝑖 = 1, . . . , π‘š; β„Žπ‘— , 𝑗 = 1, … , 𝑝} β€’ The line search subproblem is defined as: min 𝛼 Ξ¦ 𝛼 = Ξ¦ π’™π‘˜ + π›Όπ’…π‘˜
  • 61. SQP Algorithm SQP Algorithm (Arora, p. 526): β€’ Initialize: choose 𝒙0, 𝑅0 = 1, πœ€1 > 0, πœ€2 > 0. β€’ For π‘˜ = 0,1,2, … – Compute π‘“π‘˜ , 𝑔𝑖 π‘˜ , β„Žπ‘— π‘˜ , 𝒄, 𝑏𝑖, 𝑒𝑗; compute π‘‰π‘˜. – Formulate and solve the QP subproblem to obtain π’…π‘˜ and the Lagrange multipliers π’–π‘˜ and π’—π‘˜ . – If π‘‰π‘˜ ≀ πœ€1 and π’…π‘˜ ≀ πœ€2, stop. – Compute 𝑅; formulate and solve line search subproblem for 𝛼 – Set π’™π‘˜+1 ← π’™π‘˜ + π›Όπ’…π‘˜ , π‘…π‘˜+1 ← 𝑅, π‘˜ ← π‘˜ + 1 β€’ The above algorithm is convergent, i.e., Ξ¦ π’™π‘˜ ≀ Ξ¦ 𝒙0 ; π’™π‘˜ converges to the KKT point π’™βˆ—
  • 62. SQP with Approximate Line Search • The SQP algorithm can be used with an approximate line search as follows: Let tj, j = 0, 1, … denote a trial step size, xk+1,j the trial design point, fk+1,j = f(xk+1,j) the function value at the trial solution, and Φk+1,j = fk+1,j + R Vk+1,j the descent function at the trial solution. • The trial solution is required to satisfy the descent condition: Φk+1,j + tj γ ‖dk‖² ≤ Φk, 0 < γ < 1, where a common choice is: γ = 1/2, μ = 1/2, tj = μ^j, j = 0, 1, 2, …. • The above descent condition ensures a sufficient decrease in the descent function at each step of the method.
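A minimal MATLAB sketch of this trial-step loop; Phi is an assumed handle for the descent function f(x) + R*V(x), and xk, dk the current point and SQP direction (names are illustrative).

t = 1; gamma = 0.5; mu = 0.5; j = 0;
while Phi(xk + t*dk) + t*gamma*norm(dk)^2 > Phi(xk)   % descent condition violated
    j = j + 1; t = mu^j;                              % try the next smaller trial step
    if j > 30, break; end                             % safeguard
end
alpha = t;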
  • 63. SQP Example β€’ Consider the NLP problem: min π‘₯1,π‘₯2 𝑓(π‘₯1, π‘₯2) = π‘₯1 2 βˆ’ π‘₯1π‘₯2 + π‘₯2 2 subject to 𝑔1: 1 βˆ’ π‘₯1 2 βˆ’ π‘₯2 2 ≀ 0, 𝑔2: βˆ’π‘₯1 ≀ 0, 𝑔3: βˆ’π‘₯2 ≀ 0 Then 𝛻𝑓𝑇 = 2π‘₯1 βˆ’ π‘₯2, 2π‘₯2 βˆ’ π‘₯1 , 𝛻𝑔1 𝑇 = βˆ’2π‘₯1, βˆ’2π‘₯2 , 𝛻𝑔2 𝑇 = βˆ’1,0 , 𝛻𝑔3 𝑇 = [0, βˆ’1]. Let π‘₯0 = 1, 1 ; then, 𝑓0 = 1, 𝒄 = 1, 1 𝑇, 𝑔1 1,1 = 𝑔2 1,1 = 𝑔3 1,1 = βˆ’1. β€’ Since all constraints are initially inactive, 𝑉0 = 0, and 𝒅 = βˆ’π’„ = βˆ’1, βˆ’1 𝑇; the line search problem is: min 𝛼 Ξ¦ 𝛼 = 1 βˆ’ 𝛼 2; β€’ By setting Ξ¦β€² 𝛼 = 0, we get the analytical solution: 𝛼 = 1; thus π‘₯1 = 0, 0 , which results in a large constraint violation
  • 64. SQP Example β€’ Alternatively, we may use approximate line search as follows: – Let 𝑅0 = 10, 𝛾 = πœ‡ = 1 2 ; let 𝑑0 = 1, then 𝒙1,0 = 0,0 , 𝑓1,0 = 0, 𝑉1,0 = 1, Ξ¦1,0 = 10; 𝒅0 2 = 2, and the descent condition Ξ¦1,0 + 1 2 𝒅0 2 ≀ Ξ¦0 = 1 is not met at the trial point. – Next, for 𝑑1 = 1 2 , we get: 𝒙1,1 = 1 2 , 1 2 , 𝑓1,1 = 1 4 , V1,1 = 1 2 , Ξ¦1,1 = 5 1 4 , and the descent condition fails again; – Next, for 𝑑2 = 1 4 , we get: 𝒙1,2 = 3 4 , 3 4 , V1,2 = 0, 𝑓1,2 = Ξ¦1,2 = 9 16 , and the descent condition checks as: Ξ¦1,2 + 1 8 𝒅0 2 ≀ Ξ¦0 = 1. – Therefore, we set 𝛼 = 𝑑2 = 1 4 , 𝒙1 = 𝒙1,2 = 3 4 , 3 4 with no constraint violation.
  • 65. The Active Set Strategy β€’ To reduce the computational cost of solving the QP subproblem, we may only include the active constraints in the problem. β€’ For π’™π‘˜ ∈ Ξ©, the set of potentially active constraints is defined as: β„π‘˜ = 𝑖: 𝑔𝑖 π‘˜ > βˆ’πœ€; 𝑖 = 1, … , π‘š ⋃ 𝑗: 𝑗 = 1, … , 𝑝 for some πœ€. β€’ For π’™π‘˜ βˆ‰ Ξ©, let π‘‰π‘˜ = max {0; 𝑔𝑖 π‘˜ , 𝑖 = 1, . . . , π‘š; β„Žπ‘— π‘˜ , 𝑗 = 1, … , 𝑝}; then, the active constraint set is defined as: β„π‘˜ = 𝑖: 𝑔𝑖 π‘˜ > π‘‰π‘˜ βˆ’ πœ€; 𝑖 = 1, … , π‘š ⋃ 𝑗: β„Žπ‘— π‘˜ > π‘‰π‘˜ βˆ’ πœ€; 𝑗 = 1, … , 𝑝 β€’ The gradients of inactive constraints, i.e., those not in β„π‘˜, do not need to be computed
  • 66. SQP via Newton’s Method β€’ Consider the following equality constrained problem: min 𝒙 𝑓(𝒙), subject to β„Žπ‘– 𝒙 = 0, 𝑖 = 1, … , 𝑙 β€’ The Lagrangian function is given as: β„’ 𝒙, 𝒗 = 𝑓 𝒙 + 𝒗𝑇𝒉(𝒙) β€’ The KKT conditions are: 𝛻ℒ 𝒙, 𝒗 = 𝛻𝑓 𝒙 + 𝑡𝒗 = 𝟎, 𝒉 𝒙 = 𝟎 where 𝑡 = 𝛁𝒉(𝒙) is a Jacobian matrix whose 𝑖th column is π›»β„Žπ‘– 𝒙 β€’ Using first order Taylor series expansion (with shorthand notation): π›»β„’π‘˜+1 = π›»β„’π‘˜ + 𝛻2β„’π‘˜Ξ”π’™ + 𝑁Δ𝒗 π’‰π‘˜+1 = π’‰π‘˜ + 𝑡𝑇Δ𝒙 β€’ By expanding Δ𝒗 = π’—π‘˜+1 βˆ’ π’—π‘˜ , π›»β„’π‘˜ = π›»π‘“π‘˜ + π‘΅π’—π‘˜ , and assuming π’—π‘˜ β‰… π’—π‘˜+1 we obtain: 𝛻2 β„’π‘˜ 𝑡 𝑡𝑇 𝟎 Ξ”π’™π‘˜ π’—π‘˜+1 = βˆ’ π›»π‘“π‘˜ π’‰π‘˜ which is similar to N-R update, but uses Hessian of the Lagrangian
  • 67. SQP via Newton's Method • Alternatively, we consider minimizing the quadratic approximation: min Δx (1/2) Δxᵀ∇²ℒΔx + ∇fᵀΔx Subject to: hi(x) + niᵀΔx = 0, i = 1, …, l • The KKT conditions are: ∇f + ∇²ℒΔx + Nv = 0, h + NᵀΔx = 0 • Thus the QP subproblem can be solved via Newton's method! [∇²ℒk N; Nᵀ 0][Δxk; vk+1] = −[∇fk; hk] • The Hessian of the Lagrangian can be updated via the BFGS method as: Hk+1 = Hk + Dk − Ek, where Dk = yk ykᵀ/(ykᵀΔxk), Ek = ck ckᵀ/(ckᵀΔxk), ck = HkΔxk, yk = ∇ℒk+1 − ∇ℒk
  • 68. Example: SQP with Hessian Update • Consider the NLP problem: min x1,x2 f(x1, x2) = x1² − x1x2 + x2², subject to g1: 1 − x1² − x2² ≤ 0, g2: −x1 ≤ 0, g3: −x2 ≤ 0. Let x0 = (1, 1); then f0 = 1, c = [1, 1]ᵀ, g1(1,1) = g2(1,1) = g3(1,1) = −1; ∇g1ᵀ = [−2, −2], ∇g2ᵀ = [−1, 0], ∇g3ᵀ = [0, −1]. • Using approximate line search, α = 1/4, x1 = (3/4, 3/4). • For the Hessian update, we have: f1 = 0.5625, g1 = −0.125, g2 = g3 = −0.75; c1 = [0.75, 0.75]; ∇g1ᵀ = [−3/2, −3/2], ∇g2ᵀ = [−1, 0], ∇g3ᵀ = [0, −1]; Δx0 = [−0.25, −0.25]; then D0 = E0 = 0.5 [1 1; 1 1], so H1 = H0
  • 69. SQP with Hessian Update β€’ For the next step, the QP problem is defined as: min 𝑑1,𝑑2 𝑓 = 3 4 𝑑1 + 𝑑2 + 1 2 𝑑1 2 + 𝑑2 2 Subject to: βˆ’ 3 2 𝑑1 + 𝑑2 ≀ 0, βˆ’π‘‘1 ≀ 0, βˆ’π‘‘2 ≀ 0 β€’ The application of KKT conditions results in a linear system of equations, which are solved to obtain: 𝒙𝑇 = 𝑑1, 𝑑2, 𝑒1, 𝑒2, 𝑒3, 𝑠1, 𝑠2, 𝑠3 = 0.188, 0.188, 0, 0, 0,0.125, 0.75, 0.75
  • 70. Modified SQP Algorithm Modified SQP Algorithm (Arora, p. 558): β€’ Initialize: choose 𝒙0, 𝑅0 = 1, 𝑯0 = 𝐼; πœ€1, πœ€2 > 0. β€’ For π‘˜ = 0,1,2, … – Compute π‘“π‘˜ , 𝑔𝑖 π‘˜ , β„Žπ‘— π‘˜ , 𝒄, 𝑏𝑖, 𝑒𝑗, and π‘‰π‘˜. If π‘˜ > 0, compute π‘―π‘˜ – Formulate and solve the modified QP subproblem for search direction π’…π‘˜ and the Lagrange multipliers π’–π‘˜ and π’—π‘˜ . – If π‘‰π‘˜ ≀ πœ€1 and π’…π‘˜ ≀ πœ€2, stop. – Compute 𝑅; formulate and solve line search subproblem for 𝛼 – Set π’™π‘˜+1 ← π’™π‘˜ + π›Όπ’…π‘˜ , π‘…π‘˜+1 ← 𝑅, π‘˜ ← π‘˜ + 1.
  • 71. SQP Algorithm %SQP subproblem via Hessian update % input: xk (current design); Lk (Hessian of Lagrangian estimate) %initialize n=size(xk,1); if ~exist('Lk','var'), Lk=diag(xk+(~xk)); end tol=1e-7; %function and constraint values fk=f(xk); dfk=df(xk); gk=g(xk); dgk=dg(xk); %N-R update A=[Lk dgk; dgk' 0*dgk'*dgk]; b=[-dfk;-gk]; dx=A\b; dxk=dx(1:n); lam=dx(n+1:end);
  • 72. SQP Algorithm %inactive constraints idx1=find(lam<0); if idx1 [dxk,lam]=inactive(lam,A,b,n); end %check termination if abs(dxk)<tol, return, end %adjust increment for constraint compliance P=@(xk) f(xk)+lam'*abs(g(xk)); while P(xk+dxk)>P(xk), dxk=dxk/2; if abs(dxk)<tol, break, end end %Hessian update dL=@(x) df(x)+dg(x)*lam; Lk=update(Lk, xk, dxk, dL); xk=xk+dxk; disp([xk' f(xk) P(xk)])
  • 73. SQP Algorithm %function definitions function [dxk,lam]=inactive(lam,A,b,n) idx1=find(lam<0); lam(idx1)=0; idx2=find(lam); v=[1:n,n+idx2']; A=A(v,v); b=b(v); dx=A\b; dxk=dx(1:n); lam(idx2)=dx(n+1:end); end function Lk=update(Lk, xk, dxk, dL) ga=dL(xk+dxk)-dL(xk); Hx=Lk*dxk; Dk=ga*ga'/(ga'*dxk); Ek=Hx*Hx'/(Hx'*dxk); Lk=Lk+Dk-Ek; end
  • 74. Generalized Reduced Gradient • The GRG method finds the search direction by projecting the objective function gradient onto the constraint hyperplane. • The GRG search direction is tangent to the constraint hyperplane, so that the iterative steps tend to conform to the constraints. • The constraints are effectively used to implicitly eliminate variables and reduce the problem dimension.
  • 75. Implicit Elimination β€’ Consider an equality constrained problem in two variables: Objective: min 𝑓 𝒙 , 𝒙𝑇 = π‘₯1, π‘₯2 Subject to: 𝑔 𝒙 = 0 β€’ The variation in the objective and constraint functions are: 𝑑𝑓 = 𝛻𝑓 𝑇 𝑑𝒙 = πœ•π‘“ πœ•π‘₯1 𝑑π‘₯1 + πœ•π‘“ πœ•π‘₯2 𝑑π‘₯2 𝑑𝑔 = 𝛻𝑔 𝑇 𝑑𝒙 = πœ•π‘” πœ•π‘₯1 𝑑π‘₯1 + πœ•π‘” πœ•π‘₯2 𝑑π‘₯2 = 0 β€’ Solve for 𝑑π‘₯2 = βˆ’ πœ•π‘”/πœ•π‘₯1 πœ•π‘”/πœ•π‘₯2 𝑑π‘₯1 and substitute in the objective function: 𝑑𝑓 = πœ•π‘“ πœ•π‘₯1 βˆ’ πœ•π‘“ πœ•π‘₯2 πœ•π‘”/πœ•π‘₯1 πœ•π‘”/πœ•π‘₯2 𝑑π‘₯1 β€’ Then the reduced gradient of 𝑓 along π‘₯1 is given as: 𝛻𝑓𝑅 = πœ•π‘“ πœ•π‘₯1 βˆ’ πœ•π‘“ πœ•π‘₯2 πœ•π‘”/πœ•π‘₯1 πœ•π‘”/πœ•π‘₯2
  • 76. Implicit Elimination β€’ Consider a problem in 𝑛 variable with π‘š equality constraints: Objective: min 𝑓 𝒙 , 𝒙𝑇 = π‘₯1, π‘₯2, … , π‘₯𝑛 Subject to: 𝑔𝑖 𝒙 = 0, 𝑖 = 1, … , π‘š β€’ We define π‘š basic variables in terms of 𝑛 βˆ’ π‘š nonbasic variables; let 𝒙𝑇 = π’šπ‘‡ , 𝒛𝑇 , where π’š are basic and 𝒛 are nonbasic. β€’ The gradient vector is partitioned as: 𝛻𝑓𝑇 = 𝛻𝑓 π’š 𝑇, 𝛻𝑓 𝒛 𝑇 . β€’ The variations in the objective and constraint functions are: 𝑑𝑓 = 𝛻𝑓 π’š 𝑇 π‘‘π’š + 𝛻𝑓 𝒛 𝑇 𝑑𝒛 π‘‘π’ˆ = πœ•πœ“ πœ•π’š π‘‘π’š + πœ•πœ“ πœ•π’› 𝑑𝒛 = 𝟎 where the matrices of partial derivatives are defined as: πœ•πœ“ πœ•π’š 𝑖𝑗 = πœ•π‘”π‘– πœ•π‘¦π‘— ; πœ•πœ“ πœ•π’› 𝑖𝑗 = πœ•π‘”π‘– πœ•π‘§π‘—
  • 77. Generalized Reduced Gradient • Since ∂ψ/∂y is a square m × m matrix, we may solve for dy as: dy = −(∂ψ/∂y)⁻¹(∂ψ/∂z) dz, and substitute in df to obtain: df = ∇fzᵀ dz − ∇fyᵀ(∂ψ/∂y)⁻¹(∂ψ/∂z) dz • Then the reduced gradient ∇fR is defined as: ∇fRᵀ = ∇fzᵀ − ∇fyᵀ(∂ψ/∂y)⁻¹(∂ψ/∂z) • Next, we choose the negative of ∇fRᵀ as the search direction and perform a line search to determine the step size; then Δz = −α∇fR, Δy = −(∂ψ/∂y)⁻¹(∂ψ/∂z) Δz
  • 78. GRG Algorithm β€’ Initialize: choose 𝒙0; evaluate objective function and constraints; convert binding inequality constraints to equality constraints. β€’ Partition the variables into π‘š basic and 𝑛 βˆ’ π‘š nonbasic ones, e.g., choose first π‘š values, or π‘š highest values as basic variables. β€’ Compute the 𝛻𝑓𝑅 along nonbasic variables. If 𝛻𝑓𝑅 = 0, exit. β€’ Set Δ𝒛 = βˆ’π›»π‘“π‘…/ 𝛻𝑓𝑅 , Ξ”π’š = βˆ’ πœ•πœ“ πœ•π’š βˆ’1 πœ•πœ“ πœ•π’› Δ𝒛. β€’ Do a line search along Δ𝒙 to obtain Ξ±. β€’ Check feasibility at π’™π‘˜ + 𝛼Δ𝒙. If necessary, use Newton-Raphson iterations to adjust Ξ”π’š as: Ξ”π’šπ‘˜+1 = Ξ”π’šπ‘˜ βˆ’ πœ•πœ“ πœ•π’š βˆ’1 π‘”π‘˜ β€’ Update: π’™π‘˜+1 = π’™π‘˜ + 𝛼Δ𝒙
  • 79. Generalized Reduced Gradient β€’ Consider an equality constrained problem Objective: min 𝑓 𝒙 = 3π‘₯1 + 2π‘₯2 + 2π‘₯1 2 βˆ’ π‘₯1π‘₯2 + 1.5π‘₯2 2 Subject to: 𝑔 𝒙 = π‘₯1 2 βˆ’ π‘₯2 βˆ’ 1 = 0 β€’ Let 𝒙0 = βˆ’1 0 ; then 𝑓0 = βˆ’1, 𝛻𝑓0 = βˆ’1 3 , 𝑔0 = 0, 𝛻𝑔0 = βˆ’2 βˆ’1 . β€’ Let π’š = π‘₯2 on the first iteration; then 𝛻𝑓𝑅 𝑇 = βˆ’1 βˆ’ 3 βˆ’2 βˆ’1 = βˆ’7. β€’ Let Δ𝒛 = 1, then Ξ”π’š = βˆ’2 βˆ’1 1 = 2. By doing a line search along Δ𝒙 = 0.333 0.667 , we obtain 𝒙1 = βˆ’0.350 βˆ’0.577 , 𝑓1 = βˆ’2.13. β€’ The optimum is reached in three iterations: π’™βˆ— = βˆ’0.634 βˆ’0.598 , 𝑓 π’™βˆ— = βˆ’2.137.
  • 80. Generalized Reduced Gradient β€’ Consider an inequality constrained problem: Objective: min 𝑓 𝒙 = π‘₯1 2 + π‘₯2 Subject to: 𝑔1 𝒙 = π‘₯1 2 + π‘₯2 2 βˆ’ 9 ≀ 0, π‘₯1 + π‘₯2 βˆ’ 1 ≀ 0 β€’ Add slack variable to inequality constraints: 𝑔1 𝒙 = π‘₯1 2 + π‘₯2 2 βˆ’ 9 + 𝑠1 = 0, 𝑔2 𝒙 = π‘₯1 + π‘₯2 βˆ’ 1 + 𝑠2 = 0 Then 𝛻𝑓 𝒙 = 2π‘₯1 1 ; 𝛻𝑔1 𝒙 = 2π‘₯1 2π‘₯2 ; 𝛻𝑔2 𝒙 = 1 1 β€’ Let 𝒙0 = 2.56 βˆ’1.56 ; then 𝑓0 = 4.99, 𝛻𝑓0 = 5.12 1 , π’ˆ0 = βˆ’0.013 0 , β€’ Since 𝑔2 is binding, add 𝑠2 to variables: 𝛻𝑓0 = 5.12 1 1 , 𝛻𝑔2 0 = 1 1 1
  • 81. Generalized Reduced Gradient β€’ Let 𝑦 = π‘₯1, 𝒛 = π‘₯2 𝑠2 ; then 𝛻𝑓 𝑦 = 5.12, 𝛻𝑓 𝒛 = 1 0 , 𝛻𝑔2 𝑦 = 1, 𝛻𝑔2 𝒛 = 1 1 , therefore 𝛻𝑓𝑅 𝒛 = 1 0 βˆ’ 1 1 5.12 = βˆ’4.12 βˆ’5.12 β€’ Let Δ𝒛 = βˆ’π›»π‘“π‘… 𝒛 , Δ𝑦 = βˆ’[1 1]Δ𝒛 = βˆ’9.24; then, Δ𝒙 = βˆ’9.24 4.12 and 𝒔0 = Δ𝒙/ Δ𝒙 . Suppose we limit the maximum step size to 𝛼 ≀ 0.5, then 𝒙1 = 𝒙0 + 0.5𝒔0 = 2.103 βˆ’1.356 with 𝑓 π‘₯1 = 𝑓1 = 3.068. There are no constraint violations, hence first iteration is completed. β€’ After seven iterations: 𝒙7 = 0.003 βˆ’3.0 with 𝑓7 = βˆ’3.0 β€’ The optimum is at: π’™βˆ— = 0.0 βˆ’3.0 with π‘“βˆ— = βˆ’3.0
  • 82. GRG for LP Problems • Consider an LP problem: min f(x) = cᵀx Subject to: Ax = b, x ≥ 0 • Let x be partitioned into m basic and n − m nonbasic variables: xᵀ = [yᵀ, zᵀ]. • The objective function is partitioned as: f(x) = cyᵀy + czᵀz • The constraints are partitioned as: By + Nz = b, y ≥ 0, z ≥ 0; then y = B⁻¹b − B⁻¹Nz • The objective function in terms of the independent variables is: f(z) = cyᵀB⁻¹b + (czᵀ − cyᵀB⁻¹N)z • The reduced costs for the nonbasic variables are given as: rcᵀ = czᵀ − cyᵀB⁻¹N, or rcᵀ = czᵀ − λᵀN, where λᵀ = cyᵀB⁻¹
  • 83. GRG for LP Problems β€’ Using Tableu notation, the reduced costs are computed as: 𝑩 𝑡 𝒃 π’„π’š 𝑇 𝒄𝑧 𝑇 0 β†’ 𝑰 π‘©βˆ’1𝑡 π‘©βˆ’1𝒃 𝟎 𝒓𝑐 𝑇 βˆ’π’„π’š π‘‡π‘©βˆ’1𝒃 β€’ The objective function variation is given as: 𝑑𝑓 = π›»π‘“π’š π‘‡π‘‘π’š + 𝛻𝑓𝒛 𝑇𝑑𝒛 β€’ The reduced gradient along the constraint surface is given as: 𝛻𝑓𝑅 𝑇 = 𝛻𝒛𝑓𝑇 βˆ’ π›»π’šπ‘“π‘‡ π‘©βˆ’1 𝑡 = 𝒓𝑐 𝑇
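In MATLAB, the reduced-cost row follows directly from the partition (a sketch; B, N, cy, cz are assumed already extracted from A and c):

lam = (cy' / B)';        % simplex multipliers: lam' = cy'*inv(B)
rc  = cz' - lam'*N;      % reduced costs; rc >= 0 signals optimality for this minimization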
  • 84. GRG Algorithm for LP Problems 1. Choose the largest m components of x as basic variables 2. Compute the reduced gradient ∇fRᵀ = rcᵀ 3. Let Δzi = −ri if ri ≤ 0, and Δzi = −xi ri if ri > 0 4. If Δz = 0, stop; otherwise set Δy = −B⁻¹NΔz 5. Compute the step size: let α1 = max{α: y + αΔy ≥ 0, z + αΔz ≥ 0}, α2 = arg min α≥0 f(x + αΔx), α = min{α1, α2} 6. Update: xk+1 = xk + αΔx 7. If α2 ≥ α1, update B, N (use pivoting) 8. Return to step 1