2. Course Objectives
• Learn basic optimization methods and how they are applied in
engineering design
• Use MATLAB to solve optimum engineering design problems
– Linear programming problems
– Nonlinear programming problems
– Mixed integer programming problems
3. Course Prerequisites
• Participants are assumed to have some knowledge of:
– Linear algebra
– Multivariable calculus
– Scientific reasoning
– Basic programming
– MATLAB
4. Course Materials
• Arora, Introduction to Optimum Design, 3e, Elsevier
(https://www.researchgate.net/publication/273120102_Introduction_to_Optimum_design)
• Parkinson, Optimization Methods for Engineering Design, Brigham Young University
(http://apmonitor.com/me575/index.php/Main/BookChapters)
• Iqbal, Fundamental Engineering Optimization Methods, BookBoon
(https://bookboon.com/en/fundamental-engineering-optimization-methods-ebook)
6. Set Definitions
• Closed Set. A set S is closed if it contains its limit points, i.e., for any sequence of points xk ∈ S, lim_{k→∞} xk = x implies x ∈ S. For example, the set S = {x: ‖x‖ ≤ c} is closed.
• Bounded Set. A set S is bounded if for every x ∈ S, ‖x‖ < c, where ‖∙‖ represents a vector norm and c is a finite number.
• Compact Set. A set S is compact if it is both closed and bounded.
• Open Set. A set S is open if every x ∈ S is an interior point of S. For example, the set S = {x: ‖x‖ < c} is open.
7. Set Definitions
• Hyperplane. The set S = {x: aᵀx = b}, where a is a constant vector and b a scalar, defines a hyperplane. A line is a hyperplane in two dimensions. Note that the vector a is normal to the hyperplane.
• Halfspace. The set S = {x: aᵀx ≤ b}, where a and b are constants, defines a halfspace. Note that the vector a is normal to the hyperplane bounding the halfspace.
• Convex Set. A set S is convex if for every pair x, y ∈ S, their convex combination αx + (1 − α)y ∈ S for 0 ≤ α ≤ 1. A line segment, a hyperplane, a halfspace, and the sets of real numbers (ℝ, ℝⁿ) are convex.
• Extreme Point. A point x ∈ S is an extreme point (or vertex) of a convex set S if it cannot be expressed as x = αy + (1 − α)z with y, z ∈ S, y, z ≠ x, and 0 < α < 1.
• Interior Point. A point x ∈ S is interior to the set S if {y: ‖y − x‖ < ε} ⊂ S for some ε > 0.
8. Function Definitions
• Continuous Function. A function f(x) is continuous at a point x0 if lim_{x→x0} f(x) = f(x0).
• Affine Function. The function f(x) = aᵀx + b is affine.
• Quadratic Function. A quadratic function is of the form: f(x) = ½xᵀQx + aᵀx + b, where Q is symmetric.
• Convex Functions. A function f(x) defined on a convex set S is convex if and only if for every pair x, y ∈ S,
f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y), α ∈ [0, 1]
– Affine functions defined over convex sets are convex.
– Quadratic functions defined over convex sets are convex if and only if Q ⪰ 0, i.e., all its eigenvalues are nonnegative (and strictly convex if all eigenvalues are positive).
9. The Gradient Vector
• The Gradient Vector. Let f(x) = f(x1, x2, …, xn) be a real-valued function of n variables; the gradient of f is a vector defined by:
∇f(x)ᵀ = [∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn]
The gradient of f(x) at a point x0 is given as: ∇f(x0) = ∇f(x)|_{x=x0}.
• Directional Derivative. The directional derivative of f(x) along any direction d is defined as: f′_d(x) = ∇f(x)ᵀd. By definition, the directional derivative at x0 is maximum along ∇f(x0).
10. The Hessian Matrix
• The Hessian of f(x) is the n × n matrix ∇²f(x) of second partial derivatives, where [∇²f(x)]ij = ∂²f/∂xi∂xj. Note that the Hessian is symmetric, since ∂²f/∂xi∂xj = ∂²f/∂xj∂xi.
• Example: let f(x) = ½xᵀQx + aᵀx + b, where Q is symmetric; then ∇f(x) = Qx + a and ∇²f(x) = Q.
• Example: let f(x, y) = 3x²y.
Then ∇f(x, y) = [6xy, 3x²]ᵀ and ∇²f(x, y) = [6y 6x; 6x 0].
Let (x0, y0) = (1, 2); then ∇f(x0, y0) = [12, 3]ᵀ and ∇²f(x0, y0) = [12 6; 6 0].
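As a quick numerical check of the last example, the sketch below compares the analytic gradient and Hessian of f(x, y) = 3x²y at (1, 2) against central finite differences (written in Python with NumPy as a stand-in for the course's MATLAB; the same steps carry over directly).

```python
import numpy as np

def f(v):
    x, y = v
    return 3 * x**2 * y

def grad_fd(fun, v, h=1e-5):
    """Central-difference approximation of the gradient."""
    g = np.zeros(v.size)
    for i in range(v.size):
        e = np.zeros(v.size); e[i] = h
        g[i] = (fun(v + e) - fun(v - e)) / (2 * h)
    return g

def hess_fd(fun, v, h=1e-4):
    """Central-difference approximation of the Hessian (differences of the gradient)."""
    H = np.zeros((v.size, v.size))
    for j in range(v.size):
        e = np.zeros(v.size); e[j] = h
        H[:, j] = (grad_fd(fun, v + e) - grad_fd(fun, v - e)) / (2 * h)
    return H

v0 = np.array([1.0, 2.0])
print(grad_fd(f, v0))   # close to the analytic gradient [12, 3]
print(hess_fd(f, v0))   # close to the analytic Hessian [12 6; 6 0]
```

Note also that the computed Hessian is symmetric, as the slide states.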
11. The Taylor Series
• The Taylor series expansion of f(x) around x0 is given as:
f(x0 + Δx) = f(x0) + f′(x0)Δx + (1/2!)f″(x0)Δx² + ⋯
• The nth order Taylor series approximation of f(x) is given as:
f(x0 + Δx) ≅ f(x0) + f′(x0)Δx + (1/2!)f″(x0)Δx² + ⋯ + (1/n!)f⁽ⁿ⁾(x0)Δxⁿ
First order: f(x0 + Δx) ≅ f(x0) + f′(x0)Δx
Second order: f(x0 + Δx) ≅ f(x0) + f′(x0)Δx + (1/2!)f″(x0)Δx²
• The local behavior of a function is approximated as:
f(x) − f(x0) ≅ f′(x0)(x − x0)
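The effect of truncation order can be seen numerically. The sketch below (Python, standing in for MATLAB) approximates f(x) = eˣ around x0 = 0 and shows that the second-order approximation is noticeably more accurate than the first-order one for a small step Δx.

```python
import math

def taylor(f0, f1, f2, dx, order):
    """Truncated Taylor series of f about x0, evaluated at x0 + dx."""
    t = f0 + f1 * dx
    if order >= 2:
        t += f2 * dx**2 / 2.0
    return t

x0, dx = 0.0, 0.1
f0 = f1 = f2 = math.exp(x0)        # all derivatives of e^x equal e^x
exact = math.exp(x0 + dx)
e1 = abs(taylor(f0, f1, f2, dx, 1) - exact)
e2 = abs(taylor(f0, f1, f2, dx, 2) - exact)
print(e1, e2)   # the second-order error is much smaller
```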
12. Taylor Series
• The Taylor series expansion in the case of a multi-variable function is given as (where δx = x − x0):
f(x0 + δx) = f(x0) + ∇f(x0)ᵀδx + (1/2!)δxᵀ∇²f(x0)δx + ⋯
where ∇f(x0) and ∇²f(x0) are, respectively, the gradient and Hessian of f computed at x0.
• A first-order change in f(x) at x0 along a direction d = x − x0 is given by its directional derivative:
δf = f(x) − f(x0) ≅ ∇f(x0)ᵀd
13. Quadratic Function Forms
• The quadratic (scalar) function form on x is defined as:
f(x) = xᵀQx = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ Qij xi xj
Note that replacing Q by ½(Q + Qᵀ) does not change f(x). Hence, in a quadratic form Q can always be assumed symmetric.
• The quadratic form is classified as:
– Positive definite if xᵀQx > 0 for all x ≠ 0, or λ(Q) > 0
– Positive semidefinite if xᵀQx ≥ 0 for all x, or λ(Q) ≥ 0
– Negative definite if xᵀQx < 0 for all x ≠ 0, or λ(Q) < 0
– Negative semidefinite if xᵀQx ≤ 0 for all x, or λ(Q) ≤ 0
– Indefinite otherwise
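The classification above can be automated by inspecting the eigenvalues of the symmetrized matrix. A minimal sketch (Python/NumPy standing in for MATLAB; the `classify` helper is illustrative, not from the course):

```python
import numpy as np

def classify(Q, tol=1e-10):
    """Classify the quadratic form x^T Q x via the eigenvalues of (Q + Q^T)/2."""
    lam = np.linalg.eigvalsh((Q + Q.T) / 2.0)
    if np.all(lam > tol):   return "positive definite"
    if np.all(lam >= -tol): return "positive semidefinite"
    if np.all(lam < -tol):  return "negative definite"
    if np.all(lam <= tol):  return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```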
14. Matrix Norms
• Norms provide a measure for the size of a vector or matrix, similar to the notion of absolute value in the case of real numbers.
• Vector p-norms are defined by: ‖x‖p = (Σᵢ₌₁ⁿ |xi|ᵖ)^(1/p), p ≥ 1.
– 1-norm: ‖x‖₁ = Σᵢ₌₁ⁿ |xi|
– Euclidean norm: ‖x‖₂ = (Σᵢ₌₁ⁿ xi²)^(1/2)
– ∞-norm: ‖x‖∞ = max_{1≤i≤n} |xi|
• Induced matrix norms are defined by: ‖A‖ = max_{‖x‖=1} ‖Ax‖, which implies ‖Ax‖ ≤ ‖A‖‖x‖.
– ‖A‖₁ = max_{1≤j≤n} Σᵢ₌₁ⁿ |Aij| (the largest absolute column sum of A)
– ‖A‖₂ = √(λmax(AᵀA)), where λmax(AᵀA) is the largest eigenvalue of AᵀA, i.e., ‖A‖₂ is the largest singular value of A
– ‖A‖∞ = max_{1≤i≤n} Σⱼ₌₁ⁿ |Aij| (the largest absolute row sum of A)
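These formulas are easy to verify against a library implementation. A short check (Python/NumPy in place of MATLAB's `norm`):

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0]])
x = np.array([3.0, -4.0])

# vector norms
assert np.isclose(np.linalg.norm(x, 1), 7.0)        # |3| + |-4|
assert np.isclose(np.linalg.norm(x, 2), 5.0)        # sqrt(9 + 16)
assert np.isclose(np.linalg.norm(x, np.inf), 4.0)   # max(|3|, |-4|)

# induced matrix norms, computed from the definitions
col_sum = max(abs(A).sum(axis=0))                    # largest absolute column sum
row_sum = max(abs(A).sum(axis=1))                    # largest absolute row sum
sigma   = np.sqrt(max(np.linalg.eigvalsh(A.T @ A)))  # sqrt(lambda_max(A^T A))

assert np.isclose(np.linalg.norm(A, 1), col_sum)
assert np.isclose(np.linalg.norm(A, np.inf), row_sum)
assert np.isclose(np.linalg.norm(A, 2), sigma)
print(col_sum, row_sum, sigma)
```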
15. Properties of Convex Functions
• If f ∈ C¹ (i.e., f is differentiable), then f is convex over a convex set S if and only if for all x, y ∈ S, f(y) ≥ f(x) + ∇f(x)ᵀ(y − x). Graphically, it means that the function lies on or above the tangent line (hyperplane) passing through x.
• If f ∈ C² (i.e., f is twice differentiable), then f is convex over a convex set S if and only if f″(x) ≥ 0 for all x ∈ S. In the case of multivariable functions, f is convex over a convex set S if and only if its Hessian matrix is positive semidefinite everywhere in S, i.e., for all x ∈ S and for all d, dᵀ∇²f(x)d ≥ 0.
• If the Hessian is positive definite for all x ∈ S, i.e., if dᵀ∇²f(x)d > 0 for all d ≠ 0, then the function is strictly convex.
• If x* is a local minimum of a convex function f defined over a convex set S, then it is also a global minimum.
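The first-order condition above can be spot-checked numerically. For the convex quadratic f(x) = xᵀQx with Q positive definite, the inequality f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) holds for every pair of points (the chosen Q below is an illustrative example; sketch in Python/NumPy rather than MATLAB):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite

f = lambda v: v @ Q @ v                  # a convex quadratic
grad = lambda v: 2.0 * Q @ v             # its gradient

# first-order convexity condition on random point pairs
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print("first-order convexity condition holds")
```

The gap f(y) − f(x) − ∇f(x)ᵀ(y − x) equals (y − x)ᵀQ(y − x), which is nonnegative precisely because Q is positive definite.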
16. Solving Linear System of Equations
• A system of m linear equations in n unknowns is described as: Ax = b, where A is assumed to have full rank r = min(m, n).
• A solution to the system exists only if rank(A) = rank([A b]), i.e., only if b lies in the column space of A. The solution is unique if r = n.
– For m = n, the solution is obtained as: x = A⁻¹b.
– The general solution for m < n is obtained by resolving the system into canonical form: Ix⁽ᵐ⁾ + Qx⁽ⁿ⁻ᵐ⁾ = b′, where x⁽ᵐ⁾ are m dependent variables and x⁽ⁿ⁻ᵐ⁾ are n − m independent variables.
– The general solution is given as: x⁽ᵐ⁾ = b′ − Qx⁽ⁿ⁻ᵐ⁾.
– A basic solution is obtained as: x⁽ⁿ⁻ᵐ⁾ = 0, x⁽ᵐ⁾ = b′.
– For m > n, the system is, in general, inconsistent, but can be solved in the least-squares sense.
17. Example: General and Basic Solution
• Consider the LP problem:
max_x z = 2x1 + 3x2
subject to: x1 ≤ 3, x2 ≤ 5, 2x1 + x2 ≤ 7; x1, x2 ≥ 0
• Add slack variables to turn the inequality constraints into equalities:
x1 + s1 = 3, x2 + s2 = 5, 2x1 + x2 + s3 = 7
• Using x1, x2 as independent variables, the system is written as:
[1 0 0; 0 1 0; 0 0 1][s1; s2; s3] + [1 0; 0 1; 2 1][x1; x2] = [3; 5; 7]
• Choosing x1 = x2 = 0, we obtain a basic solution: [s1; s2; s3] = [3; 5; 7]
• A different choice of independent variables will result in a different basic solution.
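The enumeration of basic solutions can be sketched directly: set the independent (non-basis) variables to zero and solve for the rest. Below is an illustrative Python/NumPy version (the course itself uses MATLAB; `basic_solution` is a hypothetical helper, not a library routine):

```python
from itertools import combinations
import numpy as np

# columns ordered as (x1, x2, s1, s2, s3)
A = np.array([[1., 0., 1., 0., 0.],
              [0., 1., 0., 1., 0.],
              [2., 1., 0., 0., 1.]])
b = np.array([3., 5., 7.])

def basic_solution(A, b, basis):
    """Set the non-basis variables to zero and solve for the basis variables."""
    x = np.zeros(A.shape[1])
    x[list(basis)] = np.linalg.solve(A[:, list(basis)], b)
    return x

# basis = the slacks (s1, s2, s3): reproduces the solution on the slide
print(basic_solution(A, b, (2, 3, 4)))   # [0, 0, 3, 5, 7]

# every nonsingular choice of 3 columns gives a (possibly infeasible) basic solution
for basis in combinations(range(5), 3):
    if abs(np.linalg.det(A[:, list(basis)])) > 1e-9:
        print(basis, basic_solution(A, b, basis))
```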
18. Linear Systems of Equations
• The general solution to Ax = b may be written as: x = A†b, where A† is the pseudo-inverse of A, defined (for full-rank A) as:
– A† = A⁻¹ (m = n)
– A† = Aᵀ(AAᵀ)⁻¹ (m < n)
– A† = (AᵀA)⁻¹Aᵀ (m > n)
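The three cases can be checked against a library pseudo-inverse. A quick verification (Python/NumPy's `pinv` in place of MATLAB's; random full-rank matrices are used as test data):

```python
import numpy as np

rng = np.random.default_rng(1)

# overdetermined (m > n, full column rank): A+ = (A^T A)^-1 A^T
A = rng.normal(size=(5, 3))
assert np.allclose(np.linalg.pinv(A), np.linalg.inv(A.T @ A) @ A.T)

# underdetermined (m < n, full row rank): A+ = A^T (A A^T)^-1
B = rng.normal(size=(3, 5))
assert np.allclose(np.linalg.pinv(B), B.T @ np.linalg.inv(B @ B.T))

# for m > n, x = A+ b is the least-squares solution (normal equations hold)
b = rng.normal(size=5)
x = np.linalg.pinv(A) @ b
assert np.allclose(A.T @ (A @ x - b), 0)
print("pseudo-inverse formulas verified")
```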
19. Linear Diophantine System of Equations
• A linear Diophantine system of equations (LDSE) is represented as: Ax = b, x ∈ ℤⁿ.
• A square matrix A ∈ ℤⁿˣⁿ is unimodular if det A = ±1.
– If A ∈ ℤⁿˣⁿ is unimodular, then A⁻¹ ∈ ℤⁿˣⁿ is also unimodular.
– Assume that A is unimodular and b is an integer vector; then every solution of {x | Ax = b} is integral.
• A non-square matrix A ∈ ℤᵐˣⁿ is totally unimodular if every square submatrix C of A has det C ∈ {0, ±1}.
– Assume that A is totally unimodular and b is an integer vector; then every basic solution of {x | Ax = b} is integral.
– Note that a basic solution has at most m non-zero elements.
20. Example: Integer BFS
• Consider the LP problem with integral coefficients:
max_x z = 2x1 + 3x2
subject to: x1 ≤ 3, x2 ≤ 5, x1 + x2 ≤ 7, x ∈ ℤ², x ≥ 0
• Add slack variables and write the constraints in matrix form as:
A = [1 0 1 0 0; 0 1 0 1 0; 1 1 0 0 1], b = [3; 5; 7]
where the columns of A represent the variables (x1, x2, s1, s2, s3) and the rows represent the constraints. Note that A is totally unimodular and b ∈ ℤ³.
• Then, using the simplex method, the optimal integral solution is obtained as: xᵀ = (2, 5, 1, 0, 0), with z* = 19.
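Both claims in this example can be checked by brute force for a problem this small: total unimodularity by enumerating all square submatrices, and the optimum by scanning the (finite) integer feasible set. A sketch in Python (standing in for MATLAB; the `is_totally_unimodular` helper is illustrative and exponential in the matrix size, so it is only practical for tiny matrices):

```python
from itertools import combinations, product
import numpy as np

A = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 1, 0, 0, 1]])

def is_totally_unimodular(A):
    """Check det C in {0, +1, -1} for every square submatrix C (brute force)."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if round(np.linalg.det(A[np.ix_(rows, cols)])) not in (-1, 0, 1):
                    return False
    return True

print(is_totally_unimodular(A))   # True

# brute-force the integer program: max 2*x1 + 3*x2 subject to the constraints
best = max((2 * x1 + 3 * x2, x1, x2)
           for x1, x2 in product(range(4), range(6))
           if x1 + x2 <= 7)
print(best)   # (19, 2, 5): z* = 19 at (x1, x2) = (2, 5)
```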
21. Condition Numbers and Convergence Rates
• The condition number of a matrix A is defined as: cond(A) = ‖A‖ ∙ ‖A⁻¹‖. Note that:
– cond(A) ≥ 1
– cond(I) = 1
– If A is symmetric (hence has real eigenvalues), then cond₂(A) = |λ|max(A)/|λ|min(A).
• The condition number of the Hessian matrix affects the convergence rate of an optimization algorithm.
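These properties are straightforward to confirm numerically; the Hilbert matrix below is a standard example of an ill-conditioned matrix (sketch in Python/NumPy; MATLAB's `cond` behaves the same way):

```python
import numpy as np

assert np.isclose(np.linalg.cond(np.eye(3)), 1.0)        # cond(I) = 1

# symmetric matrix: cond_2(A) = |lambda|_max / |lambda|_min
A = np.array([[4.0, 1.0], [1.0, 3.0]])
lam = np.abs(np.linalg.eigvalsh(A))
assert np.isclose(np.linalg.cond(A, 2), lam.max() / lam.min())

# a notoriously ill-conditioned Hilbert matrix
H = np.array([[1.0 / (i + j + 1) for j in range(4)] for i in range(4)])
print(np.linalg.cond(H))   # large condition number
```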
22. Convergence Rates of Numerical Algorithms
• Assume that a sequence of points {xk} converges to a solution x*, and let ek = xk − x*. Then the sequence {xk} converges to x* with rate r and rate constant C if ‖ek+1‖ = C‖ek‖ʳ.
Note that convergence is faster if r is large and C is small.
• Linear Convergence. For r = 1, ‖ek+1‖ = C‖ek‖, i.e., convergence is linear with C ≈ (f(xk+1) − f(x*))/(f(xk) − f(x*)).
• Quadratic Convergence. For r = 2, ‖ek+1‖ = C‖ek‖². If, additionally, C = 1, then the number of correct digits doubles at every iteration.
• Superlinear Convergence. For 1 < r < 2, the convergence is superlinear. Numerical algorithms that only use gradient information can achieve superlinear convergence.
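The difference between linear and quadratic rates is easy to observe on a toy problem. Both schemes below converge to √2; the damped fixed-point iteration is an illustrative linearly convergent scheme (the 0.1 damping factor is an arbitrary choice), while Newton's iteration converges quadratically (sketch in Python rather than MATLAB):

```python
import math

root = math.sqrt(2.0)

# linearly convergent: damped fixed-point iteration x <- x - 0.1 (x^2 - 2)
x, lin = 1.0, []
for _ in range(30):
    x = x - 0.1 * (x * x - 2.0)
    lin.append(abs(x - root))

# quadratically convergent: Newton's iteration for x^2 - 2 = 0
x, quad = 1.0, []
for _ in range(6):
    x = 0.5 * (x + 2.0 / x)
    quad.append(abs(x - root))

print([f"{e:.1e}" for e in lin[:6]])   # error shrinks by a roughly constant factor C
print([f"{e:.1e}" for e in quad])      # correct digits roughly double per step
```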
23. Newton’s Method
• Newton’s method iteratively solves the equation f(x) = 0.
– Starting from an initial guess x0, it generates a series of solutions xk that converge to a point x* with f(x*) = 0.
– Newton’s method for a single-variable function is given as:
xk+1 = xk − f(xk)/f′(xk)
• For a system of equations, let J(x) = [∇f1(x), ∇f2(x), …, ∇fn(x)]ᵀ; then Newton’s update is given as: xk+1 = xk − J(xk)⁻¹f(xk).
• Newton’s method achieves quadratic convergence with rate constant: C = ½|f″(x*)/f′(x*)|.
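The single-variable update can be sketched in a few lines (Python standing in for MATLAB; the stopping tolerance and iteration cap are illustrative choices):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's iteration: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# solve x^2 - 2 = 0 from x0 = 1; converges to sqrt(2) in a handful of iterations
x_star = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(x_star)
```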
24. Conjugate Gradient Method
• The conjugate-gradient (CG) method was designed to iteratively solve the linear system of equations Ax = b, where A is assumed symmetric positive definite.
• The method initializes with x0 = 0 and, in exact arithmetic, obtains the solution xn in at most n iterations.
• The method generates a set of vectors v1, v2, …, vn that are conjugate with respect to the matrix A, i.e., viᵀAvj = 0, i ≠ j.
– Let v−1 = 0, β0 = 0; and define ri = b − Axi. Then a set of conjugate vectors with respect to A is iteratively generated as:
vi = ri + βi vi−1, βi = −(vi−1ᵀAri)/(vi−1ᵀAvi−1)
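The direction update above is only half of the iteration; a complete CG loop also needs a step length along each direction. The sketch below is a standard CG implementation for symmetric positive definite A (in Python/NumPy rather than MATLAB), using the common Fletcher–Reeves form of the β update, which is algebraically equivalent to the conjugation formula above for exact arithmetic:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """CG iteration for A x = b with A symmetric positive definite.
    The directions v satisfy v_i^T A v_j = 0 for i != j."""
    n = len(b)
    x = np.zeros(n)              # x0 = 0
    r = b - A @ x                # residual r_i = b - A x_i
    v = r.copy()                 # first direction v0 = r0
    for _ in range(n):
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (v @ A @ v)     # step length along v
        x = x + alpha * v
        r_new = r - alpha * (A @ v)
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves beta
        v = r_new + beta * v              # next conjugate direction
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x)   # agrees with np.linalg.solve(A, b)
```

For n = 2 the loop terminates after at most two steps, consistent with the n-iteration guarantee on the slide.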