The document discusses various optimization methods for minimizing or maximizing objective functions. It covers continuous, discrete, and combinatorial optimization problems. Common approaches to optimization include analytical, graphical, experimental, and numerical methods. Specific techniques discussed include gradient descent, Newton's method, the conjugate-gradient method, and quasi-Newton methods. The document also covers linear programming, integer programming, and nonlinear programming.
Lockhart and Johnson (1996) define optimization as “the process of finding the most effective or favorable value or condition” (p. 610). The purpose of optimization is to achieve the “best” design relative to a set of prioritized criteria or constraints. Within the traditional engineering disciplines, optimization techniques are commonly employed for a variety of problems, including product-mix problems: determine the mix of products in a factory that makes the best use of machines, labor, and raw materials while maximizing the company's profits. Optimization involves the selection of the “best” solution from among the set of candidate solutions. The degree of goodness of a solution is quantified using an objective function (e.g., cost) that is to be minimized or maximized.
Optimization problem: Maximizing or minimizing some function relative to some set,
often representing a range of choices available in a certain situation. The function
allows comparison of the different choices for determining which might be “best.”
Common applications: Minimal cost, maximal profit, minimal error, optimal design,
optimal management, variational principles.
Goals of the subject: The understanding of
Modeling issues—
What to look for in setting up an optimization problem?
What features are advantageous or disadvantageous?
What devices/tricks of formulation are available?
How can problems usefully be categorized?
Analysis of solutions—
What is meant by a “solution?”
When do solutions exist, and when are they unique?
How can solutions be recognized and characterized?
What happens to solutions under perturbations?
Numerical methods—
How can solutions be determined by iterative schemes of computation?
What modes of local simplification of a problem are convenient/appropriate?
How can different solution techniques be compared and evaluated?
Distinguishing features of optimization as a mathematical discipline:
descriptive −→ prescriptive
equations −→ inequalities
linear/nonlinear −→ convex/nonconvex
differential calculus −→ subdifferential calculus
Finite-dimensional optimization: The case where a choice corresponds to selecting
the values of a finite number of real variables, called decision variables. For general
purposes the decision variables may be denoted by x1, . . . , xn and each possible choice
is therefore identified with a point x = (x1, . . . , xn) in the space IR^n. This is what we'll
be focusing on in this course.
Feasible set: The subset C of IR^n representing the allowable choices x = (x1, . . . , xn).
Objective function: The function f0(x) = f0(x1, . . . , xn) that is to be maximized or
minimized over C.
Constraints: Side conditions that are used to specify the feasible set C within IR^n.
Equality constraints: Conditions of the form fi(x) = ci for certain functions fi on IR^n
and constants ci in IR.
Inequality constraints: Conditions of the form fi(x) ≤ ci or fi(x) ≥ ci for certain
functions fi on IR^n and constants ci in IR.
Range constraints: Conditions of the form ci ≤ fi(x) ≤ di for certain functions fi on IR^n
and constants ci, di in IR.
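As an illustration of these ingredients (decision variables, objective function, constraints), the product-mix problem mentioned above can be written as a small linear program. The symbols below (per-unit profits p_i, resource usages m_i, l_i, r_i, and the capacities M, L, R) are hypothetical placeholders, not data from the text:

```latex
\begin{aligned}
\max_{x_1,\dots,x_n \ge 0} \quad & \sum_{i=1}^{n} p_i x_i && \text{(total profit; } x_i = \text{units of product } i\text{)}\\
\text{subject to} \quad & \sum_{i=1}^{n} m_i x_i \le M && \text{(machine hours)}\\
& \sum_{i=1}^{n} l_i x_i \le L && \text{(labor hours)}\\
& \sum_{i=1}^{n} r_i x_i \le R && \text{(raw material)}
\end{aligned}
```

Here the objective function is f0(x) = Σ p_i x_i, and the feasible set C ⊂ IR^n is carved out by the inequality and range constraints.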
2. Categorization of Optimization Problems
Continuous Optimization
Discrete Optimization
Combinatorial Optimization
Variational Optimization
Common Optimization Concepts in Computer Vision
Energy Minimization
Graphs
Markov Random Fields
Several general approaches to optimization are as follows:
Analytical methods
Graphical methods
Experimental methods
Numerical methods
Several branches of mathematical programming have
evolved, as follows:
Linear programming
Integer programming
Quadratic programming
Nonlinear programming
Dynamic programming
3. 1. THE OPTIMIZATION PROBLEM
1.1 Introduction
1.2 The Basic Optimization Problem
2. BASIC PRINCIPLES
2.1 Introduction
2.2 Gradient Information
3. GENERAL PROPERTIES OF ALGORITHMS
3.5 Descent Functions
3.6 Global Convergence
3.7 Rates of Convergence
4. ONE-DIMENSIONAL OPTIMIZATION
4.3 Fibonacci Search
4.4 Golden-Section Search
4.5 Quadratic Interpolation Method
5. BASIC MULTIDIMENSIONAL GRADIENT
METHODS
5.1 Introduction
5.2 Steepest-Descent Method
5.3 Newton Method
5.4 Gauss-Newton Method
5. FUNDAMENTALS OF CONSTRAINED OPTIMIZATION
10.1 Introduction
10.2 Constraints
LINEAR PROGRAMMING PART I: THE SIMPLEX METHOD
11.1 Introduction
11.2 General Properties
11.3 Simplex Method
QUADRATIC AND CONVEX PROGRAMMING
13.1 Introduction
13.2 Convex QP Problems with Equality Constraints
13.3 Active-Set Methods for Strictly Convex QP Problems
GENERAL NONLINEAR OPTIMIZATION PROBLEMS
15.1 Introduction
15.2 Sequential Quadratic Programming Methods
6. Problem specification
Suppose we have a cost function (or objective function) f(x).
Our aim is to find values of the parameters (decision variables) x that
minimize this function,
subject to the following constraints:
• equality: ai(x) = 0
• nonequality: cj(x) ≥ 0
If we seek a maximum of f(x) (a profit function), it is equivalent to seeking
a minimum of –f(x)
7. Books to read
• Practical Optimization
– Philip E. Gill, Walter Murray, and Margaret H.
Wright, Academic Press,
1981
• Practical Optimization: Algorithms and
Engineering Applications
– Andreas Antoniou and Wu-Sheng Lu
2007
• Both cover unconstrained and constrained
optimization. Very clear and comprehensive.
8. Further reading and web resources
• Numerical Recipes in C (or C++) : The Art
of Scientific Computing
– William H. Press, Brian P. Flannery, Saul A.
Teukolsky, William T. Vetterling
– Good chapter on optimization
– Available on line at
(1992 ed.) www.nrbook.com/a/bookcpdf.php
(2007 ed.) www.nrbook.com
• NEOS Guide
www-fp.mcs.anl.gov/OTC/Guide/
• This powerpoint presentation
www.utia.cas.cz
9. Introductory Items in OPTIMIZATION
- Category of Optimization methods
- Constrained vs Unconstrained
- Feasible Region
• Gradient and Taylor Series Expansion
• Necessary and Sufficient Conditions
• Saddle Point
• Convex/concave functions
• 1-D search – Dichotomous, Fibonacci – Golden
Section, DSC;
• Steepest Descent; Newton; Gauss-Newton
• Conjugate Gradient;
• Quasi-Newton; Minimax;
• Lagrange Multiplier; Simplex; Primal-Dual,
Quadratic programming; Semi-definite;
10. Types of minima
• which of the minima is found depends on the starting
point
• such minima often occur in real applications
[Figure: a 1D cost function f(x) over its feasible region, showing two strong local minima, a weak local minimum, and the strong global minimum.]
11. Unconstrained univariate optimization
Assume we can start close to the global minimum
How to determine the minimum?
• Search methods (Dichotomous, Fibonacci, Golden-Section)
• Approximation methods
1. Polynomial interpolation
2. Newton method
• Combination of both (alg. of Davies, Swann, and Campey)
12. Search methods
• Start with the interval (“bracket”) [xL, xU] such that the
minimum x* lies inside.
• Evaluate f(x) at two points inside the bracket.
• Reduce the bracket.
• Repeat the process.
• Can be applied to any function and differentiability is not
essential.
13. Example of a bracketing search: each step evaluates a new interior point X1, compares the function values FA and FB, and halves the bracket [XL, XU] (a code sketch follows the table).

    XL      XU      X1      Criterion
    0       1       1/2     FA > FB
    1/2     1       3/4     FA < FB
    1/2     3/4     5/8     FA < FB
    1/2     5/8     9/16    FA > FB
    9/16    5/8     19/32   FA < FB
    9/16    19/32   37/64

    XL      XU      Range
    0       1       1
    1/2     1       1/2
    1/2     3/4     1/4
    1/2     5/8     1/8
    9/16    5/8     1/16
    9/16    19/32   1/32
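A minimal sketch of the bracketing idea of slides 12–13, using golden-section search (one of the search methods named on slide 11). The test function is a hypothetical example, not one from the slides:

```python
import math

def golden_section_search(f, x_lo, x_hi, tol=1e-6):
    """Shrink the bracket [x_lo, x_hi] around the minimum of a unimodal f."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi ~ 0.618
    a, b = x_lo, x_hi
    # two interior points dividing the bracket in the golden ratio
    x1 = b - invphi * (b - a)
    x2 = a + invphi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:          # minimum lies in [a, x2]; reuse x1 as the new x2
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = f(x1)
        else:                # minimum lies in [x1, b]; reuse x2 as the new x1
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# hypothetical 1D test function with its minimum at x = 0.59
print(golden_section_search(lambda x: (x - 0.59)**2, 0.0, 1.0))
```

Note that only one new function evaluation is needed per iteration, because one interior point is reused; differentiability is not required, matching slide 12.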
23. 1D function
As an example consider the function
(assume we do not know the actual function expression from now on)
24. Gradient descent
Given a starting location, x0, examine df/dx
and move in the downhill direction
to generate a new estimate, x1 = x0 + δx
How to determine the step size δx?
25. Polynomial interpolation
• Bracket the minimum.
• Fit a quadratic or cubic polynomial which
interpolates f(x) at some points in the interval.
• Jump to the (easily obtained) minimum of the
polynomial.
• Throw away the worst point and repeat the
process.
26. Polynomial interpolation
• Quadratic interpolation using 3 points, 2 iterations
• Other methods to interpolate?
– 2 points and one gradient
– Cubic interpolation
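A minimal sketch of the quadratic (parabolic) interpolation idea of slides 25–26: fit a parabola through three bracketing points, jump to its vertex, and keep three points that still bracket the minimum. The bracket-maintenance details and the test function are illustrative choices, not taken from the slides:

```python
def parabolic_min(xa, xb, xc, fa, fb, fc):
    """Vertex of the parabola interpolating (xa,fa), (xb,fb), (xc,fc)."""
    num = (xb - xa)**2 * (fb - fc) - (xb - xc)**2 * (fb - fa)
    den = (xb - xa) * (fb - fc) - (xb - xc) * (fb - fa)
    return xb - 0.5 * num / den

def quadratic_interpolation_search(f, xa, xb, xc, iters=20):
    """Requires xa < xb < xc with f(xb) below both end values (a bracket)."""
    fa, fb, fc = f(xa), f(xb), f(xc)
    for _ in range(iters):
        x = parabolic_min(xa, xb, xc, fa, fb, fc)
        fx = f(x)
        # keep the three points that still bracket the minimum
        if x < xb:
            if fx < fb: xc, fc, xb, fb = xb, fb, x, fx
            else:       xa, fa = x, fx
        else:
            if fx < fb: xa, fa, xb, fb = xb, fb, x, fx
            else:       xc, fc = x, fx
    return xb

# hypothetical test function x^4 - 2x + 1, true minimizer near x = 0.794
print(quadratic_interpolation_search(lambda x: x**4 - 2*x + 1, 0.0, 0.5, 2.0))
```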
27. Newton method
• Expand f(x) locally using a Taylor series.
• Find the δx which minimizes this local quadratic
approximation.
• Update x.
Fit a quadratic approximation to f(x) using both gradient and
curvature information at x.
28. Newton method
• avoids the need to bracket the root
• quadratic convergence (decimal accuracy doubles
at every iteration)
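A minimal sketch of the 1D Newton step of slides 27–28, driving f'(x) to zero with curvature information; the test function is a hypothetical strictly convex example, not one from the slides:

```python
import math

def newton_1d(df, d2f, x0, tol=1e-10, max_iter=50):
    """Minimize by the Newton update x <- x - f'(x)/f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:      # step size is a simple convergence test
            break
    return x

# hypothetical example: f(x) = x^2 + exp(-x); minimizer near x = 0.3517
df  = lambda x: 2.0 * x - math.exp(-x)
d2f = lambda x: 2.0 + math.exp(-x)
print(newton_1d(df, d2f, x0=3.0))
```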
29. Newton method
• Global convergence of Newton’s method is poor.
• Often fails if the starting point is too far from the minimum.
• in practice, must be used with a globalization strategy
which reduces the step length until function decrease is
assured
30. Extension to N (multivariate) dimensions
• How big can N be?
– problem sizes can vary from a handful of parameters to
many thousands
• We will consider examples for N=2, so that cost
function surfaces can be visualized.
31. An Optimization Algorithm
• Start at x0, k = 0.
1. Compute a search direction pk
2. Compute a step length αk, such that f(xk + αk pk ) < f(xk)
3. Update: xk+1 = xk + αk pk
4. Check for convergence (stopping criteria)
e.g. df/dx = 0
Reduces optimization in N dimensions to a series of (1D) line minimizations
k = k+1
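A minimal sketch of the generic loop of slide 31 (direction, step length, update, convergence test). Instead of an exact line minimization, the step length here is chosen by a simple backtracking (Armijo) rule; the quadratic test function and the steepest-descent direction passed in are hypothetical illustrations:

```python
import numpy as np

def descent(f, grad, x0, direction, tol=1e-6, max_iter=500):
    """Generic descent loop: compute p_k, pick alpha_k so that
    f(x_k + alpha_k p_k) < f(x_k), update, and test convergence."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stopping criterion: df/dx ~ 0
            break
        p = direction(x, g)
        # backtracking (Armijo) line search: shrink alpha until sufficient decrease
        alpha, fx = 1.0, f(x)
        while f(x + alpha * p) > fx + 1e-4 * alpha * g.dot(p) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * p
    return x

# steepest-descent direction p_k = -g_k on a hypothetical quadratic bowl
f = lambda x: 0.5 * x[0]**2 + 2.0 * x[1]**2
grad = lambda x: np.array([x[0], 4.0 * x[1]])
print(descent(f, grad, [3.0, -2.0], lambda x, g: -g))   # -> close to (0, 0)
```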
32. Taylor expansion
A function may be approximated locally by its Taylor series expansion about a point x*:
    f(x* + δx) ≈ f(x*) + g(x*)ᵀ δx + ½ δxᵀ H(x*) δx,
where the gradient g(x*) = ∇f(x*) is the vector of first derivatives [∂f/∂x1, . . . , ∂f/∂xN]ᵀ,
and the Hessian H(x*) is the symmetric matrix of second derivatives, Hij(x*) = ∂²f/∂xi∂xj.
36. Quadratic functions
• The vector g and the Hessian H are constant.
• The second-order Taylor approximation of any function is a quadratic function.
We will assume only quadratic functions for a while.
37. Necessary conditions for a minimum
Expand f(x) about a stationary point x* in direction p:
    f(x* + αp) ≈ f(x*) + ½ α² pᵀ H(x*) p,
since ∇f(x*) = 0 at a stationary point.
At a stationary point the behavior is therefore determined by H.
38. • H is a symmetric matrix, and so has orthogonal eigenvectors: H ui = λi ui with real eigenvalues λi.
• As |α| increases, f(x* + αui) ≈ f(x*) + ½ α² λi increases, decreases,
or is unchanging according to whether λi is positive, negative or zero.
39. Examples of quadratic functions
Case 1: both eigenvalues positive — H is positive definite; the function has a unique minimum.
40. Examples of quadratic functions
Case 2: eigenvalues of different sign — H is indefinite; the stationary point is a saddle point.
41. Examples of quadratic functions
Case 3: one eigenvalue is zero — H is positive semidefinite; the surface is a parabolic cylinder.
42. Optimization for quadratic functions
Assume that H is positive definite.
There is a unique minimum at x* = −H⁻¹ g (obtained by setting the gradient H x + g to zero).
If N is large, it is not feasible to perform this inversion directly.
43. Steepest descent
• Basic principle is to minimize the N-dimensional function by a series of 1D line-minimizations:
      xk+1 = xk + αk pk.
• The steepest descent method chooses pk to be parallel to the negative gradient: pk = −gk.
• Step-size αk is chosen to minimize f(xk + αk pk).
For quadratic forms there is a closed-form solution:
      αk = (gkᵀ gk) / (gkᵀ H gk).
Prove it!
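A minimal sketch of steepest descent on a quadratic using that closed-form step; the ill-conditioned H and the vector b are hypothetical, chosen so the slow zig-zag behaviour of the next slides can be observed in the iterates:

```python
import numpy as np

def steepest_descent_quadratic(H, b, x0, tol=1e-8, max_iter=1000):
    """Minimize f(x) = 0.5 x^T H x + b^T x with the exact step
    alpha_k = (g_k^T g_k) / (g_k^T H g_k) along p_k = -g_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = H @ x + b                    # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            break
        alpha = g.dot(g) / g.dot(H @ g)  # exact 1D line minimization
        x = x - alpha * g
    return x

# hypothetical ill-conditioned bowl; the exact minimizer is -H^{-1} b = (1, 0.1)
H = np.array([[1.0, 0.0], [0.0, 20.0]])
b = np.array([-1.0, -2.0])
print(steepest_descent_quadratic(H, b, [0.0, 0.0]))
```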
44. Steepest descent
• The gradient is everywhere perpendicular to the contour
lines.
• After each line minimization the new gradient is always
orthogonal to the previous step direction (true of any line
minimization).
• Consequently, the iterates tend to zig-zag down the
valley in a very inefficient manner
45. Steepest descent
• The 1D line minimization must be performed using one
of the earlier methods (usually cubic polynomial
interpolation)
• The zig-zag behaviour is clear in the zoomed view
• The algorithm crawls down the valley
50. Newton method
Expand f(x) by its Taylor series about the point xk:
    f(xk + δx) ≈ f(xk) + gkᵀ δx + ½ δxᵀ Hk δx,
where the gradient is the vector gk = ∇f(xk)
and the Hessian is the symmetric matrix Hk = H(xk).
51. Newton method
For a minimum we require that ∇f(xk + δx) = 0, and so Hk δx = −gk,
with solution δx = −Hk⁻¹ gk. This gives the iterative update
      xk+1 = xk − Hk⁻¹ gk
• If f(x) is quadratic, then the solution is found in one step.
• The method has quadratic convergence (as in the 1D case).
• The step δx is guaranteed to be a downhill direction provided Hk is positive definite.
• Rather than jump straight to the minimum, it is better to perform a line
minimization which ensures global convergence
• If Hk = I then this reduces to steepest descent.
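A minimal sketch of a damped Newton iteration in several dimensions. The backtracking rule (accept only steps that decrease f) is a simple stand-in for the line minimization recommended above, and the Rosenbrock test function with its analytic gradient and Hessian is an illustrative choice, not taken from the slides:

```python
import numpy as np

def newton_minimize(f, grad, hess, x0, tol=1e-8, max_iter=100):
    """Damped Newton: solve H_k p = -g_k, then backtrack on the step length."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)       # Newton direction
        alpha, fx = 1.0, f(x)
        # backtrack until the step actually decreases f
        while f(x + alpha * p) >= fx and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * p
    return x

# hypothetical example: Rosenbrock function, minimum at (1, 1)
rosen = lambda x: 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2
def rosen_grad(x):
    return np.array([-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                     200*(x[1] - x[0]**2)])
def rosen_hess(x):
    return np.array([[1200*x[0]**2 - 400*x[1] + 2, -400*x[0]],
                     [-400*x[0], 200.0]])
print(newton_minimize(rosen, rosen_grad, rosen_hess, [-1.2, 1.0]))
```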
52. Newton method
- example
• The algorithm converges in only 18 iterations compared
to the 98 for conjugate gradients.
• However, the method requires computing the Hessian
matrix at each iteration – this is not always feasible
54. Summary of the basic gradient methods (columns of the original table: method, step-length calculation, intermediate updates, convergence condition)

Steepest-Descent Method (Method 1: with line search)
  Step length: find αk, the value of α that minimizes f(xk + α dk), using a line search.
  Updates: dk = −gk,  xk+1 = xk + αk dk,  gk = ∇f(xk).
  Convergence test: if ||αk dk|| < ε, then x* = xk+1 and f* = f(xk+1); else set k = k + 1 and repeat.

Steepest-Descent Method (Method 2: without line search)
  Step length: αk ≈ (gkᵀ gk) / (gkᵀ Hk gk).
  Updates and convergence test: as above.

Newton Method
  Step length: find αk, the value of α that minimizes f(xk + α dk), using a line search.
  Updates: dk = −Hk⁻¹ gk,  xk+1 = xk + αk dk,  gk = ∇f(xk).
  Convergence test: as above.

Gauss-Newton Method (for a sum of squared residuals F(x) = Σ fj(x)²)
  Step length: find αk, the value of α that minimizes F(xk + α dk), using a line search.
  Updates: f = [f1 … fm]ᵀ,  J = Jacobian of f,  gk = 2 Jᵀ f,  Hk ≈ 2 Jᵀ J,
           dk = −(Jᵀ J)⁻¹ Jᵀ f,  xk+1 = xk + αk dk.
  Convergence test: if |F(xk+1) − F(xk)| < ε, then x* = xk+1 and F* = F(xk+1); else set k = k + 1 and repeat.
56. Conjugate gradient
• Each pk is chosen to be conjugate to all previous search
directions with respect to the Hessian H:
      pkᵀ H pj = 0,  j = 0, 1, . . . , k − 1.
• The resulting search directions are mutually linearly
independent.
• Remarkably, pk can be chosen using only knowledge of pk−1, gk−1, and gk.
Prove it!
58. Conjugate gradient
• An N-dimensional quadratic form can be minimized in at
most N conjugate descent steps.
• 3 different starting points.
• Minimum is reached in exactly 2 steps.
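A minimal sketch of the conjugate-gradient iteration for a quadratic form; the 2D example echoes slide 58's point that the minimum is reached in exactly N = 2 steps. The matrix H and vector b are hypothetical:

```python
import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-10):
    """Minimize f(x) = 0.5 x^T H x - b^T x (H symmetric positive definite),
    i.e. solve H x = b, using at most N conjugate descent steps."""
    x = np.asarray(x0, dtype=float)
    g = H @ x - b                           # gradient (negative residual)
    p = -g
    for _ in range(len(b)):
        if np.linalg.norm(g) < tol:
            break
        Hp = H @ p
        alpha = g.dot(g) / p.dot(Hp)        # exact line minimization along p
        x = x + alpha * p
        g_new = g + alpha * Hp
        beta = g_new.dot(g_new) / g.dot(g)  # Fletcher-Reeves coefficient
        p = -g_new + beta * p               # next direction, H-conjugate to p
        g = g_new
    return x

# hypothetical 2D quadratic: CG agrees with the direct solve after 2 steps
H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
print(conjugate_gradient(H, b, [0.0, 0.0]), np.linalg.solve(H, b))
```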
59. Optimization for General functions
Apply methods developed using quadratic Taylor series expansion
61. Conjugate gradient
• Again, an explicit line minimization must be used at
every step
• The algorithm converges in 98 iterations
• Far superior to steepest descent
62. Quasi-Newton methods
• If the problem size is large and the Hessian matrix is
dense then it may be infeasible/inconvenient to compute
it directly.
• Quasi-Newton methods avoid this problem by keeping a
“rolling estimate” of H(x), updated at each iteration using
new gradient information.
• Common schemes are due to Broyden, Fletcher, Goldfarb,
and Shanno (BFGS), and also to Davidon, Fletcher, and Powell (DFP).
• The idea is based on the fact that for quadratic functions
      gk+1 − gk = H (xk+1 − xk)
holds, and by accumulating gk's and xk's we can calculate H.
63. Quasi-Newton BFGS method
• Set H0 = I.
• Update according to
      Hk+1 = Hk + (γk γkᵀ) / (γkᵀ δk) − (Hk δk δkᵀ Hk) / (δkᵀ Hk δk),
where δk = xk+1 − xk and γk = gk+1 − gk.
• The matrix inverse can also be computed in this way.
• Directions δk‘s form a conjugate set.
• Hk+1 is positive definite if Hk is positive definite.
• The estimate Hk is used to form a local quadratic
approximation as before
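A minimal sketch of a BFGS iteration. The slide states the update for Hk itself; this sketch maintains the equivalent inverse-Hessian estimate Sk ≈ Hk⁻¹ so that no linear system needs solving, uses a simple backtracking line search instead of an exact one, and tests on the hypothetical Rosenbrock function:

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    """BFGS with a rolling estimate S_k of the inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    S = np.eye(n)                          # S_0 = I (cf. H_0 = I on slide 63)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -S @ g                         # quasi-Newton direction
        alpha, fx = 1.0, f(x)              # backtracking (Armijo) line search
        while f(x + alpha * p) > fx + 1e-4 * alpha * g.dot(p) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        d, gamma = x_new - x, g_new - g    # delta_k and gamma_k of slide 62
        if gamma.dot(d) > 1e-12:           # keep S positive definite
            rho = 1.0 / gamma.dot(d)
            V = np.eye(n) - rho * np.outer(d, gamma)
            S = V @ S @ V.T + rho * np.outer(d, d)   # inverse-Hessian BFGS update
        x, g = x_new, g_new
    return x

rosen = lambda x: 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2
rosen_grad = lambda x: np.array([-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                                 200*(x[1] - x[0]**2)])
print(bfgs(rosen, rosen_grad, [-1.2, 1.0]))   # -> close to (1, 1)
```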
64. BFGS example
• The method converges in 34 iterations, compared to
18 for the full-Newton method
66. (Recap: the summary table of the steepest-descent, Newton, and Gauss-Newton methods from slide 54.)
67. Summary of further methods (same columns: method, step-length calculation, intermediate updates, convergence condition)

Coordinate Descent
  Step length: find αk, the value of α that minimizes f(xk + α dk), using a line search.
  Updates: dk = [0 0 … 1 … 0 0]ᵀ (the k-th coordinate direction),  xk+1 = xk + αk dk,  gk = ∇f(xk).
  Convergence test: if ||αk dk|| < ε, then x* = xk+1 and f* = f(xk+1);
  otherwise, if k = n, restart from the first coordinate (k = 1); else set k = k + 1.

Conjugate Gradient (without line search, quadratic model)
  Step length: αk = (gkᵀ gk) / (dkᵀ H dk).
  Updates: d0 = −g0,  xk+1 = xk + αk dk,  βk = (gk+1ᵀ gk+1) / (gkᵀ gk),
           dk+1 = −gk+1 + βk dk,  gk = ∇f(xk).
  Convergence test: if ||αk dk|| < ε, then x* = xk+1 and f* = f(xk+1); else set k = k + 1.

Quasi-Newton Method
  Step length: find αk, the value of α that minimizes f(xk + α dk), using a line search.
  Updates: dk = −Sk gk,  xk+1 = xk + αk dk,  δk = xk+1 − xk,  γk = gk+1 − gk,
           Sk+1 = Sk + ((δk − Sk γk)(δk − Sk γk)ᵀ) / ((δk − Sk γk)ᵀ γk)
           (rank-one correction of the inverse-Hessian estimate Sk).
  Convergence test: if ||αk dk|| < ε, then x* = xk+1 and f* = f(xk+1); else set k = k + 1.
68. Non-linear least squares
• It is very common in applications for a cost
function f(x) to be the sum of a large number of
squared residuals
• If each residual depends non-linearly on the
parameters x then the minimization of f(x) is a
non-linear least squares problem.
69. Non-linear least squares
• The M × N Jacobian of the vector of residuals r is defined as
      Jij(x) = ∂ri/∂xj.
• Consider f(x) = Σi ri(x)² = r(x)ᵀ r(x).
• Hence the gradient is ∇f(x) = 2 J(x)ᵀ r(x).
70. Non-linear least squares
• For the Hessian we have
      H(x) = 2 J(x)ᵀ J(x) + 2 Σi ri(x) ∇²ri(x).
• Note that the second-order term in the Hessian is multiplied by the
residuals ri.
• In most problems, the residuals will typically be small.
• Also, at the minimum, the residuals will typically be distributed with
mean = 0.
• For these reasons, the second-order term is often ignored, giving the
Gauss-Newton approximation H(x) ≈ 2 J(x)ᵀ J(x).
• Hence, explicit computation of the full Hessian can again be avoided.
71. Gauss-Newton example
• The minimization of the Rosenbrock function
      f(x) = 100 (x2 − x1²)² + (1 − x1)²
• can be written as a least-squares problem with residual vector
      r(x) = [10 (x2 − x1²), 1 − x1]ᵀ.
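A minimal sketch of a Gauss-Newton iteration on that least-squares formulation: the step solves (Jᵀ J) δx = −Jᵀ r, i.e. it uses the approximate Hessian 2 Jᵀ J and gradient 2 Jᵀ r from slide 70. The undamped step (no line search) is a simplification for illustration:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-10, max_iter=50):
    """Gauss-Newton: repeatedly solve (J^T J) dx = -J^T r and update x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r, J = residual(x), jacobian(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Rosenbrock as a least-squares problem (slide 71): f(x) = ||r(x)||^2
residual = lambda x: np.array([10.0 * (x[1] - x[0]**2), 1.0 - x[0]])
jacobian = lambda x: np.array([[-20.0 * x[0], 10.0],
                               [-1.0,          0.0]])
print(gauss_newton(residual, jacobian, [-1.2, 1.0]))   # -> close to (1, 1)
```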
75. Constrained Optimization
Minimize f(x) subject to:
• Equality constraints: ai(x) = 0
• Nonequality constraints: cj(x) ≥ 0
• Constraints define a feasible region, which is assumed to be nonempty.
• The idea is to convert the problem to an unconstrained optimization.
76. Equality constraints
• Minimize f(x) subject to ai(x) = 0 for i = 1, . . . , p.
• The gradient of f(x) at a local minimizer is equal to a
linear combination of the gradients of the ai(x), with the
Lagrange multipliers λi as the coefficients: ∇f(x*) = Σi λi* ∇ai(x*).
77. [Figure: contour plots with f3 > f2 > f1 illustrating, for an equality constraint, cases where x* is a minimizer (with λ* > 0 or λ* < 0) and cases where x* is not a minimizer.]
79. 3D Example
f(x) = 3
Gradients of constraints and objective function are linearly independent.
80. 3D Example
f(x) = 1
Gradients of constraints and objective function are linearly dependent.
81. Inequality constraints
• Minimize f(x) subject to cj(x) ≥ 0 for j = 1, . . . , q.
• The gradient of f(x) at a local minimizer is equal to a
linear combination of the gradients of those cj(x) which are
active ( cj(x) = 0 ),
• and the Lagrange multipliers must be positive, μj ≥ 0.
82. [Figure: contour plots with f3 > f2 > f1 illustrating, for an inequality constraint, the cases: no active constraints at x*; x* not a minimizer when μ < 0; x* a minimizer when μ > 0.]
83. Lagrangian
• We can introduce the function (the Lagrangian)
      L(x, λ, μ) = f(x) − Σi λi ai(x) − Σj μj cj(x).
• The necessary condition for a local minimizer is ∇x L(x*, λ*, μ*) = 0,
together with μj* ≥ 0 and μj* cj(x*) = 0, and x* must be a feasible point
(i.e. the constraints are satisfied).
• These are the Karush-Kuhn-Tucker (KKT) conditions.
84. Quadratic Programming (QP)
• Like in the unconstrained case, it is important to study
quadratic functions. Why?
• Because general nonlinear problems are solved as a
sequence of minimizations of their quadratic
approximations.
• QP with constraints: minimize a quadratic function of x
subject to linear constraints.
• H is symmetric and positive semidefinite.
85. QP with Equality Constraints
• Minimize a quadratic function of x
Subject to: A x = b
• Assumption: A is p × N and has full row rank (p < N).
• Convert to an unconstrained problem by variable elimination:
      x = A+ b + Z φ,
and minimize the resulting quadratic function of φ.
This quadratic unconstrained problem can be solved, e.g.,
by the Newton method.
Z is a basis for the null space of A;
A+ is the pseudo-inverse of A.
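A minimal sketch of that variable-elimination approach; the small QP at the bottom (minimize ||x||² subject to x1 + x2 + x3 = 3) is a hypothetical example, not one from the slides:

```python
import numpy as np

def eq_constrained_qp(H, c, A, b):
    """Minimize 0.5 x^T H x + c^T x  s.t.  A x = b, by variable elimination:
    x = A+ b + Z phi, where Z spans the null space of A (A has full row rank)."""
    A = np.atleast_2d(A)
    x0 = np.linalg.pinv(A) @ b               # particular solution A+ b
    _, s, Vt = np.linalg.svd(A)
    Z = Vt[A.shape[0]:].T                    # null-space basis from the SVD
    # reduced unconstrained quadratic in phi: (Z^T H Z) phi = -Z^T (H x0 + c)
    phi = np.linalg.solve(Z.T @ H @ Z, -Z.T @ (H @ x0 + c))
    return x0 + Z @ phi

# hypothetical QP: min x1^2 + x2^2 + x3^2  subject to  x1 + x2 + x3 = 3
H = 2.0 * np.eye(3)
c = np.zeros(3)
print(eq_constrained_qp(H, c, np.array([[1.0, 1.0, 1.0]]), np.array([3.0])))  # -> [1, 1, 1]
```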
86. QP with inequality constraints
• Minimize a quadratic function of x
Subject to: linear inequality constraints.
• First we check if the unconstrained minimizer
is feasible.
If yes, we are done.
If not, we know that the minimizer must be on the
boundary, and we proceed with an active-set method.
• xk is the current feasible point.
• The index set of constraints that are active at xk is maintained (the working set).
• The next iterate is given by xk+1 = xk + dk.
87. Active-set method
• How to find dk?
– To remain active, the active constraints must still hold as equalities at xk + dk.
– The objective function at xk + d becomes a quadratic function of d.
• The major step is a QP sub-problem: minimize this quadratic in d,
subject to: the active constraints remaining active.
• Two situations may occur: dk = 0 or dk ≠ 0.
88. Active-set method
• If dk = 0:
We check if the KKT conditions are satisfied,
i.e. whether all Lagrange multipliers of the active constraints are nonnegative.
If YES we are done.
If NO we remove from the active set the constraint with the most
negative multiplier and solve the QP sub-problem again, but this time with
fewer active constraints.
• If dk ≠ 0:
We can move to xk + dk, but some inactive constraints
may be violated on the way.
In this case, we move by αk dk till the first inactive constraint
becomes active, update the active set, and solve the QP sub-problem again,
but this time with more active constraints.
89. General Nonlinear Optimization
• Minimize f(x)
subject to: ai(x) = 0 and cj(x) ≥ 0,
where the objective function and constraints are
nonlinear.
1. For a given iterate, approximate the Lagrangian by its
Taylor series → QP problem
2. Solve the QP → descent direction dk
3. Perform a line search in the direction dk → xk+1
4. Update the Lagrange multipliers → (λk+1, μk+1)
5. Repeat from Step 1.
90. General Nonlinear Optimization
Lagrangian: L(x, λ, μ) = f(x) − Σi λi ai(x) − Σj μj cj(x).
At the k-th iterate we have (xk, λk, μk),
and we want to compute a set of increments (δx, δλ, δμ).
A first-order approximation of the gradient of the Lagrangian and of the constraints is formed.
These approximate KKT conditions correspond to a QP program.
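In practice, the sequential-quadratic-programming scheme of slides 89–90 is usually used through a library. A minimal sketch using scipy's SLSQP solver (a sequential least-squares QP variant); the small constrained problem is a hypothetical example, not one from the slides:

```python
from scipy.optimize import minimize

# hypothetical problem: minimize f(x) = (x1 - 1)^2 + (x2 - 2.5)^2
# subject to the equality constraint x1 + x2 = 3 and bounds x1, x2 >= 0
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.5)**2
constraints = [{"type": "eq", "fun": lambda x: x[0] + x[1] - 3.0}]
result = minimize(f, x0=[2.0, 0.0], method="SLSQP",
                  bounds=[(0, None), (0, None)], constraints=constraints)
print(result.x)   # expected near (0.75, 2.25)
```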
92. Linear Programming (LP)
• LP is common in economics and is meaningful only in the presence of constraints.
• Two standard forms:
  1. Minimize cᵀx subject to A x ≥ b.
  2. Minimize cᵀx subject to A x = b and x ≥ 0, where A is p × N and has
     full row rank (p < N).
• QP can solve LP.
• If the LP minimizer exists it must be one of the vertices
of the feasible region. Prove it!
• A fast method that considers vertices is the Simplex
method.
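A minimal sketch of solving a small product-mix LP (like the one sketched in the introduction) with scipy's linprog; the coefficients are hypothetical, and the default solver is used rather than an explicit simplex implementation:

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical product-mix LP: maximize profit p^T x subject to A x <= b, x >= 0
# (maximization is done by minimizing -p^T x)
p = np.array([3.0, 5.0])                   # profit per unit of each product
A = np.array([[1.0, 0.0],                  # machine-hours used per unit
              [0.0, 2.0],                  # labor-hours used per unit
              [3.0, 2.0]])                 # raw material used per unit
b = np.array([4.0, 12.0, 18.0])            # available machine, labor, material
res = linprog(-p, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                     # optimal mix and maximal profit
```

As the slide notes, the optimum lies at a vertex of the feasible polytope, which is what a simplex-type method exploits by moving from vertex to vertex.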