MAST30013 Techniques in Operations Research
Newton's Method: A Comparative Analysis of Algorithmic Convergence Efficiency
T.Lee thlee@student.unimelb.edu.au
J.Rigby j.rigby@student.unimelb.edu.au
L.Russell lrussell@student.unimelb.edu.au
Department of Mathematics and Statistics
University of Melbourne
Summary:
Objective
By considering a specific problem, the aim of this project is to provide an example of the
implementation of traditional Newtonian methods in multivariate minimisation applications. The
effectiveness of three methods is examined: Newton's Method, the Broyden-Fletcher-Goldfarb-Shanno
(BFGS) method and the Symmetric Rank 1 (SR1) method. By analysing the performance of these
algorithms in minimising a quadratic and convex function, a recommendation will be given as to the
best method to apply to such a case.
These methods cannot be applied to all multivariate functions and for a given problem a technique
may 'succeed' where others 'fail'. This success may manifest as convergence at a faster rate or in the
form of algorithmic robustness. Investigation into which conditions facilitate the successful and
efficient application of the chosen algorithms will be undertaken.
Minimization algorithms require varying degrees of further information in addition to the objective
function. Newtonian methods, in a relative sense, require more information than other common
methods like the Steepest Descent Method (SDM). The advantages and disadvantages of this further
requisite information will be examined.
Findings and Conclusions
For the specific quadratic and convex non-linear program outlined subsequently in the introduction,
it was found that Newton’s Method performed best for both the constrained and unconstrained
problem. It was computationally quicker, usually required fewer iterations to solve the problems and
achieved greater convergence accuracy than the BFGS, SR1 and SDM methods, consistent with the
outlined theory.
The ‘ideal’ nature of the specific case required further evaluation of the other methods without
consideration given to Newton's method. For a constrained problem the SR1 method achieved non-trivial
convergence success, partly attributable to the algorithm's flexibility in not always choosing a
descent direction (that is, positive definiteness of the approximated hessian is not imposed).
Recommendations
Whilst theoretical convergence rates are greater for Newton's method than for quasi-Newton methods,
and greater for quasi-Newton methods in turn than for the SDM, the applicability of certain
algorithms is highly dependent on the specifics of a problem. For more complex general cases the
advantages of using quasi-Newtonian methods become apparent, as hessian inversion becomes more
computationally taxing.
Analysis suggests that when seeking to minimize quadratic and convex nonlinear programs,
Newton's Method appears to perform better than any of the other tested methods.
Introduction:
The Objective Function
This project investigates the advantages and disadvantages of using three variations of Newton's
Method and contrasts their convergence efficiencies against one another as well as the Steepest
Descent Method for the general case and the specified problem:
$$\min f(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + \tfrac{1}{2}\,\mathbf{x}^T H\,\mathbf{x}$$

where

$$\mathbf{c} = [5.04,\ -59.4,\ 146.4,\ -96.6]^T, \qquad
H = \begin{bmatrix} 0.16 & -1.2 & 2.4 & -1.4 \\ -1.2 & 12.0 & -27.0 & 16.8 \\ 2.4 & -27.0 & 64.8 & -42.0 \\ -1.4 & 16.8 & -42.0 & 28.0 \end{bmatrix}$$
A constrained case, where the objective function is subject to the constraint x^T x = 1, is
considered and analysed with Newtonian methods implementing an L2 penalty program. The results are
contrasted against output from the MATLAB Optimisation Tool.
In the following sections this report will detail a method of analysing the constrained and
unconstrained objective function and compare algorithmic output to analytical solutions. Results will
be contrasted with theory and meaningful conclusions made where grounded by empirical evidence.
Any results requiring further substantiation will be discussed subsequently.
Newton's Method
Newton's Method (also known as the Newton-Raphson Method) is a method for minimising an
unconstrained multivariate function. Given a starting point, this method approximates the objective
function with a second order Taylor polynomial and proceeds to minimise this approximation by
moving in the 'Newton direction'. The output is subsequently used as the new starting point and the
process is iteratively repeated (Chong and Zak, 2008).
Newton's Method seeks to increase the rate of convergence by using second order information of
the function it is operating on. This second order information is the hessian, denoted ∇²f.
The 'Newton direction' mentioned above is defined to be:

$$\mathbf{d}^k := -\nabla^2 f(\mathbf{x}^k)^{-1}\,\nabla f(\mathbf{x}^k)$$

where x^k denotes a particular iterate point of the algorithm. It is defined this way such that if
the second order approximation to a given function held exactly, then it would be minimised in one
step with a step size of 1.
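As an illustration only (the implementation actually used to generate the results of this report
appears in the appendices), a minimal MATLAB sketch of this iteration with a fixed step size of 1
might look as follows, assuming the gradf.m and hessf.m files listed in the appendices:

% Minimal sketch of the Newton iteration with a fixed unit step
% (the full algorithm used in this report performs a line search instead).
xk = [0 0 0 0];                        % starting point x0 (row vector)
tolerance1 = 0.01;                     % stopping tolerance on the gradient
while norm(gradf(xk)) > tolerance1
    dk = -(hessf(xk) \ gradf(xk)')';   % Newton direction: -hess(x)^(-1) * grad(x)
    xk = xk + dk;                      % take a full step in the Newton direction
end

For the quadratic objective above the second order model is exact, so this loop terminates after a
single step; the algorithms analysed in this report instead choose the step length with a line search.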
BFGS and SR1
Minimisation via Newton's Method requires the calculation of the gradient and hessian matrices. In
addition, the hessian must also be inverted. This inversion can be quite computationally expensive
(standard techniques are known to be O(n³)) (Hauser, 2012) and may not be defined. Additionally,
the Newton direction is not necessarily guaranteed to be a descent direction if the hessian is not
positive definite. These two potential problems may affect the quality of any implementation of
Newton’s method and give rise to the need for quasi-Newton methods.
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Symmetric Rank 1 (SR1) Quasi-Newton methods
have been formulated specifically in order to bypass these concerns by approximating the hessian
from successively calculated gradient vectors (Farzin and Wah, 2012). Both methods attempt to
satisfy the secant equation at each iteration:
$$\mathbf{x}^{k+1} - \mathbf{x}^k = H^{k+1}\left(\nabla f(\mathbf{x}^{k+1}) - \nabla f(\mathbf{x}^k)\right)$$
During each iteration the approximated hessian is 'updated' in a manner dependent on the method:
BFGS Update
$$H^{k+1} = H^k + \frac{1 + \langle\mathbf{r}^k, \mathbf{g}^k\rangle}{\langle\mathbf{s}^k, \mathbf{g}^k\rangle}\,\mathbf{s}^k(\mathbf{s}^k)^T - \left[\mathbf{s}^k(\mathbf{r}^k)^T + \mathbf{r}^k(\mathbf{s}^k)^T\right]$$

where

$$\mathbf{s}^k = \mathbf{x}^{k+1} - \mathbf{x}^k, \qquad \mathbf{g}^k = \nabla f(\mathbf{x}^{k+1}) - \nabla f(\mathbf{x}^k), \qquad \mathbf{r}^k = \frac{H^k\mathbf{g}^k}{\langle\mathbf{s}^k, \mathbf{g}^k\rangle}$$
H^{k+1} is an approximation to the inverse of the hessian. The BFGS update always satisfies the secant
equation and maintains positive definiteness of the hessian approximation (if initialized as such).
Additionally, the BFGS update satisfies the useful symmetry property (H^{k+1})^T = H^{k+1}. It can also
be shown that H^{k+1} differs from its predecessor by a rank-2 matrix (Nocedal and Wright, 1999).
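As a guide only, the update above translates almost directly into MATLAB. The sketch below is not the
appendix implementation; the function name is hypothetical, xk and xk1 are the current and next
iterates stored as row vectors, Hk is the current inverse-hessian approximation, and gradf.m is the
gradient file listed in the appendices.

function Hnew = bfgsUpdateSketch(Hk, xk, xk1)
% One BFGS update of the inverse-hessian approximation Hk (a sketch of the
% formula above; s, g and r are column vectors).
s = (xk1 - xk)';                  % s_k = x_{k+1} - x_k
g = (gradf(xk1) - gradf(xk))';    % g_k = grad f(x_{k+1}) - grad f(x_k)
r = (Hk * g) / (s' * g);          % r_k = H_k g_k / <s_k, g_k>
Hnew = Hk + ((1 + r'*g) / (s'*g)) * (s*s') - (s*r' + r*s');
end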
SR1 Update
The SR1 update is a simpler rank-1 update that also maintains the symmetry of the hessian and
seeks to (but does not always) satisfy the secant equation. (Nocedal and Wright, 1999)
$$H^{k+1} = H^k + \frac{(\Delta\mathbf{x}^k - H^k\mathbf{y}^k)(\Delta\mathbf{x}^k - H^k\mathbf{y}^k)^T}{(\Delta\mathbf{x}^k - H^k\mathbf{y}^k)^T\,\mathbf{y}^k}$$

where

$$\mathbf{y}^k = \nabla f(\mathbf{x}^k + \Delta\mathbf{x}^k) - \nabla f(\mathbf{x}^k).$$
As with the BFGS method, H^{k+1} is an approximation to the inverse of the hessian. This update does
not guarantee that the updated matrix is positive definite and consequently does not ensure that
following iterations always move in descent directions. In practice, the approximated hessians
generated by the SR1 method exhibit faster convergence towards the true hessian inverse than those of
the BFGS method (Conn, Gould and Toint, 1991). A known drawback affecting the robustness of the SR1
method is that the denominator can vanish (Nocedal and Wright, 1999). Where this is the case,
algorithmic robustness can be increased simply by skipping the update at the troublesome iteration.
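A corresponding sketch of the SR1 update, including the simple skipping safeguard just described, is
given below; the function name and the 1e-8 threshold are illustrative choices rather than anything
taken from the appendix code.

function Hnew = sr1UpdateSketch(Hk, xk, xk1)
% One SR1 update of the inverse-hessian approximation Hk, skipped when the
% denominator is (numerically) zero; xk and xk1 are row-vector iterates.
dx = (xk1 - xk)';                  % step just taken (column vector)
y  = (gradf(xk1) - gradf(xk))';    % change in the gradient
v  = dx - Hk * y;                  % dx_k - H_k y_k
if abs(v' * y) > 1e-8 * norm(v) * norm(y)
    Hnew = Hk + (v * v') / (v' * y);   % symmetric rank-1 correction
else
    Hnew = Hk;                         % skip the update at this iteration
end
end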
Method:
Analysis Method
First a theoretical analysis of the function was completed. Then each of the methods of minimisation
was applied to the outlined objective function in both its constrained and unconstrained form using
MATLAB algorithms (see appendices: MATLAB). The unconstrained results from these three methods
have been compared to each other and to the Steepest Descent Method in order to draw
conclusions about the best method to apply. The constrained program results were compared to
results calculated from the MATLAB Optimisation Tool. The criteria analysed were time taken and
number of iterations taken to converge on the global minimum (within a tolerance value), accuracy of
x*, and algorithmic robustness.
Analysis of Unconstrained Objective Function
The properties of the considered problem need to be determined to draw meaningful conclusions
from output results. Knowledge about the behaviour of this function, specifically the type of
function, location and number of any stationary points, the class of stationary points and whether
they are local or global can be determined for this case.
Functions of the following form are defined as quadratic:

$$f(\mathbf{x}) = \alpha + \langle\mathbf{c}, \mathbf{x}\rangle + \tfrac{1}{2}\langle\mathbf{x}, B\mathbf{x}\rangle = \alpha + \mathbf{c}^T\mathbf{x} + \tfrac{1}{2}\,\mathbf{x}^T B\,\mathbf{x}$$

The specified function is of this form with α = 0, B = H and c as detailed above. This
identification means that:

$$\nabla f(\mathbf{x}) = \mathbf{c} + H\mathbf{x}, \qquad \nabla^2 f(\mathbf{x}) = H$$
A particular feature of quadratic functions is that they are convex if and only if their hessian is
positive semi-definite. Calculating the eigenvalues of H (via MATLAB) reveals:
𝜆1 = 0.0066657144469
𝜆2 = 0.0591221937911
𝜆3 = 1.4840596546051
𝜆4 = 103.4101524371569
Since each eigenvalue is positive, this tells us that H is positive definite (see appendices: Proofs).
Positive definite matrices are also positive semi-definite, and so the quadratic function is also
convex.
The function under investigation is quadratic and convex, and hence solvable via matrix algebra. For
a convex and at least C1 function, ∇f(x*) = 0 if and only if x* is a global minimum of f. Note that,
because the function is quadratic, it is at least C1. Therefore:

$$\nabla f(\mathbf{x}^*) = 0 \;\Longleftrightarrow\; H\mathbf{x}^* + \mathbf{c} = 0 \;\Longleftrightarrow\; H\mathbf{x}^* = -\mathbf{c} \;\Longleftrightarrow\; \mathbf{x}^* = -H^{-1}\mathbf{c}$$
For the specified function:

$$H^{-1} = \begin{bmatrix} 100 & 50 & 33.333 & 25 \\ 50 & 33.333 & 25 & 20 \\ 33.333 & 25 & 20 & 16.667 \\ 25 & 20 & 16.667 & 14.286 \end{bmatrix}$$

And recall:

$$\mathbf{c} = \begin{bmatrix} 5.04 \\ -59.4 \\ 146.4 \\ -96.6 \end{bmatrix}$$

This implies that:

$$\mathbf{x}^* = -H^{-1}\mathbf{c} = -\begin{bmatrix} 100 & 50 & 33.333 & 25 \\ 50 & 33.333 & 25 & 20 \\ 33.333 & 25 & 20 & 16.667 \\ 25 & 20 & 16.667 & 14.286 \end{bmatrix}\begin{bmatrix} 5.04 \\ -59.4 \\ 146.4 \\ -96.6 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 2 \end{bmatrix}$$
So x* = [1 0 -1 2]^T is the global minimum of the nonlinear function; there are no other minima of
this function. It is expected that the algorithms converge to this point. Evaluating the objective at
this point gives f(x*) = -167.28.
Note that the hessian matrix of the program, and hence its inverse, is a constant matrix; that is to
say, it does not change across the elements of ℝ⁴. Recall that the potential problems with Newton's
method were that the hessian may not be invertible or may not be positive definite. Neither of these
issues arises for this specified problem.
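These analytic results can be verified with a few lines of MATLAB (a quick check only, restating the
H and c from the introduction):

% Quick numerical check of the analytic results above.
H = [ 0.16  -1.2   2.4  -1.4;
     -1.2   12.0 -27.0  16.8;
      2.4  -27.0  64.8 -42.0;
     -1.4   16.8 -42.0  28.0];
c = [5.04; -59.4; 146.4; -96.6];
eig(H)                                   % all four eigenvalues are positive
xstar = -(H \ c)                         % solves H*x = -c, returning [1; 0; -1; 2]
fstar = c'*xstar + 0.5*xstar'*H*xstar    % evaluates to -167.28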
Implementation of Algorithms
Each algorithm under investigation has been written into MATLAB code and included in the
appendices. They each call on a common univariate line-search algorithm, the Golden Section Search,
one of the better methods in the class of robust interval-reducing methods (Arora, 2011). Similarly,
they each implement an algorithm for finding an upper bound on the location of the minimum on a
half-open interval, which doubles the incremented step size with each iteration.
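The exact routines used are listed in the appendices; purely as a guide to the idea, a generic golden
section search over a bracketed interval [a, b] might be sketched as follows, where phi is the
univariate function phi(alpha) = f(xk + alpha*dk) minimised along the current search direction (the
function name is illustrative):

function xmin = goldenSectionSketch(phi, a, b, tol)
% Generic golden section search for a minimum of phi on [a, b]
% (sketch only; the routine used for this report's results is in the appendices).
rho = (3 - sqrt(5)) / 2;                   % golden section ratio, ~0.382
c = a + rho*(b - a);   d = b - rho*(b - a);
fc = phi(c);           fd = phi(d);
while (b - a) > tol
    if fc < fd                             % minimum lies in [a, d]
        b = d;  d = c;  fd = fc;
        c = a + rho*(b - a);  fc = phi(c);
    else                                   % minimum lies in [c, b]
        a = c;  c = d;  fc = fd;
        d = b - rho*(b - a);  fd = phi(d);
    end
end
xmin = (a + b) / 2;                        % midpoint of the final interval
end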
Each algorithm has the following parameters that can be altered:
• x0: The starting point for the algorithm.

• Tolerance 1: The stopping criterion for the particular algorithm. In all cases, this is a check of
the magnitude of the gradient vector at a particular iteration point x^k against 0. If it is 'close
enough' to zero, the algorithm will end. 'Close enough' is defined as the value set for this
tolerance.

• Tolerance 2: The stopping criterion for the Golden Section Search as detailed above. This value
sets how large the interval estimate will be when the line search is complete.

• T: The parameter used in the Multi Variable Half Open Interval Search nested in each of the
algorithms. 2^(k-1) T is the increase to the upper bound during each iteration when trying to find an
interval on which the minimum of the approximation must exist.

• H0: The 'starting hessian', thought of as an approximation to the inverse hessian of the program.
It is only present in the BFGS and SR1 methods.
In this paper, 𝒙0, tolerance 1 and 𝐻0 values (where appropriate) have been varied and the effects
analysed. The effects of changing tolerance 2 and the value of T have not been analysed because they
do not relate directly (see above) to the methods under investigation.
It is expected that the effect of altering x0 will depend on the distance of x0 from the global
minimum: the closer x0 is to the minimum, the less time and the fewer iterations the algorithm is
expected to take to converge. Having a stricter tolerance (that is, bringing tolerance 1 closer to 0)
should result in an increased number of iterations and time taken.
The algorithm in question must get closer to the true global minimum in order to comply with the
more strict tolerance, and hence is expected to require more computational time. Additionally, the
effect of changing 𝐻0 will be expected to depend on how well 𝐻0 approximates the true hessian
inverse of the function. A better approximation (such as giving the algorithm the function’s true
hessian inverse to begin with) would be expected to reduce the amount of computation time
required for convergence. However, the idea behind the BFGS and SR1 methods is to avoid
calculating the inverse of the hessian directly. As such, the hessian inverse will not be used as an
H0, in order to simulate more realistic conditions under which these two methods might be implemented.
Constrained Case
In solving the constrained case, two approaches were taken. The first was to solve the nonlinear
program using the MATLAB Optimization Tool, specifically via the interior point and active set
algorithms. This gave the solution point as well as some data and intuition in regards to the time
taken and iterations required to solve such a constrained problem (alternate analytical approaches
could have been used to provide this reference value, see discussion). Since the algorithms
implemented by the MATLAB Optimization Tool are specifically designed to solve constrained
nonlinear programs, the expectation was that they would outperform the Newtonian algorithms
under investigation. The second method for solving the constrained case was via the L2 penalty
method. Converting the constrained nonlinear program into an unconstrained nonlinear program
allowed for the Newtonian algorithms to be implemented.
The specified constraint is:
$$\mathbf{x}^T\mathbf{x} = 1 \;\Longleftrightarrow\; x_1^2 + x_2^2 + x_3^2 + x_4^2 - 1 = 0$$
The L2 penalty method requires that this constraint be converted into a penalty term and added to
the objective function. So, rather than minimizing the original objective function with the above
constraint, the function to be minimised was instead:
$$P_k(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + \tfrac{1}{2}\,\mathbf{x}^T H\,\mathbf{x} + \frac{k}{2}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2 - 1\right)^2$$

where c and H are defined as above, and k is the parameter of the penalty term.
The algorithms under investigation require the calculation of both the gradient function and the
hessian matrix. As shown above, in the unconstrained case, the hessian is a symmetric, positive
definite matrix, perfect for implementation of Newtonian methods. When the hessian matrix is
calculated in order to implement these algorithms for solving the constrained case, the positive
definiteness of the matrix is potentially lost. This is a possible cause for any non-convergence issues
arising from implementation of the Newtonian methods. For the equations for the gradient function
and hessian matrix see appendices: MATLAB.
The L2 penalty method analytically finds the minimum point as the limit x* = lim_{k→∞} x^k. When
solving for the minimum point numerically using the Newtonian algorithms, a small value of k was
chosen and then increased in order to simulate this limiting process. It was expected that, as the
value of k increased, the minimum point found by the algorithms would converge on the value of x*
found by the MATLAB Optimization Tool.
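Purely to illustrate this limiting process (the solvers actually used are those listed in the
appendices), a continuation loop over increasing k can be sketched in MATLAB using the built-in
fminunc from the Optimization Toolbox, warm-starting each solve from the previous solution; the
sequence of k values is an illustrative choice:

% Sketch of the L2 penalty limiting process: minimise P_k for increasing k,
% warm-starting each unconstrained solve from the previous solution.
H = [ 0.16  -1.2   2.4  -1.4;
     -1.2   12.0 -27.0  16.8;
      2.4  -27.0  64.8 -42.0;
     -1.4   16.8 -42.0  28.0];
c = [5.04; -59.4; 146.4; -96.6];
x = [0; 0; 0; 0];                          % initial point (column vector)
for k = 10.^(1:7)                          % k = 10, 100, ..., 10^7
    Pk = @(x) c'*x + 0.5*x'*H*x + (k/2)*(x'*x - 1)^2;
    x  = fminunc(Pk, x);                   % unconstrained minimisation of P_k
end
% x now approximates the constrained minimiser, close to [-0.025; 0.311; -0.789; 0.53]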
Theoretical Convergence
Under specific circumstances, the various methods exhibit differing rates of convergence.
• For an initial point sufficiently close to the minimum, if the hessian is positive definite, a
local/global minimum actually exists, the step size at each iteration satisfies the Armijo-Goldstein
and Wolfe conditions and f is C3, then the rate of convergence for Newton's Method is quadratic (see
appendices for a univariate proof).

• Quasi-Newton methods are known to exhibit superlinear convergence under certain circumstances:

  • The BFGS Method can be shown to converge to the global minimum at a superlinear rate if the
    starting hessian is positive definite and the objective function is twice differentiable and
    convex (Powell, 1976).

  • Likewise, the SR1 method exhibits superlinear convergence under the same conditions (Nocedal and
    Wright, 1999).

• As an aside, the Steepest Descent Method is known to converge at a linear rate. In addition, it is
not adversely affected, as the Newtonian methods are known to be, by horizontal asymptotes where
divergence is sometimes observed.
The correlation of results to these theoretical rates was examined. It is expected that Newton's
Method will converge the quickest. The quasi-Newtonian methods are expected to be the next
fastest methods to converge followed lastly by the Steepest Descent Method.
Results, Conclusions and Recommendations:
Results for Unconstrained Case
The performance of the chosen methods was analysed for the specified objective function by
choosing 10 starting points x0 (eight randomly generated, [0 0 0 0], and the known minimum at
[1 0 -1 2]).
Point 𝑥1 𝑥2 𝑥3 𝑥4
1 0 0 0 0
2 1 0 -1 2
3 0.81 0.91 0.13 0.91
4 3.16 0.49 1.39 2.73
5 9.57 -4.85 8.00 -1.42
6 14.66 -4.04 12.09 3.42
7 1.60 3.35 -16.46 18.81
8 -0.30 -7.47 -17.90 1.86
9 2.23 11.37 15.51 17.57
10 7.52 -10.79 -10.14 17.14
Table 1: List of starting points
*Starting points truncated to two decimal places.
Results shown here were generated with T and tolerances 1 and 2 all set to 0.01 (see appendices for
the full list of all results).
Disregarding data sets that failed to return a value for x*, and sets where the number of iterations
exceeded the average by more than 500%, the following table of average values was generated:
Newton's Method Steepest Descent BFGS (hessian) BFGS (identity) SR1 (hessian) SR1 (identity)
x(1) 0.99998 1.01376 1.06218 0.94505 1.00033 0.996722222
x(2) -0.00007 0.01406 0.03947 -0.02716 0.00152 0.002833333
x(3) -1.00019 -0.9882 -0.97035 -1.01809 -0.99881 -0.996133333
x(4) 1.99986 2.00992 2.02388 1.98639 2.00088 2.003922222
f(x*) -167.28 -167.2749 -167.27892 -167.27965 -167.28 -167.2799889
Elapsed Time (s) 0.0040865 0.1965647 0.0100982 0.014235375 0.0119615 0.0125613
Iterations 9 1010 9.2 5.75 24.6 24.1
Elapsed Time per Iteration (s) 0.000454056 0.000194619 0.00109763 0.002475717 0.00048624 0.000521216
Table 2: Summary of algorithms performances (averaged values given robust algorithm implementation)
The BFGS and SR1 methods were both run using H (as defined in the introduction) and the 4x4 identity
matrix as H0 inputs.
All algorithms converged on the global minimum at x* = [1 0 -1 2], f(x*) = -167.28, for all starting
points except in one instance of the SR1 method initialised with the identity matrix from point 7.
However, given the 'shallowness' of the function and the tolerances, outlying x* values were
occasionally generated. This 'shallowness' refers to, based on the results in the appendix, the
relatively small magnitude of the gradient vector at many points of the objective function close to
the minimum. To this end, the BFGS Method and Steepest Descent Method often stopped only part way to
the global minimum.
For a given set of parameters the BFGS method often converged in fewer iterations than Newton's
method, albeit with less precision. However, for this ideal quadratic objective function Newton's
Method was less prone to 'getting stuck' (see appendices data: BFGS (identity) had runs where the
iteration counts were 28905 and 126675) and always took the least time to converge. The accuracy of
the calculated x* was greatest for Newton's Method, followed by SR1, BFGS and Steepest
Descent in decreasing order of accuracy. As expected the quasi-Newtonian methods each converged
faster (from these points, by an order of magnitude) than the Steepest Descent method. The
average iteration of the Steepest Descent method, however, took less time to compute than the
other methods. A cause of this could be that all other methods require operations on a 4x4 matrix
such as an inversion of a hessian or an update of the inverse hessian approximation at each
iteration, whereas the Steepest Descent Method requires only calculation of a gradient vector.
Tolerance Variation (see appendices for tables of results)
With tolerance 2 held constant at 0.01 and T held constant at 1:
For Newton's Method, the algorithm converged quickest with tolerance 1 values set at 0.0001.
When given the identity matrix for the starting iteration, the efficiency of the SR1 method appeared
to generally increase as tolerance 1 became stricter. When given the hessian of the program to start
with, there was no discernible pattern in the effect of varying tolerance 1. The
BFGS method, when given either the identity or the program's hessian to start with, behaved as
intuitively expected and computational time increased with tightened tolerances. Hence the results
varied and were not always consistent with our expectations. Further investigation and more data
would allow for greater quantitative analysis (see discussion).
With regard to the BFGS method, on occasion the value of T was altered to get the algorithm to
converge. This issue did not arise when implementing Newton's Method or the SR1 Method. As T is only
used in the Multi Variable Half Open Interval Search portion of the algorithm, this points to a
possible incompatibility between that particular search routine and the BFGS method under certain
conditions. Finding an alternative method for doing this task (e.g. using a step size that meets the
Armijo-Goldstein and Wolfe conditions) would rectify this issue and may increase the robustness of
the BFGS method. A sketch of such a step-size rule is given below.
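For instance, a simple backtracking rule enforcing the Armijo (sufficient decrease) part of those
conditions could be sketched as follows; the constants c1 and beta are illustrative choices, xk and
dk are a row-vector iterate and descent direction, and f.m and gradf.m are as in the appendices. Note
that this sketch does not check the Wolfe curvature condition.

function alpha = armijoStepSketch(xk, dk)
% Backtracking step-length sketch enforcing the Armijo sufficient-decrease condition.
c1 = 1e-4;  beta = 0.5;  alpha = 1;
fk = f(xk);  slope = gradf(xk) * dk';      % directional derivative at xk along dk
while f(xk + alpha*dk) > fk + c1*alpha*slope
    alpha = beta * alpha;                  % shrink the step until Armijo holds
end
end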
Summary
In summary, each method found the global minimum to the desired accuracy in the vast majority of
cases; however, Newton's method was the most accurate and took the least computational time. The
convergence rates were mostly consistent with theory in that the quasi-Newtonian algorithms generally
converged more slowly than Newton's method and faster than the Steepest Descent Method.
Results for Constrained Case
Algorithmic performance given the original objective function constrained by x^T x = 1 was next
analysed. Firstly, the constrained case was solved using MATLAB's Optimization Toolbox:
Interior Point Algorithm Active Set Algorithm
x(1) -0.025 -0.025
x(2) 0.311 0.311
x(3) -0.789 -0.789
x(4) 0.53 0.53
f(x*) -133.56022058 -133.56022058
Average Iterations 22.2 33.6
Elapsed Time (s) 1.05 2.01
Elapsed Time per Iteration (s) 0.047297 0.059821
Table 3: MATLAB Optimisation Tool Results
The following values were obtained using the first five starting points listed in Table 1.
To solve this constrained problem using Newton's Method and its variants, the L2 penalty method
was used. In applying the algorithms the following results were returned, with the penalty parameter
k = 10,000,000:
Newton's Method BFGS (Hessian) BFGS (Identity)* SR1 (Hessian) SR1 (Identity)
x(1) -0.02477 N/A -0.02477 -0.02477 -0.02477
x(2) 0.31073 N/A 0.31073 0.31073 0.31073
x(3) -0.78876 N/A -0.78876 -0.78876 -0.78876
x(4) 0.52980 N/A 0.52980 0.52980 0.52980
f(x*) -133.56022894 N/A -133.56022894 -133.5603044 -133.5603044
Elapsed Time (s) 0.2051092 N/A 0.112444 2.104064 1.678844
Iterations 169 N/A 111 2037 659
Elapsed Time per Iteration (s) 0.001214 N/A 0.001013 0.001033 0.002546
Table 4: Newtonian Methods Constrained Problem Results
*BFGS (identity) only returned results for the [0 0 0 0] starting point. As such, its results are not
averaged. The rest of the results are averaged data returned for the five starting points.
The BFGS algorithm's L2 implementation (see appendices: MATLAB) was especially fragile given this
constraint. It did not return any results when given the program's hessian for the starting iteration
and only found the minimum once when given the identity as the starting hessian. In the one case
where it did find a minimum, the BFGS method found the same minimum as the other two algorithms,
closely matching the MATLAB optimization output.
Slightly more robust was the SR1 method. It managed to find the minimum from more starting
points than the BFGS method did, although not from all starting points. This slightly increased
robustness did come at a cost, however, with the SR1 method requiring a very large number of
iterations and a longer timeframe in which to operate, making it more computationally expensive to
use. The SR1 update does not impose positive definiteness on the updated hessian, and this fact may
have contributed to the algorithm's increased success rate when compared to the BFGS method.
In general, the relatively reduced robustness of the BFGS and SR1 implementations, given the
specified problem, possibly stems from the fact that they are both quasi-Newton methods derived from
the Secant Method. As quasi-Newton methods are designed to avoid the computationally expensive
hessian inversion, the hessian is instead approximated using finite differences of the function
gradient, and data is interpolated with iterations (Indiana University, 2012). As quasi-Newton
methods are multivariate generalisations of the secant method, the same problem exists for both
methods; namely, that if the initial values used are not close enough to x*, the methods may fail to
converge entirely (Ohio University, 2012).
In contrast, Newton's Method performed exceptionally well. It found the minimum from all starting
points and did so relatively quickly in terms of both time and number of iterations.
Compared to the MATLAB optimization algorithms, all of the Newtonian algorithms took more
iterations to converge, as expected. Surprisingly, Newton's Method was able to outperform
MATLAB’s optimization algorithms with regard to speed.
Hessian Analysis (both programs)
For both the constrained and unconstrained cases, the BFGS Method converged in fewer iterations and
more accurately when it was started with the identity matrix as the inverse hessian approximation. It
was also more robust when starting with the identity, always converging in the
unconstrained case and at least finding the global minimum once in the constrained case. The
method was quicker in terms of elapsed time when given the program's hessian to start with. It is
therefore recommended that the identity be used as 𝐻0 when minimising a function such as this via
the BFGS Method.
In the unconstrained case, the difference made by the choice of H0 as the starting inverse hessian
approximation for the SR1 Method was negligible. There is no significant difference in the time
taken, accuracy or average number of iterations required to warrant recommending one particular
𝐻0 over the other. In the constrained case, whilst there may be no difference between the results in
terms of accuracy, there is a more pronounced disparity between iterations and time taken. Starting
with the function's hessian caused the SR1 Method to take nearly three times as many iterations and
almost a 20% longer timeframe. Therefore, using the identity matrix for 𝐻0 for this constrained case
is a much better alternative.
As was noted earlier, the hessian of this nonlinear function is a constant 4x4 matrix regardless of the
algorithm's current x^k. This means that it is not very computationally taxing to compute the hessian
and it only needs to be computed once. This deals with the problem of the hessian inversion being
computationally expensive. Since it is known that the hessian is positive definite, so too is the
inverse of the hessian. Thus, the Newton Direction will always be a descent direction. Whilst this is
only true for this function (and functions of similar forms), it means that Newton's Method behaved
very well in this particular problem.
Summary
Given an ideal function such as this, that is to say a quadratic and convex nonlinear program, based
on the above analysis, Newton's Method outperformed both of its variants (BFGS and SR1) and the
Steepest Descent Method. With the problem formulated using the L2 penalty method it is the best
algorithm to use for such a program.
Discussion:
The objective function analysed by this project was particularly suited to minimization via Newton’s
Method. For other programs, especially non-convex and non-quadratic ones, the results obtained by
this paper may not hold. The BFGS and SR1 method were formulated precisely because the relative
effectiveness of Newton’s method diminishes with increasing complexity.
In addition, the methodology implemented one class of many available algorithms which specifically
used the Golden Section Search in conjunction with a particular open interval search algorithm. A
variety of methods could have also been used to determine an appropriate step size to move during
each iteration. For example, step sizes satisfying the Armijo-Goldstein and Wolfe conditions would
be an appropriate choice. Hence, a whole family of dissimilar results could have been generated
from the same starting points using different algorithms which could just as easily be considered
Newtonian.
The analysis was very origin-centric in that the starting points were all within a relatively similar
distance from [0 0 0 0]^T, and hence from the global minima [1 0 -1 2]^T and
[-0.025 0.311 -0.789 0.53]^T. Analysis from starting iterates further from the minima should yield
results consistent with those generated by this report; further investigation is needed.
As discussed in the previous section, in such a nonlinear program as this, the inverse of the hessian
needs to be calculated only once. As long as it is known that the hessian does not change for any
point in ℝ⁴, and inversion of that constant hessian matrix is computationally feasible, the coding
for Newton's Method used here could be adjusted to remove the evaluation and inversion of the hessian
at each iteration. Such a change would result in fewer calculations per iteration, speeding up the
algorithm. The results returned for this particular case would be even better if such an adjustment
were made. The drawback of doing so is that the adjusted method could only be applied to cases where
the hessian is a constant matrix, severely restricting its applicability.
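A sketch of that adjustment is given below (illustrative only, not the code used for the results in
this report): because the hessian is constant and positive definite it can be factorised once outside
the loop, and the factorisation reused to form the Newton direction at every iteration; hessf.m and
gradf.m are as in the appendices, and the starting point and tolerance values are example choices.

% Factor the constant hessian once, then reuse the factorisation each iteration.
R  = chol(hessf(zeros(1,4)));             % H = R'*R, computed a single time
x0 = [0 0 0 0];  tolerance1 = 0.01;       % example starting point and tolerance
xk = x0;
while norm(gradf(xk)) > tolerance1
    dk = -(R \ (R' \ gradf(xk)'))';       % Newton direction via the cached factors
    xk = xk + dk;                         % unit step (a line search could be added)
end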
For the constrained case an analytical solution using the KKT method would have been possible, albeit
complicated and not solvable by simple linear algebra operations due to the quadratic nature of the
constraint. If this project had gone down this path instead of utilising MATLAB's optimization tools,
an exact value of x* could have been used as a point of reference.
The shortcomings of the BFGS algorithm's implementation of the L2 penalty method require further
analysis and perhaps troubleshooting.
Finally, the analysis of varying the tolerances of the algorithms used by this report could have been
furthered with more systematically obtained data. It would be expected that for a general case
decreasing the ‘strictness’ of the major tolerance would decrease computational time taken (this
was not always the case: see results). By way of contrast, varying the tolerances of the open interval
search and golden section search would have been expected to exhibit different effects for different
starting points and for different problems. To optimize an algorithm a balance must be struck
between accuracy and time taken to generate an appropriate step length. Hence, ideal tolerance
values exist for different algorithms, different starting points and for each iteration. Further
investigation may reveal common properties of these ideal tolerances given the algorithms used.
References:
Arora, J. (2011). Introduction to Optimum Design [electronic resource]. p.42 Burlington Elsevier
Science.
Chong, E. Zak, S. (2008). An Introduction To Optimization 3rd Edition. pp. 155-156. John Wiley and
Sons.
Conn, A., Gould, N. and Toint, P. (1991). "Convergence of quasi-Newton matrices generated by the
symmetric rank one update". Mathematical Programming (Springer Berlin/ Heidelberg) 50 (1): pp.
177–195.
Farzin, K. and Wah, L. (2012). On the performance of a new symmetric rank-one method with restart
for solving unconstrained optimization problems. Computers and Mathematics with Applications,
Volume 64, Issue 6, September 2012, pp. 2141-2152,
http://www.sciencedirect.com/science/article/pii/S089812211200449X
Hauser, K. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton Methods. p. 5.
Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553-hauserk/newtons_method.pdf
Indiana University. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton methods.
Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553-hauserk/newtons_method.pdf
Indian Institute of Technology. (2002). Convergence of Newton-Raphson method. Available online:
http://ecourses.vtu.ac.in/nptel/courses/Webcourse-contents/IIT-KANPUR/Numerical%20Analysis/numerical-analysis/Rathish-kumar/ratish-1/f3node7.html
Nocedal, J. and Wright, S.J. (1999). Numerical Optimization, pp. 220, 144.
Ohio University (2012). Lecture 6: Secant Methods. Available online:
http://www.math.ohiou.edu/courses/math3600/lecture6.pdf
Powell, M. (1976). 'Some global convergence properties of a variable metric algorithm for
minimization without exact line searches', Nonlinear Programming, Vol 4, Society for Industrial and
Applied Mathematics, p. 53.
Appendices:
Proofs
Positive eigenvalues imply invertibility of a matrix:

Define the polynomial p(t) = (t − λ1)(t − λ2)⋯(t − λn). Its constant term is (−1)^n λ1 λ2 ⋯ λn.

Let p(t) = det(tI − A), where A is a square matrix. Then p(0) = det(−A) = (−1)^n det(A), and so

det(A) = λ1 λ2 ⋯ λn.

If λ1, λ2, …, λn > 0 then det(A) ≠ 0. Therefore A is invertible.
Newton's method converges quadratically for the univariate case:

Let x_i be a root of f(x) = 0, and let x_n be an estimate of x_i with |x_i − x_n| = ε < 1.

By Taylor series expansion,

0 = f(x_i) = f(x_n + ε) = f(x_n) + f′(x_n)(x_i − x_n) + 0.5 f″(ξ)(x_i − x_n)²

for some ξ between x_i and x_n. For Newton's method, −f′(x_n)(x_{n+1} − x_n) = f(x_n). Therefore

0 = f′(x_n)(x_i − x_{n+1}) + 0.5 f″(ξ)(x_i − x_n)².

Since (x_i − x_n) and (x_i − x_{n+1}) are the error terms for successive iterations,

(x_i − x_{n+1}) ∝ (x_i − x_n)².

Q.E.D.

(Indian Institute of Technology, 2002)
Data
All tolerance values 0.01:
Newton's Method:
x* f(x*) Elapsed Time (s) Iterations
1.0000 -0.0000 -1.0000 2.0000 -167.28 0.009944 10
1 0 -1 2 -167.28 0.000134 0
1.0000 -0.0001 -1.0001 2.0001 -167.28 0.004207 9
0.9999 -0.0000 -1.0002 2.0000 -167.28 0.001905 9
1.0002 -0.0001 -1.0002 1.9999 -167.28 0.00208 10
0.9999 0.0000 -1.0001 2.0000 -167.28 0.005061 11
1.0000 0.0001 -1.0004 1.9995 -167.28 0.003064 10
1.0000 0.0001 -0.9999 2.0000 -167.28 0.005054 11
0.9999 -0.0008 -1.0011 1.9990 -167.28 0.004321 9
0.9999 0.0001 -0.9999 2.0001 -167.28 0.005095 11
Steepest Descent:
x* f(x*) Elapsed Time (s) Iterations
0.2126 -0.4083 -1.2812 1.7836 -167.2769 0.00655 34
1 0 -1 2 -167.28 0.000116 0
1.1539 0.1993 -0.8189 2.1600 -167.2789 0.038846 106
2.1659 0.6656 -0.5248 2.3718 -167.2728 0.170743 798
1.7844 0.3563 -0.7776 2.1590 -167.2768 0.077666 418
2.1659 0.6648 -0.5257 2.3709 -167.2728 0.481597 2600
-0.1767 -0.6709 -1.4786 1.6256 -167.2727 0.274945 1399
-0.1749 -0.6699 -1.4779 1.6262 -167.2727 0.273263 1332
2.1767 0.6709 -0.5213 2.3744 -167.2727 0.303484 1575
-0.1702 -0.6672 -1.4760 1.6277 -167.2727 0.338437 1838
BFGS (hessian for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
0.1915 -0.4173 -1.2823 1.7865 -167.2767 0.004697 6
1 0 -1 2 -167.28 0.000144 0
1.2740 0.1562 -0.8887 2.0870 -167.2796 0.002729 7
0.9981 0.0014 -0.9982 2.0018 -167.28 0.004855 10
2.1585 0.6613 -0.5280 2.3692 -167.2729 0.006476 7
1.0040 -0.0019 -1.0030 1.9967 -167.28 0.02334 11
0.9987 0.0005 -0.9991 2.0011 -167.28 0.016815 14
0.9946 -0.0028 -1.0020 1.9983 -167.28 0.016214 13
1.0029 -0.0024 -1.0021 1.9982 -167.28 0.010847 10
0.9995 -0.0003 -1.0001 2.0000 -167.28 0.014865 14
BFGS (identity for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
0.2028 -0.4142 -1.2815 1.7865 -167.2768 0.001249 3
1 0 -1 2 -167.28 0.000297 0
1.2434 0.1361 -0.9034 2.0753 -167.2797 0.005207 5
1.0024 0.0016 -0.9992 2.0004 -167.28 0.003858 7
1.0118 0.0074 -0.9948 2.0039 -167.28 3.954961 28905
0.9991 -0.0006 -1.0005 1.9996 -167.28 0.079164 9
0.9976 -0.0041 -1.0046 1.9955 -167.28 0.004674 7
0.9953 -0.0024 -1.0012 1.9992 -167.28 0.008941 7
1.0007 0.0009 -0.9994 2.0004 -167.28 0.010493 8
0.9974 0.0037 -0.9963 2.0031 -167.28 16.554056 126675
SR1 (hessian for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
1.0001 0.0001 -0.9999 1.9999 -167.28 0.011665 14
1 0 -1 2 -167.28 0.000299 0
1.0000 0.0000 -1.0000 2.0000 -167.28 0.007054 21
0.9985 -0.0008 -1.0003 2.0000 -167.28 0.013603 20
1.0173 0.0085 -0.9961 2.0015 -167.28 0.015117 19
0.9996 0.0115 -0.9892 2.0093 -167.28 0.007496 19
1.0000 0.0000 -1.0000 2.0000 -167.28 0.012073 37
0.9812 -0.0119 -1.0090 1.9927 -167.28 0.028734 52
1.0066 0.0056 -0.9956 2.0038 -167.28 0.013331 39
1.0000 0.0022 -0.9980 2.0016 -167.28 0.010243 25
SR1 (identity for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
1.0014 -0.0007 -0.9997 2.0009 -167.28 0.023934 17
1 0 -1 2 -167.28 0.000314 0
1.0004 0.0001 -1.0001 1.9997 -167.28 0.015986 25
1.0053 0.0051 -0.9963 2.0025 -167.28 0.009545 30
0.9998 -0.0005 -1.0002 2.0000 -167.28 0.012679 37
1.0000 -0.0000 -1.0000 2.0000 -167.28 0.01236 34
0.9723 0.0241 -0.9692 2.0305 -167.2799 0.003297 9
NaN NaN NaN NaN NaN 0.006176 7
0.9991 -0.0003 -1.0002 1.9998 -167.28 0.013616 41
0.9922 -0.0023 -0.9995 2.0019 -167.28 0.027706 41
Newton's Method
x0 Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] 0.1 0.01 1 [ NaN -Inf -Inf -Inf] NaN 0.3685380 353
[0 0 0 0] 0.01 0.01 1 [1 0 -1 2] -167.279999996088 0.0294290 4
[0 0 0 0] 0.001 0.01 1 [1 0 -1 2] -167.279999999938 0.0160210 3
[0 0 0 0] 0.0001 0.01 1 [1 0 -1 2] -167.279999999976 0.0058770 2
[0 0 0 0] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0072560 2
[0 0 0 0] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0104300 3
[1 0 -1 2] 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0003140 0
[1 0 -1 2] 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001230 0
[1 0 -1 2] 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000790 0
[1 0 -1 2] 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000750 0
[1 0 -1 2] 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000740 0
[1 0 -1 2] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[0.81 0.91 0.13 0.91] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3632520 353
[0.81 0.91 0.13 0.91] 0.01 0.01 1 [1 0 -1 2] -167.279999998378 0.0060940 4
[0.81 0.91 0.13 0.91] 0.001 0.01 1 [1 0 -1 2] -167.279999999974 0.0052550 3
[0.81 0.91 0.13 0.91] 0.0001 0.01 1 [1 0 -1 2] -167.279999999990 0.0043530 2
[0.81 0.91 0.13 0.91] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0052190 2
[0.81 0.91 0.13 0.91] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0058110 3
[3.16 0.49 1.39 2.73] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3621590 353
[3.16 0.49 1.39 2.73] 0.01 0.01 1 [1 0 -1 2] -167.279999997557 0.0058490 4
[3.16 0.49 1.39 2.73] 0.001 0.01 1 [1 0 -1 2] -167.279999999961 0.0052710 3
[3.16 0.49 1.39 2.73] 0.0001 0.01 1 [1 0 -1 2] -167.279999999985 0.0043780 2
[3.16 0.49 1.39 2.73] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0051530 2
[3.16 0.49 1.39 2.73] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0116540 2
[9.57 -4.85 -8.00 -1.42] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3284300 352
[9.57 -4.85 -8.00 -1.42] 0.01 0.01 1 [1 0 -1 2] -167.279999995272 0.0051650 4
[9.57 -4.85 -8.00 -1.42] 0.001 0.01 1 [1 0 -1 2] -167.279999999925 0.0045340 3
[9.57 -4.85 -8.00 -1.42] 0.0001 0.01 1 [1 0 -1 2] -167.279999999971 0.0037930 2
[9.57 -4.85 -8.00 -1.42] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0045140 2
[9.57 -4.85 -8.00 -1.42] 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0072210 3
SR1 Method
x0 Starting Hessian Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] Identity 0.1 0.01 1 [1 0 -1 2] -167.279999128240 0.1688580 89
[0 0 0 0] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999575407 0.0692100 39
[0 0 0 0] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999853 0.0587820 26
[0 0 0 0] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.1066900 38
[0 0 0 0] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999973 0.0380570 10
[0 0 0 0] Identity 0.000001 0.01 1 [NaN NaN NaN NaN] NaN 0.0307350 9
[0 0 0 0] Program's Hessian 0.1 0.01 1 [1.42 0.12 -0.94 2.05] -167.278456189735 0.0620180 31
[0 0 0 0] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999919554 0.0949530 57
[0 0 0 0] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999987586 0.0400010 21
[0 0 0 0] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999990 0.0298450 13
[0 0 0 0] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.1016380 38
[0 0 0 0] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.1077790 37
[1 0 -1 2] Identity 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0001780 0
[1 0 -1 2] Identity 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0000800 0
[1 0 -1 2] Identity 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[1 0 -1 2] Identity 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0
[1 0 -1 2] Program's Hessian 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0000880 0
[1 0 -1 2] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0000760 0
[1 0 -1 2] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000750 0
[1 0 -1 2] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000710 0
[1 0 -1 2] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000710 0
[1 0 -1 2] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0
[0.81 0.91 0.13 0.91] Identity 0.1 0.01 1 [1.40 0.52 -0.52 2.43] -167.272369332640 0.1391750 94
[0.81 0.91 0.13 0.91] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999927619 0.0512890 30
[0.81 0.91 0.13 0.91] Identity 0.001 0.01 1 [NaN NaN NaN NaN] NaN 0.0337580 17
[0.81 0.91 0.13 0.91] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0577020 26
[0.81 0.91 0.13 0.91] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999955 0.0280640 10
[0.81 0.91 0.13 0.91] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0415910 13
[0.81 0.91 0.13 0.91] Program's Hessian 0.1 0.01 1 [1.03 .01 -0.99 2.00] -167.279983904904 0.2144600 156
[0.81 0.91 0.13 0.91] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999620384 0.0843110 46
[0.81 0.91 0.13 0.91] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999999547 0.0196120 10
[0.81 0.91 0.13 0.91] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999991 0.0290150 13
[0.81 0.91 0.13 0.91] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999997 0.0357070 13
[0.81 0.91 0.13 0.91] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0538590 17
[3.16 0.49 1.39 2.73] Identity 0.1 0.01 1 [1 0 -1 2] -167.279998954778 0.1396510 89
[3.16 0.49 1.39 2.73] Identity 0.01 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0152440 9
[3.16 0.49 1.39 2.73] Identity 0.001 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0126580 6
[3.16 0.49 1.39 2.73] Identity 0.0001 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0322300 13
[3.16 0.49 1.39 2.73] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999990 0.0244790 9
[3.16 0.49 1.39 2.73] Identity 0.000001 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0194650 7
[3.16 0.49 1.39 2.73] Program's Hessian 0.1 0.01 1 [1 0 -1 2] -167.279999971364 0.0673550 41
[3.16 0.49 1.39 2.73] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999999102 0.0518930 30
[3.16 0.49 1.39 2.73] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999999986 0.0547630 29
[3.16 0.49 1.39 2.73] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999997643 0.0263000 9
[3.16 0.49 1.39 2.73] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0633330 25
[3.16 0.49 1.39 2.73] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0312680 10
[9.57 -4.85 -8.00 -1.42] Identity 0.1 0.01 1 [0.35 -0.05 -0.92 2.12] -167.272798242689 0.0848760 72
[9.57 -4.85 -8.00 -1.42] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999999861 0.0446150 31
[9.57 -4.85 -8.00 -1.42] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999953795 0.0294250 17
[9.57 -4.85 -8.00 -1.42] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999996731 0.0385920 15
[9.57 -4.85 -8.00 -1.42] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999715 0.0541070 22
[9.57 -4.85 -8.00 -1.42] Identity 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0781950 31
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.1 0.01 1 [1.03 0.02 -0.99 2.00] -167.279959843584 0.0323530 23
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999991203 0.0652370 41
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999876384 0.0174150 9
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0651250 29
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999997 0.0410020 13
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0834600 25
Steepest Descent Method
x0 Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.8100010 377
[0 0 0 0] 0.01 0.01 1 [0.22 -0.41 -1.28 1.78] -167.276916941098 0.2438320 196
[0 0 0 0] 0.001 0.01 1 [0.89 -0.06 -1.05 1.96] -167.279933780104 8.3233250 6416
[0 0 0 0] 0.0001 0.01 1 [0.99 -0.01 -1 2] -167.279999267465 3.8602080 2417
[0 0 0 0] 0.00001 0.01 1 [1 0 -1 2] -167.279999997389 22.7155470 12213
[0 0 0 0] 0.000001 0.01 1 [1 0 -1 2] -167.279999999942 18.2888800 8432
[1 0 -1 2] 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0011080 0
[1 0 -1 2] 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001110 0
[1 0 -1 2] 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0
[1 0 -1 2] 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000660 0
[1 0 -1 2] 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000660 0
[1 0 -1 2] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000630 0
[0.81 0.91 0.13 0.91] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.3082390 377
[0.81 0.91 0.13 0.91] 0.01 0.01 1 [1.16 0.19 -0.83 2.15] -167.279100661797 1.0436020 984
[0.81 0.91 0.13 0.91] 0.001 0.01 1 [1.12 0.07 -0.95 2.04] -167.279928943921 1.9716170 1514
[0.81 0.91 0.13 0.91] 0.0001 0.01 1 [1 0 -1 2] -167.279999266127 2.4358950 1833
[0.81 0.91 0.13 0.91] 0.00001 0.01 1 [1 0 -1 2] -167.279999999681 1.9436270 999
[0.81 0.91 0.13 0.91] 0.000001 0.01 1 [1 0 -1 2] -167.279999999926 5.8643590 2553
[3.16 0.49 1.39 2.73] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.3068300 377
[3.16 0.49 1.39 2.73] 0.01 0.01 1 [2.02 0.58 -0.59 2.32] -167.274489679393 6.4725080 5980
[3.16 0.49 1.39 2.73] 0.001 0.01 1 [1.11 0.061 -0.96 2.03] -167.279938709635 17.6293290 15514
[3.16 0.49 1.39 2.73] 0.0001 0.01 1 [1.01 0.01 -1 2] -167.279999278511 8.2242300 5651
[3.16 0.49 1.39 2.73] 0.00001 0.01 1 [1 0 -1 2] -167.279999997974 1.9071060 1631
[3.16 0.49 1.39 2.73] 0.000001 0.01 1 [1 0 -1 2] -167.279999999979 4.4262350 1784
[9.57 -4.85 -8.00 -1.42] 0.1 0.01 1 [-Inf Inf NaN NaN] NaN 0.3026670 376
[9.57 -4.85 -8.00 -1.42] 0.01 0.01 1 [1.75 0.35 -0.77 2.16] -167.277168000179 3.8968330 3948
[9.57 -4.85 -8.00 -1.42] 0.001 0.01 1 [1.12 0.07 -0.95 2.04] -167.279928743320 5.0956410 3914
[9.57 -4.85 -8.00 -1.42] 0.0001 0.01 1 [1 0 -1 2] -167.279999264172 3.6328800 2661
[9.57 -4.85 -8.00 -1.42] 0.00001 0.01 1 [1 0 -1 2] -167.279999998846 25.4060450 16377
[9.57 -4.85 -8.00 -1.42] 0.000001 0.01 1 [1 0 -1 2] -167.279999999938 4.1823240 C
BFGS Method
x0 Starting Hessian Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] Identity 0.1 0.01 1 [0.22 -0.41 -1.28 1.79] -167.276910497404 0.0077950 5
[0 0 0 0] Identity 0.01 0.01 1 [0.61 -0.31 -1.26 1.78] -167.278342371136 0.0092760 5
[0 0 0 0] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999944 0.0167320 6
[0 0 0 0] Identity 0.0001 0.01 0.0001 [1 0 -1 2] -167.279999999619 0.0260180 7
[0 0 0 0] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999982 0.0232830 6
[0 0 0 0] Identity 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0269040 6
[0 0 0 0] Program's Hessian 0.1 0.01 1 [0.19 -0.41 -1.28 1.78] -167.276642869312 0.0094350 7
[0 0 0 0] Program's Hessian 0.01 0.01 1 [0.19 -0.42 -1.28 1.79] -167.276717196021 0.0057380 4
[0 0 0 0] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999992937 0.0238710 9
[0 0 0 0] Program's Hessian 0.0001 0.01 0.00000001 [NaN, NaN, NaN, NaN] NaN 0.0415680 13
[0 0 0 0] Program's Hessian 0.00001 0.01 0.1 [1 0 -1 2] -167.279999999938 0.0834650 22
[0 0 0 0] Program's Hessian 0.000001 0.01 0.00000001 [NaN NaN NaN NaN] NaN 1.4213690 2330
[1 0 -1 2] Identity 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0002520 0
[1 0 -1 2] Identity 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001150 0
[1 0 -1 2] Identity 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[1 0 -1 2] Identity 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000680 0
[1 0 -1 2] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000670 0
[1 0 -1 2] Program's Hessian 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0003180 0
[1 0 -1 2] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001160 0
[1 0 -1 2] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[1 0 -1 2] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000680 0
[1 0 -1 2] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000680 0
[0.81 0.91 0.13 0.91] Identity 0.1 0.01 1 [1.05 0.30 -0.69 2.29] -167.275083269658 0.0100910 4
[0.81 0.91 0.13 0.91] Identity 0.01 0.01 1 [1.24 0.14 -0.90 2.08] -167.279686990669 0.0090960 5
[0.81 0.91 0.13 0.91] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999376 0.0185580 7
[0.81 0.91 0.13 0.91] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999860 0.0183590 6
[0.81 0.91 0.13 0.91] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0260800 8
[0.81 0.91 0.13 0.91] Identity 0.000001 0.01 0.01 [1 0 -1 2] -167.279999999999 0.1026530 43
[0.81 0.91 0.13 0.91] Program's Hessian 0.1 0.01 1 [1.02 0.32 -0.66 2.33] -167.273452580260 0.0098400 7
[0.81 0.91 0.13 0.91] Program's Hessian 0.01 0.01 1 [1.28 0.16 -0.89 2.09] -167.279598023217 0.0122180 7
[0.81 0.91 0.13 0.91] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999992578 0.0266280 10
[0.81 0.91 0.13 0.91] Program's Hessian 0.0001 0.01 0.000001 [NaN NaN NaN NaN] NaN 1.0398700 1601
[0.81 0.91 0.13 0.91] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0368260 10
[0.81 0.91 0.13 0.91] Program's Hessian 0.000001 0.01 0.000000001 [NaN NaN NaN NaN] NaN 0.0878040 78
[3.16 0.49 1.39 2.73] Identity 0.1 0.01 1 [ 2.95 0.95 -0.37 2.47] -167.260921728458 0.0148570 7
[3.16 0.49 1.39 2.73] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999971584 0.0201850 8
[3.16 0.49 1.39 2.73] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999998218 0.0196480 7
[3.16 0.49 1.39 2.73] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999995 0.0273770 9
[3.16 0.49 1.39 2.73] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999995 0.0163080 5
[3.16 0.49 1.39 2.73] Identity 0.000001 0.01 0.0001 [1 0 -1 2] -167.279999999999 0.0242500 5
[3.16 0.49 1.39 2.73] Program's Hessian 0.1 0.01 1 [3 1.49 0.19 2.99] -167.244594503201 0.0097640 7
[3.16 0.49 1.39 2.73] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999963524 0.0209800 10
[3.16 0.49 1.39 2.73] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999915565 0.0278820 10
[3.16 0.49 1.39 2.73] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999946 0.0292690 9
[3.16 0.49 1.39 2.73] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0413620 13
[3.16 0.49 1.39 2.73] Program's Hessian 0.000001 0.01 1 [NaN NaN NaN NaN] NaN 32.2398260 11663
[9.57 -4.85 -8.00 -1.42] Identity 0.1 0.01 1 [ 2.00 0.58 -0.58 2.33] -167.274520285135 0.0099090 6
[9.57 -4.85 -8.00 -1.42] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999526260 0.0190540 8
[9.57 -4.85 -8.00 -1.42] Identity 0.001 0.01 0.1 [1 0 -1 2] -167.279999999709 0.0216050 7
[9.57 -4.85 -8.00 -1.42] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0256380 8
[9.57 -4.85 -8.00 -1.42] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0312400 9
[9.57 -4.85 -8.00 -1.42] Identity 0.000001 0.01 0.1 [1 0 -1 2] -167.279999999999 0.0293100 7
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.1 0.01 1 [2.16 0.66 -0.53 2.37] -167.272880184292 0.0146060 10
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.01 0.01 1 [2.16 0.66 -0.53 2.37] -167.272880718121 0.0105100 6
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999999362 0.0412020 13
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999987 0.0335710 11
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0322950 10
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 84.1332250 30847
Newton's Method for k = 10,000,000; all tolerances at 0.01; T set to 1 (adjusted as necessary to get convergence)
Starting Point xmin fmin Iterations Elapsed Time (s)
[0 0 0 0] [-0.024771532289335 0.310732926395592 -0.788760177923239 0.529801104907597] -133.560228943395 39 0.03467
[1 0 -1 2] [-0.024771837343072 0.310733028722944 -0.788760301760888 0.529800846242057] -133.560228943387 143 0.253257
[0.81 0.91 0.13 0.91] [-0.024771522265604 0.310732911352949 -0.788760170844310 0.529801124775638] -133.560228943395 173 0.183464
[3.16 0.49 1.39 2.73] [-0.024771527241936 0.310732906803061 -0.788760174684506 0.529801121482055] -133.560228943395 240 0.243349
[9.57 -4.85 -8.00 -1.42] [-0.024771536889997 0.310732912478593 -0.788760169890084 0.529801124846407] -133.560228943395 250 0.310806
BFGS ID
[0 0 0 0] [-0.024770871284853 0.310733636611044 -0.788758902260001 0.529802618520184] -133.560228943197 111 0.112444
[1 0 -1 2] Refused to return an answer
[0.81 0.91 0.13 0.91] Refused to return an answer
[3.16 0.49 1.39 2.73] Refused to return an answer
[9.57 -4.85 -8.00 -1.42] Refused to return an answer
BFGS Hess
[0 0 0 0] Refused to return an answer
[1 0 -1 2] Refused to return an answer
[0.81 0.91 0.13 0.91] Refused to return an answer
[3.16 0.49 1.39 2.73] Refused to return an answer
[9.57 -4.85 -8.00 -1.42] Refused to return an answer
SR1 ID
[0 0 0 0] [-0.024771567761620 0.310733441325913 -0.788761623923610 0.529802126005156] -133.560304375650 378 1.013921
[1 0 -1 2] [-0.024771544747263 0.310733182847815 -0.788761385804056 0.529802633470826] -133.560304375634 819 2.392423
[0.81 0.91 0.13 0.91] [-0.024777470366075 0.310725229879482 -0.788760807092803 0.529807865053035] -133.560304368405 619 1.81805
[3.16 0.49 1.39 2.73] Refused to return an answer
[9.57 -4.85 -8.00 -1.42] [-0.024770231258035 0.310730257079946 -0.788765912375838 0.529797671405605] -133.560304373593 821 1.490982
SR1 Hess
[0 0 0 0] [-0.024773909306389 0.310738355421821 -0.788757020809805 0.529805987358458] -133.560304372956 1976 1.343549
[1 0 -1 2] [-0.024775941428827 0.310732519348041 -0.788760367888054 0.529804332231798] -133.560304374565 1734 2.006957
[0.81 0.91 0.13 0.91] [-0.024769720656432 0.310730507404563 -0.788765053684288 0.529798826909580] -133.560304374229 2254 2.432939
[3.16 0.49 1.39 2.73] [-0.024767341442569 0.310733094291290 -0.788764366771071 0.529798443639664] -133.560304374047 2184 2.63281
[9.57 -4.85 -8.00 -1.42] Refused to return an answer
MATLAB
Objective Function File: f.m
function val = f(x)
% This m-file is the objective function for our unconstrained
% nonlinear program.
% Definitions.
c = [5.04; -59.4; 146.4; -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1); x(2); x(3); x(4)];
val = c' * xs + 0.5 * xs' * hessian * xs;
end
Gradient Function File: gradf.m
function grad = gradf(x)
% This is the gradient function of our objective function, f.m.
% Definitions.
c = [5.04 -59.4 146.4 -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1) x(2) x(3) x(4)];
grad = c + xs * hessian;
end
Hessian Function File: hessf.m
function hessian = hessf(x)
% The hessian of our objective function, f.m.
% Note that the hessian is independent of x.
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
end
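As a quick sanity check of the three files above, the analytical minimiser x* = -H^(-1)c derived in the introduction can be reproduced numerically. The following script is a minimal illustrative sketch (the script name is arbitrary and it is not part of the original submission); it assumes f.m, gradf.m and hessf.m are on the MATLAB path.
% checkUnconstrained.m (illustrative sketch only)
% Reproduce the analytical minimiser x* = -inv(H)*c and check it against
% the objective and gradient files above.
c = [5.04; -59.4; 146.4; -96.6];
H = hessf([0 0 0 0]);                      % the hessian is constant, so any x will do
xstar = -(H \ c);                          % solve H*x = -c without forming inv(H)
fprintf('x* = [%g %g %g %g]\n', xstar);                 % expected: [1 0 -1 2]
fprintf('f(x*) = %.4f\n', f(xstar));                    % expected: -167.28
fprintf('||gradf(x*)|| = %.2e\n', norm(gradf(xstar)));  % expected: close to 0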
Objective Function File, Penalty Method: fpen.m
function val = fpen(x)
% The objective function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is set to the same value in fpen, gradfpen and hessfpen.
% Definitions
k = 10000000;     % penalty parameter
c = [5.04 -59.4 146.4 -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1); x(2); x(3); x(4)];
constraint = 1-(x(1))^2-(x(2))^2-(x(3))^2-(x(4))^2;
% Function
val = c * xs + 0.5 * xs' * hessian * xs + (k/2)*(constraint)^2;
end
Gradient Function File, Penalty Method: gradfpen.m
function grad = gradfpen(x)
% The gradient function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is set to the same value in fpen, gradfpen and hessfpen.
% Definitions
k = 10000000;     % penalty parameter
g = ((x(1))^2+(x(2))^2+(x(3))^2+(x(4))^2-1);
c = [5.04 -59.4 146.4 -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1) x(2) x(3) x(4)];
% Function
grad = c + xs * hessian + [2*k*x(1)*g; 2*k*x(2)*g; 2*k*x(3)*g; 2*k*x(4)*g]';
end
Hessian Function File, Penalty Method: hessfpen.m
function hessian = hessfpen(x)
% This is the hessian of the L2 penalty method for our program.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is set to the same value in fpen, gradfpen and hessfpen.
% Definitions
k = 10000000;     % penalty parameter
h11 = 2*k*(3*(x(1))^2 + (x(2))^2 + (x(3))^2 + (x(4))^2-1);
h22 = 2*k*((x(1))^2 + 3*(x(2))^2 + (x(3))^2 + (x(4))^2-1);
h33 = 2*k*((x(1))^2 + (x(2))^2 + 3*(x(3))^2 + (x(4))^2-1);
h44 = 2*k*((x(1))^2 + (x(2))^2 + (x(3))^2 + 3*(x(4))^2-1);
unchessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
% Function
hessian = unchessian + [h11 4*k*x(1)*x(2) 4*k*x(1)*x(3) 4*k*x(1)*x(4);...
4*k*x(1)*x(2) h22 4*k*x(2)*x(3) 4*k*x(4)*x(2);...
4*k*x(1)*x(3) 4*k*x(3)*x(2) h33 4*k*x(3)*x(4);...
4*k*x(1)*x(4) 4*k*x(4)*x(2) 4*k*x(3)*x(4) h44];
end
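Because the penalty gradient is easy to mistype, a central-difference comparison against fpen provides a cheap correctness check. The snippet below is an illustrative sketch only (the script name and the test point are arbitrary and not part of the original submission); it assumes the three files above are saved as fpen.m, gradfpen.m and hessfpen.m.
% checkPenaltyGrad.m (illustrative sketch only)
% Compare gradfpen against a central-difference approximation of fpen.
x0 = [0.5 -0.2 0.3 0.1];                  % arbitrary test point
h = 1e-6;                                 % finite-difference step
numgrad = zeros(1,4);
for i = 1:4
    e = zeros(1,4); e(i) = h;
    numgrad(i) = (fpen(x0 + e) - fpen(x0 - e)) / (2*h);
end
ag = gradfpen(x0);
fprintf('max relative gradient error = %.3e\n', max(abs(ag - numgrad)) / max(abs(ag)));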
Constraint File: MATLAB Optimisation Tool
function [c, ceq] = xtx(x)
% The constraint as required for the MATLAB Optimization Tool.
% c is the set of nonlinear inequality constraints. Empty in our case.
c = [];
% ceq is the set of nonlinear equality constraints.
ceq = x(1)^2 + x(2)^2 + x(3)^2 + x(4)^2 - 1;
end
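For reference, the file above is in the form fmincon expects for its nonlcon argument; the MATLAB Optimization Tool drives fmincon internally. A minimal command-line equivalent of the interior point run reported in the results section might look like the sketch below (optimoptions requires a reasonably recent MATLAB release, and the exact tolerances set through the GUI are not reproduced here).
% Illustrative command-line equivalent of the Optimization Tool set-up.
x0 = [0 0 0 0];
opts = optimoptions('fmincon', 'Algorithm', 'interior-point', 'Display', 'iter');
[xmin, fmin, exitflag, output] = fmincon(@f, x0, [], [], [], [], [], [], @xtx, opts);
fprintf('fmin = %.8f after %d iterations\n', fmin, output.iterations);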
33
SR1 Quasi-Newton Method: NewtonMethod_SR1.m
% INPUT:
%
% f - the multivariable function to minimise (a separate
% user-defined MATLAB function m-file)
%
% gradf - function which returns the gradient vector of f evaluated
% at x (also a separate user-defined MATLAB function
% m-file)
%
% x0 - the starting iterate
%
% tolerance1 - tolerance for stopping criterion of algorithm
%
% tolerance2 - tolerance for stopping criterion of line minimisation (eg:
% in golden section search)
%
% H0 - a matrix used as the first approximation to the hessian.
% Updated as the algorithm progresses
%
% T - parameter used by the "improved algorithm for
% finding an upper bound for the minimum" along
% each given descent direction
%
% OUTPUT:
%
% xminEstimate - estimate of the minimum
%
% fminEstimate - the value of f at xminEstimate
function [xminEstimate, fminEstimate, iteration] = NewtonMethod_SR1(f,...
gradf, x0,H0,tolerance1, tolerance2, T)
tic %starts timer
k = 0; % initialize iteration counter
iteration_number=0; %initialise count
xk = x0; % row vector
xk_old=x0; % row vector
H_old=H0; % square matrix
while ( norm(feval(gradf, xk)) >= tolerance1 )
iteration_number = iteration_number + 1;
H_old = H_old / max(max(H_old)); % Rescale the approximation so that its
% entries do not become too large or small
dk = transpose(-H_old*transpose(feval(gradf, xk))); % gives dk as
% a row vector
% minimise f with respect to t in the direction dk, which involves
% two steps:
% (1) find upper and lower bound, [a,b], for the stepsize t using
% the "improved procedure" presented in the lecture notes
[a, b] = multiVariableHalfOpen(f, xk, dk, T);
% (2) use golden section algorithm (suitably modified for
% functions of more than one variable) to estimate the
% stepsize t in [a,b] which minimises f in the direction dk
% starting at xk
[tmin, fmin] = multiVariableGoldenSectionSearch(f, a, b, tolerance2,...
xk, dk);
% note: we do not actually need fmin, but we do need tmin
% update the iteration counter and the current iterate
k = k + 1;
xk = xk + tmin*dk;
xk_new = xk_old + tmin*dk; % same point as xk; kept separate for the update below
% update the inverse hessian approximation via the SR1 formula
gk = (feval(gradf, xk_new) - feval(gradf, xk_old))'; % column vector (y_k)
s = (xk_new - xk_old)' - (H_old * gk); % (delta x - H*y) as a column vector
st = s';
H_new = H_old + (s * st) / (st * gk);
% keep track of the old values
xk_old=xk_new;
H_old=H_new;
end
% assign output values
toc
xminEstimate = xk;
fminEstimate = feval(f,xminEstimate)
iteration = iteration_number
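A typical driver for the routine above, applied to the L2-penalised problem with the parameter values used for the constrained results, is sketched below. It is illustrative only and assumes the helper routines multiVariableHalfOpen.m and multiVariableGoldenSectionSearch.m referenced in the listing are on the MATLAB path.
% Illustrative driver for the SR1 routine on the penalised (constrained) problem.
x0   = [0 0 0 0];        % starting iterate (row vector)
H0   = eye(4);           % identity as the initial inverse hessian approximation
tol1 = 0.01;             % stopping tolerance on the gradient norm
tol2 = 0.01;             % golden section search tolerance
T    = 1;                % half-open interval search parameter
[xmin, fmin, iters] = NewtonMethod_SR1(@fpen, @gradfpen, x0, H0, tol1, tol2, T);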
More Related Content

What's hot

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Modeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element MethodModeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element Methodjodoua
 
Dimensional analysis
Dimensional analysisDimensional analysis
Dimensional analysisRonak Parmar
 
Fundamentals of Finite Difference Methods
Fundamentals of Finite Difference MethodsFundamentals of Finite Difference Methods
Fundamentals of Finite Difference Methods1rj
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...IJDKP
 
Bayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distributionBayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distributionPremier Publishers
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...Adam Fausett
 
Dimensional analysis - Part 1
Dimensional analysis - Part 1 Dimensional analysis - Part 1
Dimensional analysis - Part 1 Ramesh B R
 
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov ProcessesEstimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov ProcessesMudassir Javed
 

What's hot (20)

Dimensional analysis
Dimensional analysisDimensional analysis
Dimensional analysis
 
Fem 1
Fem 1Fem 1
Fem 1
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Modeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element MethodModeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element Method
 
Dimensional analysis
Dimensional analysisDimensional analysis
Dimensional analysis
 
Fundamentals of Finite Difference Methods
Fundamentals of Finite Difference MethodsFundamentals of Finite Difference Methods
Fundamentals of Finite Difference Methods
 
DIMENSIONAL ANALYSIS (Lecture notes 08)
DIMENSIONAL ANALYSIS (Lecture notes 08)DIMENSIONAL ANALYSIS (Lecture notes 08)
DIMENSIONAL ANALYSIS (Lecture notes 08)
 
Q130402109113
Q130402109113Q130402109113
Q130402109113
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
 
Fem lecture
Fem lectureFem lecture
Fem lecture
 
Dimesional Analysis
Dimesional Analysis Dimesional Analysis
Dimesional Analysis
 
Bayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distributionBayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distribution
 
Lecture notes 01
Lecture notes 01Lecture notes 01
Lecture notes 01
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
 
Equilibrium method
Equilibrium methodEquilibrium method
Equilibrium method
 
Dimensional analysis - Part 1
Dimensional analysis - Part 1 Dimensional analysis - Part 1
Dimensional analysis - Part 1
 
Using Partitioned Design Matrices in Analyzing Nested-Factorial Experiments
Using Partitioned Design Matrices in Analyzing Nested-Factorial ExperimentsUsing Partitioned Design Matrices in Analyzing Nested-Factorial Experiments
Using Partitioned Design Matrices in Analyzing Nested-Factorial Experiments
 
Cu24631635
Cu24631635Cu24631635
Cu24631635
 
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov ProcessesEstimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes
 
N41049093
N41049093N41049093
N41049093
 

Viewers also liked

From gradient decent to CDN
From gradient decent to CDNFrom gradient decent to CDN
From gradient decent to CDNxiaoerxiaoer
 
Timelines at scale
Timelines at scaleTimelines at scale
Timelines at scaleViet Nt
 
Cxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interfaceCxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interfaceRyuichi YAMAMOTO
 
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015Susanna-Assunta Sansone
 
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes Ogilvy
 
solar power kits UNIVPO
solar power kits UNIVPOsolar power kits UNIVPO
solar power kits UNIVPOMark Robinson
 
Using ontologies to do integrative systems biology
Using ontologies to do integrative systems biologyUsing ontologies to do integrative systems biology
Using ontologies to do integrative systems biologyChris Evelo
 
Nuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuerteNuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuertecdcomputadora12
 
Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011Fullerton Securities
 
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINESCIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINESThierry Debels
 
Curriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas EnglishCurriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas Englishariel salinas
 
Rare Phenomena On Earth
Rare Phenomena On EarthRare Phenomena On Earth
Rare Phenomena On Earthixigo.com
 
نصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدسنصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدسabuthamer
 
أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!hamzakzook
 

Viewers also liked (16)

From gradient decent to CDN
From gradient decent to CDNFrom gradient decent to CDN
From gradient decent to CDN
 
Timelines at scale
Timelines at scaleTimelines at scale
Timelines at scale
 
Cxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interfaceCxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interface
 
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
 
Agile Financial Times Apr09
Agile Financial Times Apr09Agile Financial Times Apr09
Agile Financial Times Apr09
 
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
 
solar power kits UNIVPO
solar power kits UNIVPOsolar power kits UNIVPO
solar power kits UNIVPO
 
Using ontologies to do integrative systems biology
Using ontologies to do integrative systems biologyUsing ontologies to do integrative systems biology
Using ontologies to do integrative systems biology
 
Nuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuerteNuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuerte
 
Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011
 
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINESCIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
 
Curriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas EnglishCurriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas English
 
Rare Phenomena On Earth
Rare Phenomena On EarthRare Phenomena On Earth
Rare Phenomena On Earth
 
نصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدسنصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدس
 
أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!
 
Selecting & Installing WordPress Themes
Selecting & Installing WordPress ThemesSelecting & Installing WordPress Themes
Selecting & Installing WordPress Themes
 

Similar to MAST30013 Techniques in Operations Research

A New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptxA New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptxMasoudIbrahim3
 
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...Stephen Faucher
 
83662164 case-study-1
83662164 case-study-183662164 case-study-1
83662164 case-study-1homeworkping3
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...SSA KPI
 
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear EquationsNumerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear Equationsinventionjournals
 
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...Xin-She Yang
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...IJITCA Journal
 
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-SpacesA Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-SpacesZubin Bhuyan
 
A Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment ProblemA Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment ProblemSheila Sinclair
 
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMSA BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMSShannon Green
 
Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Willy Marroquin (WillyDevNET)
 
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...Project KRIT
 
Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...IJECEIAES
 

Similar to MAST30013 Techniques in Operations Research (20)

A New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptxA New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptx
 
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
 
83662164 case-study-1
83662164 case-study-183662164 case-study-1
83662164 case-study-1
 
S4101116121
S4101116121S4101116121
S4101116121
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...
 
10.1.1.34.7361
10.1.1.34.736110.1.1.34.7361
10.1.1.34.7361
 
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear EquationsNumerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
 
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
 
panel regression.pptx
panel regression.pptxpanel regression.pptx
panel regression.pptx
 
Ds33717725
Ds33717725Ds33717725
Ds33717725
 
Ds33717725
Ds33717725Ds33717725
Ds33717725
 
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-SpacesA Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
 
A Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment ProblemA Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment Problem
 
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMSA BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
 
Finite Element Methods
Finite Element  MethodsFinite Element  Methods
Finite Element Methods
 
Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...
 
App8
App8App8
App8
 
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
 
Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...
 

MAST30013 Techniques in Operations Research

  • 1. 1 MAST30013 Techniques in Operations Research Newton's Method: A Comparative Analysis of Algorithmic Convergence Efficiency T.Lee thlee@student.unimelb.edu.au J.Rigby j.rigby@student.unimelb.edu.au L.Russell lrussell@student.unimelb.edu.au Department of Mathematics and Statistics University of Melbourne
  • 2. 2 Summary: Objective By considering a specific problem the aim of this project is to provide an example of the implementation of traditional Newtonian Methods in multivariate minimization applications. The effectiveness of three methods: Newton’s, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Symmetric Rank 1 (SR1) are to be examined. By analysing the performance of these algorithms in minimising a quadratic and convex function, a recommendation will be given as to the best method to apply to such a case. These methods cannot be applied to all multivariate functions and for a given problem a technique may 'succeed' where others 'fail'. This success may manifest as convergence at a faster rate or in the form of algorithmic robustness. Investigation into which conditions facilitate the successful and efficient application of the chosen algorithms will be undertaken. Minimization algorithms require varying degrees of further information in addition to the objective function. Newtonian methods, in a relative sense, require more information than other common methods like the Steepest Descent Method (SDM). The advantages and disadvantages of this further requisite information will be examined. Findings and Conclusions For the specific quadratic and convex non-linear program outlined subsequently in the introduction, it was found that Newton’s Method performed best for both the constrained and unconstrained problem. It was computationally quicker, usually requiring less iterations to solve the problems and achieved greater convergence accuracy than the BFGS, SR1 and SDM methods, consistent with the outlined theory. The ‘ideal’ nature of the specific case required further evaluation of the other methods without consideration given to Newton’s method. For a constrained problem the SR1 method achieved non- trivial convergence success partly attributed to the algorithms flexibility in not always choosing a descent direction (that is, positive definiteness of the approximated hessian is not imposed). Recommendations Whilst theoretical convergence rates are greater for Newton’s method than for Quasi-Newton methods and the SDM in turn, the applicability of certain algorithms is highly dependent on the specifics of a problem. For more complex general cases the advantages of using quasi-Newtonian methods become apparent as hessian inversion becomes more computationally taxing. Analysis suggests that when seeking to minimize quadratic and convex nonlinear programs, Newton's Method appears to perform better than any of the other tested methods.
  • 3. 3 Introduction: The Objective Function This project investigates the advantages and disadvantages of using three variations of Newton's Method and contrasts their convergence efficiencies against one another as well as the Steepest Descent Method for the general case and the specified problem: min 𝑓(𝒙) = 𝒄 𝑇 𝒙 + 1 2 𝒙 𝑇 𝐻𝒙 where 𝒄 = [5.04, −59.4, 146.4, 96.6] 𝑇 , 𝐻 = [ 0.16 −1.2 2.4 −1.4 −1.2 12.0 −27.0 16.8 2.4 −27.0 64.8 −42.0 −1.4 16.8 −42.0 28.0 ] A constrained case, where the objective function is subject to the constraint 𝒙 𝑇 𝒙 = 1, is considered and analysed with Newtonian methods implementing an L2 penalty program. The results are contrasted against output from the MATLAB Optimisation Tool. In the following sections this report will detail a method of analysing the constrained and unconstrained objective function and compare algorithmic output to analytical solutions. Results will be contrasted with theory and meaningful conclusions made where grounded by empirical evidence. Any results requiring further substantiation will be discussed subsequently. Newton's Method Newton's Method (also known as the Newton-Raphson Method) is a method for minimising an unconstrained multivariate function. Given a starting point, this method approximates the objective function with a second order Taylor polynomial and proceeds to minimise this approximation by moving in the 'Newton direction'. The output is subsequently used as the new starting point and the process is iteratively repeated (Chong and Zak, 2008). Newton's Method seeks to increase the rate of convergence by using second order information of the function it is operating on. This second order information is the hessian function, denoted 𝛻2 𝑓. The 'Newton direction' mentioned above is defined to be: 𝒅 𝑘 ≔ −𝛻2 𝑓(𝒙 𝑘 ) −1 𝛻𝑓(𝒙 𝑘 ) where 𝒙 𝑘 denotes a particular iterate point of the algorithm. It is defined this way such that if the second order approximation to a given function held exactly, then it would be minimised in one step with a step size of 1. BFGS and SR1 Minimisation via Newton's Method requires the calculation of the gradient and hessian matrices. In addition, the hessian must also be inverted. This inversion can be quite computationally expensive (standard techniques are known to be 𝑂(𝑛3 ) )(Hauser, 2012) and may not be defined. Additionally,
  • 4. 4 the Newton direction is not necessarily guaranteed to be a descent direction if the hessian is not positive definite. These two potential problems may affect the quality of any implementation of Newton’s method and give rise to the need for quasi-Newton methods. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Symmetric Rank 1 (SR1) Quasi-Newton methods have been formulated specifically in order to bypass these concerns by approximating the hessian from successively calculated gradient vectors (Farzin and Wah, 2012). Both methods attempt to satisfy the secant equation at each iteration: 𝒙 𝑘+1 − 𝒙 𝑘 = 𝐻 𝑘+1(∇𝑓(𝒙 𝑘+1 ) − ∇𝑓(𝒙 𝑘 )) During each iteration the approximated hessian is said to ‘updated’ in a manner dependent on the method: BFGS Update 𝐻 𝑘+1 = 𝐻 𝑘 + 1 + 〈𝒓 𝑘 , 𝒈 𝑘〉 〈𝒔 𝑘, 𝒈 𝑘〉 𝒔 𝑘 (𝒔 𝑘 ) 𝑇 − [𝒔 𝑘 (𝒓 𝑘 ) 𝑇 + 𝒓 𝑘 (𝒔 𝑘 ) 𝑇 ] where 𝒔 𝑘 = 𝒙 𝑘+1 − 𝒙 𝑘 , 𝒈 𝑘 = 𝛁f(𝒙k+1 ) − 𝛁f(𝒙 𝑘 ), 𝒓 𝑘 = 𝐻 𝑘 𝒈 𝑘 〈𝒔 𝑘, 𝒈 𝑘〉 𝐻 𝑘+1 is an approximation to the inverse of the hessian. The BFGS update always satisfies the secant equation and maintains positive definiteness of the hessian approximation (if initialized as such). Additionally the BFGS update satisfies the useful symmetry property 𝐻 𝑘+1 = 𝐻 𝑘+1 𝑇 . It can also be shown that 𝐻 𝑘+1 differs from its predecessor by a rank-2 matrix (Nocedal and Wright, 1999). SR1 Update The SR1 update is a simpler rank-1 update that also maintains the symmetry of the hessian and seeks to (but does not always) satisfy the secant equation. (Nocedal and Wright, 1999) 𝐻 𝑘+1 = 𝐻 𝑘 + (∆ 𝒙 𝑘 − 𝐻 𝑘 𝒚 𝑘)(∆ 𝒙 𝑘 − 𝐻 𝑘 𝒚 𝑘) 𝑇 (∆ 𝒙 𝑘 − 𝐻 𝑘 𝒚 𝑘) 𝑇 𝒚 𝑘 where 𝒚 𝑘 = ∇𝑓( 𝒙 𝑘 + ∆ 𝒙 𝑘) − ∇𝑓( 𝒙 𝑘). As with the BFGS method, 𝐻 𝑘+1 is an approximation to the inverse of the hessian. This update does not guarantee that the update be positive definite and subsequently does not ensure that following iterations always move in descent directions. In practice, the approximated hessians generated by the SR1 method exhibit faster converge towards the true hessian inverse than the BFGS method (Conn, Gould and Toint, 1991). A known drawback affecting the robustness of the SR1 method is that the denominator can vanish (Nocedal & Wright 1999). Where this is the case algorithmic robustness can be increased simply by skipping the updating process of troublesome iteration.
  • 5. 5 Method: Analysis Method First a theoretical analysis of the function was completed. Then each of the methods of minimisation was applied to the outlined objective function in both its constrained and unconstrained form using MATLAB algorithms (see appendices: MATLAB). The unconstrained results from these three methods have been compared to each other and to the Steepest Descent Method in order to draw conclusions about the best method to apply. The constrained program results were compared to results calculated from the MATLAB Optimisation Tool. The criteria analysed were, time taken and number of iterations taken to converge on global minimum (within a tolerance value), accuracy of 𝒙∗ and algorithmic robustness. Analysis of Unconstrained Objective Function The properties of the considered problem need to be determined to draw meaningful conclusions from output results. Knowledge about the behaviour of this function, specifically the type of function, location and number of any stationary points, the class of stationary points and whether they are local or global can be determined for this case. Functions of the following form are defined as quadratic: 𝑓(𝑥) = 𝛼 + 〈𝒄, 𝒙〉 + 1 2 〈𝒙, 𝐵𝒙〉 = 𝛼 + 𝒄 𝑇 𝒙 + 1 2 𝒙 𝑇 𝐵𝒙 The specified function is of this form with 𝛼 = 0, 𝐵 = 𝐻 and 𝒄 = 𝑐 as detailed above. This identification means that: 𝛻𝑓(𝒙) = 𝒄 + 𝐻𝒙 𝛻2 𝑓(𝒙) = 𝐻 A particular feature of quadratic functions is that they are convex if and only if their hessian is positive semi-definite. Calculating the eigenvalues of H (via MATLAB) reveals: 𝜆1 = 0.0066657144469 𝜆2 = 0.0591221937911 𝜆3 = 1.4840596546051 𝜆4 = 103.4101524371569 Since each eigenvalue is positive, this tells us that H is positive definite (see appendices: Proofs). Positive definite matrices are also positive semi-definite, and so the quadratic function is also convex. The function under investigation is quadratic and convex, hence solvable via matrix algebra. For a convex and at least 𝐶1 function, 𝛻𝑓(𝒙∗) = 0 if and only if 𝒙∗ is a global minimum of f. Note that by the fact that the function is quadratic, it must be at least 𝐶1 . Therefore: 𝛻𝑓(𝒙) = 0 ⟺ 𝐻𝒙∗ + 𝒄 = 0
  • 6. 6 ⟺ 𝐻𝒙∗ = −𝒄 ⟺ 𝒙∗ = −𝐻−1 𝒄 For the specified function: 𝐻−1 = [ 100 50 33.333 25 50 33.333 25 20 33.333 25 20 16.667 25 20 16.667 14.286 ] And recall: 𝒄 = [ 5.04 −59.4 146.4 −96.6 ] This implies that: 𝒙∗ = − [ 100 50 33.333 25 50 33.333 25 20 33.333 25 20 16.667 25 20 16.667 14.286 ] ∗ [ 5.04 −59.4 146.4 −96.6 ] = [ 1 0 −1 2 ] So 𝒙∗ = [1 0 −1 2] 𝑇 is the global minimum to the nonlinear function. There will be no other minimums of this function. It is expected that the algorithms converge to this point. Thus, the result in calculating f(x*) = -167.28. Note that the inverse of the hessian matrix for the program is a constant matrix, that is to say that it does not change for all elements of ℝ4 . Recall that one of the potential problems with Newton's method was that the hessian may not be invertible or positive definite. Both of these cases are not true for this specified problem. Implementation of Algorithms Each algorithm under investigation has been written into MATLAB code and included in the appendices. They each call on a common univariate line-search algorithm, the Golden Section Search, one of the better methods in the class of robust interval reducing methods.(Arora, 2011) Similarly, they implement an algorithm for finding an upper bound on the location of the minimum on a half-open interval which doubles the incremented step size with each iteration. Each algorithm has the following parameters that can be altered:  𝑥0  The starting point for the algorithm.  Tolerance 1  The stopping criteria for the particular algorithm. In all cases, this is a check of the magnitude of the gradient vector at a particular iteration point 𝒙 𝑘 against 0. If it is ‘close enough’ to zero, the algorithm will end. ‘Close enough’ is defined as the value set for this tolerance.
  • 7. 7  Tolerance 2  The stopping criteria for the Golden Section Search as detailed above. This value sets how large the interval estimate will be when the line search is complete.  T  The parameter used the Multi Variable Half Open Interval Search nested in each of the algorithms. 2(𝑘−1) 𝑇 is the increase to the upper bound during each iteration when trying to find an interval on which the minimum of the approximation must exist.  𝐻0  This is the ‘starting hessian’, thought of as an approximation to the inverse hessian of the program. It is only present in the BFGS and SR1 methods. In this paper, 𝒙0, tolerance 1 and 𝐻0 values (where appropriate) have been varied and the effects analysed. The effect of changing tolerance 2 and the value of T have not been analysed because they do not relate directly (see above) to the methods under investigation. It is expected that the result of altering 𝒙0 will depend on the distance of 𝒙0 from the global minimum. The expectation is that the closer 𝒙0 is to the minimum the less time and number of iterations the algorithm will be expected to take to converge. Having a more strict tolerance (that is, bringing tolerance 1 closer to 0) should result in an increased number of iterations and time taken. The algorithm in question must get closer to the true global minimum in order to comply with the more strict tolerance, and hence is expected to require more computational time. Additionally, the effect of changing 𝐻0 will be expected to depend on how well 𝐻0 approximates the true hessian inverse of the function. A better approximation (such as giving the algorithm the function’s true hessian inverse to begin with) would be expected to reduce the amount of computation time required for convergence. However, the idea behind the BFGS and SR1 methods is to avoid calculating the inverse of the hessian directly. As such, the hessian inverse will not be used as a 𝐻0 in order to simulate more realistic conditions under which these two methods might be implemented. Constrained Case In solving the constrained case, two approaches were taken. The first was to solve the nonlinear program using the MATLAB Optimization Tool, specifically via the interior point and active set algorithms. This gave the solution point as well as some data and intuition in regards to the time taken and iterations required to solve such a constrained problem (alternate analytical approaches could have been used to provide this reference value, see discussion). Since the algorithms implemented by the MATLAB Optimization Tool are specifically designed to solve constrained nonlinear programs, the expectation was that they would outperform the Newtonian algorithms under investigation. The second method for solving the constrained case was via the L2 penalty method. Converting the constrained nonlinear program into an unconstrained nonlinear program allowed for the Newtonian algorithms to be implemented. The specified constraint is: 𝒙 𝑇 𝒙 = 1 ⟺ 𝑥1 2 + 𝑥2 2 + 𝑥3 2 + 𝑥4 2 − 1 = 0
  • 8. 8 The L2 penalty method requires that this constraint be converted into a penalty term and added to the objective function. So, rather than minimizing the original objective function with the above constraint, the function to be minimised was instead: 𝑃𝑘(𝑥) = 𝒄 𝑇 𝒙 + 1 2 𝒙 𝑇 𝐻𝒙 + 𝑘 2 (𝑥1 2 + 𝑥2 2 + 𝑥3 2 + 𝑥4 2 − 1)2 where 𝒄 and 𝐻 are defined as above, and 𝑘 being the parameter of the penalty term. The algorithms under investigation require the calculation of both the gradient function and the hessian matrix. As shown above, in the unconstrained case, the hessian is a symmetric, positive definite matrix, perfect for implementation of Newtonian methods. When the hessian matrix is calculated in order to implement these algorithms for solving the constrained case, the positive definiteness of the matrix is potentially lost. This is a possible cause for any non-convergence issues arising from implementation of the Newtonian methods. For the equations for the gradient function and hessian matrix see appendices: MATLAB. The L2 penalty method analytically finds the minimum point by evaluating 𝒙∗ = lim 𝑘→∞ 𝒙 𝑘 . When solving for the minimum point numerically using the Newtonian algorithms, a small value of 𝑘 was chosen and increased in order to simulate this limiting process. It was expected that as the value of 𝑘 increased the minimum point the algorithms found converged on the value of 𝑥∗ found by the MATLAB Optimization tool. Theoretical Convergence Under specific circumstances, the various methods exhibit differing rates of convergence.  For an initial point sufficiently close to the minimum, if the hessian is positive definite, a local/global minimum actually exists, the step size at each iteration satisfies the Armijo- Goldstein and Wolfe conditions and f is C3 , then the rate of convergence for Newton's Method is quadratic. (see appendices for univariate proof).  Quasi-Newton methods are known to exhibit superlinear convergence under certain circumstances:  The BFGS Method can be shown to converge to the global minimum at a superlinear rate if the starting hessian is positive definite and the objective function is twice differentiable and convex. (Powell, 1976)  Likewise, the SR1 method, exhibits superlinear convergence under the same conditions. (Nocedal and Wright, 1999).  As an aside, the Steepest Descent Method is known to converge at a linear rate. In addition, it is not adversely affected, as the Newtonian methods are known to be, by horizontal asymptotes where divergence is sometimes observed. The correlation of results to these theoretical rates was examined. It is expected that Newton's Method will converge the quickest. The quasi Newtonian methods are expected to be the next fastest methods to converge followed lastly by the Steepest Descent Method.
  • 9. 9 Results, Conclusions and Recommendations: Results for Unconstrained Case The performance of the chosen methods was analysed for the specified objective function by choosing 10 starting points, 𝒙0 (8 randomly generated, [0 0 0 0] and the known minimum at [1 0 -1 2]). Point 𝑥1 𝑥2 𝑥3 𝑥4 1 0 0 0 0 2 1 0 -1 2 3 0.81 0.91 0.13 0.91 4 3.16 0.49 1.39 2.73 5 9.57 -4.85 8.00 -1.42 6 14.66 -4.04 12.09 3.42 7 1.60 3.35 -16.46 18.81 8 -0.30 -7.47 -17.90 1.86 9 2.23 11.37 15.51 17.57 10 7.52 -10.79 -10.14 17.14 Table 1: List of starting points *Starting points truncated to two decimal places. Results shown here were generated with tolerance values of 0.01 for T and tolerances 1,2 (see appendices for full list of all results). Disregarding data sets that failed to return a value for x* and sets where the number of iterations exceeded the average by >500% the following table of average values was generated: Newton's Method Steepest Descent BFGS (hessian) BFGS (identity) SR1 (hessian) SR1 (identity) x(1) 0.99998 1.01376 1.06218 0.94505 1.00033 0.996722222 x(2) -0.00007 0.01406 0.03947 -0.02716 0.00152 0.002833333 x(3) -1.00019 -0.9882 -0.97035 -1.01809 -0.99881 -0.996133333 x(4) 1.99986 2.00992 2.02388 1.98639 2.00088 2.003922222 f(x*) -167.28 -167.2749 -167.27892 -167.27965 -167.28 -167.2799889 Elapsed Time (s) 0.0040865 0.1965647 0.0100982 0.014235375 0.0119615 0.0125613 Iterations 9 1010 9.2 5.75 24.6 24.1 Elapsed Time per Iteration (s) 0.000454056 0.000194619 0.00109763 0.002475717 0.00048624 0.000521216 Table 2: Summary of algorithms performances (averaged values given robust algorithm implementation) The BFGS and SR1 methods were both ran using 𝐻 (as defined in the introduction) and the 4x4 identity as 𝐻0 inputs. All algorithms converged on the global minimum at 𝒙∗ = [1 0 − 1 2]; 𝑓(𝒙∗) = −167.28 for all staring point except in one instance of the SR1 method initialsed with the identity matrix from point 7. However given the ‘shallowness’ of the function and the tolerances, outlying 𝒙∗ values were occasionally generated. This ‘shallowness’ refers to, based on the results in the appendix, the relatively small magnitude of the gradient vector at many points of the objective function close to
  • 10. 10 the minimum. To this end, the BFGS Method and Steepest Descent Method often stopped only part way to the global minimum. For a given set of parameters it was often the case that the BFGS method regularly converged faster than Newton’s method albeit with less precision. However, for this ideal quadratic objective function Newton’s Method was less prone to ‘getting stuck’ (see appendices data: BFGS (identity) had runs where iteration values were 28905 and 126675) and always took the least time to converge. The accuracy of calculated 𝒙∗ was greatest for Newton’s Method followed by SR1, BFGS and Steepest Descent in decreasing order of accuracy. As expected the quasi-Newtonian methods each converged faster (from these points, by an order of magnitude) than the Steepest Descent method. The average iteration of the Steepest Descent method, however, took less time to compute than the other methods. A cause of this could be that all other methods require operations on a 4x4 matrix such as an inversion of a hessian or an update of the inverse hessian approximation at each iteration, whereas the Steepest Descent Method requires only calculation of a gradient vector. Tolerance Variation (see appendices for tables of results) With tolerance 2 held constant at 0.01 and T held constant at 1: For Newton's Method, the algorithm converged quickest with tolerance 1 values set at 0.0001. When given the identity matrix for the starting iteration the efficiency of the SR1 method appeared to generally increase as the tolerance 1 became stricter. When given the hessian of the program to start with, there was no real discernible pattern as to what effect varying the tolerance 1 had. The BFGS method, when given either the identity or the program's hessian to start with, behaved as intuitively expected and computational time increased with tightened tolerances. Hence the results varied and were not always consistent with our expectations. Further investigation and more data would allow for greater quantitative analysis (see discussion). With regard to the BFGS method, on occasion the value of T was altered to get the algorithm to converge. This issue did not arise when implementing Newton’s Method or with the SR1 Method. As T is only used in the Multi Variable Half Open Section Search portion of the algorithm, this points to a possible incompatibility between this particular algorithm and the BFGS method under certain conditions. Finding an alternative method for doing this task (e.g. using a step size that meets the Armijo-Goldstein and Wolff conditions) would rectify this issue and may increase the robustness of the BFGS method. Summary In summary, each method found the global minimum to the desired accuracy in a vast majority of cases; however, Newton’s method was the most accurate method and took less computational time. The convergence rates were mostly-consistent with theory in that the quasi-Newtonian algorithms generally converged slower than Newton’s method and faster than the Steepest Descent Method. Results for Constrained Case Algorithmic performance given the original objective function constrained by 𝑥 𝑇 𝑥 = 1 was next analysed. Firstly, the constrained case was solved using MATLAB’s Optimization Toolbox:
  • 11. 11 Interior Point Algorithm Active Set Algorithm x(1) -0.025 -0.025 x(2) 0.311 0.311 x(3) -0.789 -0.789 x(4) 0.53 0.53 f(x*) -133.56022058 -133.56022058 Average Iterations 22.2 33.6 Elapsed Time (s) 1.05 2.01 Elapsed Time per Iteration (s) 0.047297 0.059821 Table 3: MATLAB Optimisation Tool Results The following values were obtained using the first five starting points listed in Table 1. To solve this constrained problem using Newton's Method and its variants, the L2 penalty method was used. In applying the algorithms the following results were returned, with the penalty term = 10,000,000 : Newton's Method BFGS (Hessian) BFGS (Identity)* SR1 (Hessian) SR1 (Identity) x(1) -0.02477 N/A -0.02477 -0.02477 -0.02477 x(2) 0.31073 N/A 0.31073 0.31073 0.31073 x(3) -0.78876 N/A -0.78876 -0.78876 -0.78876 x(4) 0.52980 N/A 0.52980 0.52980 0.52980 f(x*) -133.56022894 N/A -133.56022894 -133.5603044 -133.5603044 Elapsed Time (s) 0.2051092 N/A 0.112444 2.104064 1.678844 Iterations 169 N/A 111 2037 659 Elapsed Time per Iteration (s) 0.001214 N/A 0.001013 0.001033 0.002546 Table 4: Newtonian Methods Constrained Problem Results *BFGS (identity) only returned results for the [0 0 0 0] starting point. As such, its results are not averaged. The rest of the results are averaged data returned for the five starting points. The BFGS algorithm’s (see appendices: MATLAB) L2 implementation was especially fragile given this constraint. It did not return any results when given the program's hessian for the staring iteration and only found the minimum once when given the identity as a starting hessian. In that one case where it found a minimum, the BFGS method found the same minimum as the other two algorithms, closely matching MATLAB optimization output. Slightly more robust was the SR1 method. It managed to find the minimum from more starting points than the BFGS method did, although not from all starting points. This slightly increased robustness did come at a cost however, with the SR1 method requiring a very large number of iterations and a longer timeframe in which to operate, making it more computationally expensive to use. The SR1 update does not impose positive definiteness on the updated hessian and this fact may have contributed at the algorithms increased success rate when compared to the BFGS method. In general, the relative reduced robustness of the BFGS and SR1 implementations, given the specified problem, possibly stems from the fact that they are both quasi-Newton Methods derived from the Secant Method. As quasi-Newton Methods are designed to avoid the computationally expensive hessian inversion, the hessian is instead approximated using finite differences of the function gradient, and data is interpolated with iterations (Indiana University, 2012). As Quasi-
  • 12. 12 Newton methods are multivariate generalisations of the secant method, the same problem exists for both methods – namely, that if the initial values used are not close enough to x*, the methods may fail to converge entirely. (Ohio University, 2012) In contrast, Newton's Method performed exceptionally well. It found the minimum from all starting points and did so relatively quickly in terms of both time and number of iterations. Compared to the MATLAB optimization algorithms, all of the Newtonian algorithms took more iterations to converge, as expected. Surprisingly, Newton's Method was able to outperform MATLAB’s optimization algorithms with regard to speed. Hessian Analysis (both programs) For both the constrained and unconstrained cases, the BFGS Method converged in less iterations and more accurately when it was started with the identity matrix as the inverse hessian approximation. It was also more robust when starting with the identity, always converging in the unconstrained case and at least finding the global minimum once in the constrained case. The method was quicker in terms of elapsed time when given the program's hessian to start with. It is therefore recommended that the identity be used as 𝐻0 when minimising a function such as this via the BFGS Method. In the unconstrained case, the difference in choosing 𝐻0 as the starting inverse hessian approximation for the SR1 Method was negligible. There is no significant difference in the time taken, accuracy or average number or iterations required to warrant recommending one particular 𝐻0 over the other. In the constrained case, whilst there may be no difference between the results in terms of accuracy, there is a more pronounced disparity between iterations and time taken. Starting with the function's hessian caused the SR1 Method to take nearly three times as many iterations and almost a 20% longer timeframe. Therefore, using the identity matrix for 𝐻0 for this constrained case is a much better alternative. As was noted earlier, the hessian of this nonlinear function is a constant 4x4 matrix regardless of the algorithm's current 𝒙 𝑘 . This means that it is not very computationally taxing to compute the hessian and it only needs to be computed once. This deals with the problem of the hessian inversion being computationally expensive. Since it is known that the hessian is positive definite, so too is the inverse of the hessian. Thus, the Newton Direction will always be a descent direction. Whilst this is only true for this function (and functions of similar forms), it means that Newton's Method behaved very well in this particular problem. Summary Given an ideal function such as this, that is to say a quadratic and convex nonlinear program, based on the above analysis, Newton's Method outperformed both of its variants (BFGS and SR1) and the Steepest Descent Method. With the problem formulated using the L2 penalty method it is the best algorithm to use for such a program.
  • 13. 13 Discussion: The objective function analysed by this project was particularly suited to minimization via Newton’s Method. For other programs, especially non-convex and non-quadratic ones, the results obtained by this paper may not hold. The BFGS and SR1 method were formulated precisely because the relative effectiveness of Newton’s method diminishes with increasing complexity. In addition, the methodology implemented one class of many available algorithms which specifically used the Golden Section Search in conjunction with a particular open interval search algorithm. A variety of methods could have also been used to determine an appropriate step size to move during each iteration. For example, step sizes satisfying the Armijo-Goldstein and Wolfe conditions would be an appropriate choice. Hence, a whole family of dissimilar results could have been generated from the same starting points using different algorithms which could just as easily be considered Newtonian. The analysis was very origin-centric in that the starting points were all within a relatively similar distance from [0 0 0 0] 𝑇 and hence [1 0 −1 2] 𝑇 and [−0.025 0.311 − 0.789 0.53] 𝑇 , the global minimums. Analysis from starting iterations further from the minimums should yield results consistent with those generated by this report; further investigation is needed. As discussed in the previous section, in such a nonlinear program as this, the inverse of the hessian needs to only be calculated once. As long as it is known the hessian does not change for any point in ℝ4 and inversion of that constant hessian matrix is computationally feasible, the coding for Newton's Method used here could be adjusted to remove the evaluation and inversion of the hessian at each iteration. Such a change would result in less calculations per iteration speeding up the algorithm. The results returned for this particular case would be even better if such an adjustment was made. The drawback of doing so is that the adjusted method could only be applied to cases where the hessian is a constant matrix, severely restricting its applicability. For the constrained case an analytical solve of using the KKT method would have been possible albeit complicated and not solvable by simple linear algebra operations due to the quadratic nature of the constraint. If this project had gone down this path instead of utilizing MATLAB’s optimization tools an exact value of 𝒙∗ could have been used as a point of reference. The shortcomings of the BFGS algorithm’s implementation of the L2 penalty method requires further analysis and perhaps troubleshooting. Finally, the analysis of varying the tolerances of the algorithms used by this report could have been furthered with more systematically obtained data. It would be expected that for a general case decreasing the ‘strictness’ of the major tolerance would decrease computational time taken (this was not always the case: see results). By way of contrast, varying the tolerances of the open interval search and golden section search would have been expected to exhibit different effects for different starting points and for different problems. To optimize an algorithm a balance must be struck between accuracy and time taken to generate an appropriate step length. Hence, ideal tolerance values exist for different algorithm, different starting points and for each iteration. Further investigation may reveal common properties of these ideal tolerances given the algorithms used.
  • 14. 14 References: Arora, J. (2011). Introduction to Optimum Design [electronic resource]. p.42 Burlington Elsevier Science. Chong, E. Zak, S. (2008). An Introduction To Optimization 3rd Edition. pp. 155-156. John Wiley and Sons. Conn, A., Gould, N. and Toint, P. (1991). "Convergence of quasi-Newton matrices generated by the symmetric rank one update". Mathematical Programming (Springer Berlin/ Heidelberg) 50 (1): pp. 177–195. Farzin, K. and Wah, L. (2012). On the performance of a new symmetric rank-one method with restart for solving unconstrained optimization problems. Computers and Mathematics with Applications, Volume 64, Issue 6, September 2012, pp. 2141-2152, http://www.sciencedirect.com/science/article/pii/S089812211200449X Hauser, K. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton Methods. p. 5. Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553- hauserk/newtons_method.pdf Indiana University. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton methods. Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553- hauserk/newtons_method.pdf Indian Institute of Technology. (2002). Convergence of Newton-Raphson method. Available online: http://ecourses.vtu.ac.in/nptel/courses/Webcourse-contents/IIT- KANPUR/Numerical%20Analysis/numerical-analysis/Rathish-kumar/ratish-1/f3node7.html Nocedal, J. and Wright, S.J. (1999). Numerical Optimization, pp. 220, 144. Ohio University (2012). Lecture 6: Secant Methods. Available online: http://www.math.ohiou.edu/courses/math3600/lecture6.pdf Powell, M. (1976). 'Superlinear convergence Some global convergence properties of a variable metric algorithm for minimization without exact line searches', Nonlinear Programming, Vol 4, Society for Industrial and Applied Mathematics, p. 53
15
Appendices:

Proofs

Positive eigenvalues imply invertibility of a matrix:
Let A be a square n x n matrix with eigenvalues λ1, λ2, ..., λn and define the polynomial
p(t) = det(tI - A) = (t - λ1)(t - λ2)...(t - λn).
The constant term of the factored form is (-1)^n λ1 λ2 ... λn, while evaluating the determinant form at t = 0 gives
p(0) = det(-A) = (-1)^n det(A).
Hence det(A) = λ1 λ2 ... λn. If λ1, λ2, ..., λn > 0 then det(A) ≠ 0, and therefore A is invertible.

Newton's Method converges quadratically for the univariate case:
Let x_i be a root of f(x) = 0 and let x_n be an estimate of x_i with |x_i - x_n| = ε < 1. By Taylor series expansion,
0 = f(x_i) = f(x_n) + f'(x_n)(x_i - x_n) + 0.5 f''(ξ)(x_i - x_n)^2
for some ξ between x_i and x_n. Newton's Method defines the next iterate through
-f'(x_n)(x_(n+1) - x_n) = f(x_n).
Substituting this into the expansion gives
0 = f'(x_n)(x_i - x_(n+1)) + 0.5 f''(ξ)(x_i - x_n)^2.
Since (x_i - x_n) and (x_i - x_(n+1)) are the error terms for successive iterations,
(x_i - x_(n+1)) ∝ (x_i - x_n)^2.
Q.E.D. (Indian Institute of Technology, 2002)
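Purely as a numerical illustration of the quadratic convergence result above (and not part of the project's code), the short script below applies the univariate Newton iteration to f(x) = x^2 - 2, whose positive root is sqrt(2), and prints the error at each step; the error is roughly squared from one iteration to the next.

% Numerical illustration of quadratic convergence for univariate Newton's Method.
f  = @(x) x.^2 - 2;      % root at sqrt(2)
df = @(x) 2*x;
x  = 2;                  % starting estimate
for n = 1:5
    x = x - f(x)/df(x);  % Newton step
    fprintf('n = %d, error = %.3e\n', n, abs(x - sqrt(2)));
end
% The printed errors shrink roughly as error(n+1) ~ C * error(n)^2,
% consistent with the proportionality derived above.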
16
Data
All tolerance values 0.01:

Newton's Method:
x*  f(x*)  Elapsed Time (s)  Iterations
1.0000 -0.0000 -1.0000 2.0000  -167.28  0.009944  10
1 0 -1 2  -167.28  0.000134  0
1.0000 -0.0001 -1.0001 2.0001  -167.28  0.004207  9
0.9999 -0.0000 -1.0002 2.0000  -167.28  0.001905  9
1.0002 -0.0001 -1.0002 1.9999  -167.28  0.00208  10
0.9999 0.0000 -1.0001 2.0000  -167.28  0.005061  11
1.0000 0.0001 -1.0004 1.9995  -167.28  0.003064  10
1.0000 0.0001 -0.9999 2.0000  -167.28  0.005054  11
0.9999 -0.0008 -1.0011 1.9990  -167.28  0.004321  9
0.9999 0.0001 -0.9999 2.0001  -167.28  0.005095  11

Steepest Descent:
x*  f(x*)  Elapsed Time (s)  Iterations
0.2126 -0.4083 -1.2812 1.7836  -167.2769  0.00655  34
1 0 -1 2  -167.28  0.000116  0
1.1539 0.1993 -0.8189 2.1600  -167.2789  0.038846  106
2.1659 0.6656 -0.5248 2.3718  -167.2728  0.170743  798
1.7844 0.3563 -0.7776 2.1590  -167.2768  0.077666  418
2.1659 0.6648 -0.5257 2.3709  -167.2728  0.481597  2600
-0.1767 -0.6709 -1.4786 1.6256  -167.2727  0.274945  1399
-0.1749 -0.6699 -1.4779 1.6262  -167.2727  0.273263  1332
2.1767 0.6709 -0.5213 2.3744  -167.2727  0.303484  1575
-0.1702 -0.6672 -1.4760 1.6277  -167.2727  0.338437  1838

BFGS (hessian for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
0.1915 -0.4173 -1.2823 1.7865  -167.2767  0.004697  6
1 0 -1 2  -167.28  0.000144  0
1.2740 0.1562 -0.8887 2.0870  -167.2796  0.002729  7
0.9981 0.0014 -0.9982 2.0018  -167.28  0.004855  10
2.1585 0.6613 -0.5280 2.3692  -167.2729  0.006476  7
1.0040 -0.0019 -1.0030 1.9967  -167.28  0.02334  11
0.9987 0.0005 -0.9991 2.0011  -167.28  0.016815  14
0.9946 -0.0028 -1.0020 1.9983  -167.28  0.016214  13
1.0029 -0.0024 -1.0021 1.9982  -167.28  0.010847  10
0.9995 -0.0003 -1.0001 2.0000  -167.28  0.014865  14
17
BFGS (identity for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
0.2028 -0.4142 -1.2815 1.7865  -167.2768  0.001249  3
1 0 -1 2  -167.28  0.000297  0
1.2434 0.1361 -0.9034 2.0753  -167.2797  0.005207  5
1.0024 0.0016 -0.9992 2.0004  -167.28  0.003858  7
1.0118 0.0074 -0.9948 2.0039  -167.28  3.954961  28905
0.9991 -0.0006 -1.0005 1.9996  -167.28  0.079164  9
0.9976 -0.0041 -1.0046 1.9955  -167.28  0.004674  7
0.9953 -0.0024 -1.0012 1.9992  -167.28  0.008941  7
1.0007 0.0009 -0.9994 2.0004  -167.28  0.010493  8
0.9974 0.0037 -0.9963 2.0031  -167.28  16.554056  126675

SR1 (hessian for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
1.0001 0.0001 -0.9999 1.9999  -167.28  0.011665  14
1 0 -1 2  -167.28  0.000299  0
1.0000 0.0000 -1.0000 2.0000  -167.28  0.007054  21
0.9985 -0.0008 -1.0003 2.0000  -167.28  0.013603  20
1.0173 0.0085 -0.9961 2.0015  -167.28  0.015117  19
0.9996 0.0115 -0.9892 2.0093  -167.28  0.007496  19
1.0000 0.0000 -1.0000 2.0000  -167.28  0.012073  37
0.9812 -0.0119 -1.0090 1.9927  -167.28  0.028734  52
1.0066 0.0056 -0.9956 2.0038  -167.28  0.013331  39
1.0000 0.0022 -0.9980 2.0016  -167.28  0.010243  25

SR1 (identity for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
1.0014 -0.0007 -0.9997 2.0009  -167.28  0.023934  17
1 0 -1 2  -167.28  0.000314  0
1.0004 0.0001 -1.0001 1.9997  -167.28  0.015986  25
1.0053 0.0051 -0.9963 2.0025  -167.28  0.009545  30
0.9998 -0.0005 -1.0002 2.0000  -167.28  0.012679  37
1.0000 -0.0000 -1.0000 2.0000  -167.28  0.01236  34
0.9723 0.0241 -0.9692 2.0305  -167.2799  0.003297  9
NaN NaN NaN NaN  NaN  0.006176  7
0.9991 -0.0003 -1.0002 1.9998  -167.28  0.013616  41
0.9922 -0.0023 -0.9995 2.0019  -167.28  0.027706  41
18
Newton's Method
x0  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  0.1  0.01  1  [NaN -Inf -Inf -Inf]  NaN  0.3685380  353
[0 0 0 0]  0.01  0.01  1  [1 0 -1 2]  -167.279999996088  0.0294290  4
[0 0 0 0]  0.001  0.01  1  [1 0 -1 2]  -167.279999999938  0.0160210  3
[0 0 0 0]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999976  0.0058770  2
[0 0 0 0]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0072560  2
[0 0 0 0]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0104300  3
[1 0 -1 2]  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0003140  0
[1 0 -1 2]  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001230  0
[1 0 -1 2]  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000790  0
[1 0 -1 2]  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000750  0
[1 0 -1 2]  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000740  0
[1 0 -1 2]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[0.81 0.91 0.13 0.91]  0.1  0.01  1  [NaN NaN NaN NaN]  NaN  0.3632520  353
[0.81 0.91 0.13 0.91]  0.01  0.01  1  [1 0 -1 2]  -167.279999998378  0.0060940  4
[0.81 0.91 0.13 0.91]  0.001  0.01  1  [1 0 -1 2]  -167.279999999974  0.0052550  3
[0.81 0.91 0.13 0.91]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999990  0.0043530  2
[0.81 0.91 0.13 0.91]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0052190  2
[0.81 0.91 0.13 0.91]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0058110  3
[3.16 0.49 1.39 2.73]  0.1  0.01  1  [NaN NaN NaN NaN]  NaN  0.3621590  353
[3.16 0.49 1.39 2.73]  0.01  0.01  1  [1 0 -1 2]  -167.279999997557  0.0058490  4
[3.16 0.49 1.39 2.73]  0.001  0.01  1  [1 0 -1 2]  -167.279999999961  0.0052710  3
[3.16 0.49 1.39 2.73]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999985  0.0043780  2
19
(Newton's Method, continued)
[3.16 0.49 1.39 2.73]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0051530  2
[3.16 0.49 1.39 2.73]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0116540  2
[9.57 -4.85 -8.00 -1.42]  0.1  0.01  1  [NaN NaN NaN NaN]  NaN  0.3284300  352
[9.57 -4.85 -8.00 -1.42]  0.01  0.01  1  [1 0 -1 2]  -167.279999995272  0.0051650  4
[9.57 -4.85 -8.00 -1.42]  0.001  0.01  1  [1 0 -1 2]  -167.279999999925  0.0045340  3
[9.57 -4.85 -8.00 -1.42]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999971  0.0037930  2
[9.57 -4.85 -8.00 -1.42]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0045140  2
[9.57 -4.85 -8.00 -1.42]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0072210  3
20
SR1 Method
x0  Starting Hessian  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.279999128240  0.1688580  89
[0 0 0 0]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999575407  0.0692100  39
[0 0 0 0]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999999853  0.0587820  26
[0 0 0 0]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.1066900  38
[0 0 0 0]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999973  0.0380570  10
[0 0 0 0]  Identity  0.000001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0307350  9
[0 0 0 0]  Program's Hessian  0.1  0.01  1  [1.42 0.12 -0.94 2.05]  -167.278456189735  0.0620180  31
[0 0 0 0]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999919554  0.0949530  57
[0 0 0 0]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999987586  0.0400010  21
[0 0 0 0]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999990  0.0298450  13
[0 0 0 0]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.1016380  38
[0 0 0 0]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.1077790  37
[1 0 -1 2]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001780  0
[1 0 -1 2]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000800  0
[1 0 -1 2]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[1 0 -1 2]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000690  0
[1 0 -1 2]  Program's Hessian  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000880  0
[1 0 -1 2]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000760  0
[1 0 -1 2]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000750  0
[1 0 -1 2]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000710  0
[1 0 -1 2]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000710  0
[1 0 -1 2]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000690  0
[0.81 0.91 0.13 0.91]  Identity  0.1  0.01  1  [1.40 0.52 -0.52 2.43]  -167.272369332640  0.1391750  94
[0.81 0.91 0.13 0.91]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999927619  0.0512890  30
21
(SR1 Method, continued)
[0.81 0.91 0.13 0.91]  Identity  0.001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0337580  17
[0.81 0.91 0.13 0.91]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0577020  26
[0.81 0.91 0.13 0.91]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999955  0.0280640  10
[0.81 0.91 0.13 0.91]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0415910  13
[0.81 0.91 0.13 0.91]  Program's Hessian  0.1  0.01  1  [1.03 0.01 -0.99 2.00]  -167.279983904904  0.2144600  156
[0.81 0.91 0.13 0.91]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999620384  0.0843110  46
[0.81 0.91 0.13 0.91]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999999547  0.0196120  10
[0.81 0.91 0.13 0.91]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999991  0.0290150  13
[0.81 0.91 0.13 0.91]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999997  0.0357070  13
[0.81 0.91 0.13 0.91]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0538590  17
[3.16 0.49 1.39 2.73]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.279998954778  0.1396510  89
[3.16 0.49 1.39 2.73]  Identity  0.01  0.01  1  [NaN NaN NaN NaN]  NaN  0.0152440  9
[3.16 0.49 1.39 2.73]  Identity  0.001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0126580  6
[3.16 0.49 1.39 2.73]  Identity  0.0001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0322300  13
[3.16 0.49 1.39 2.73]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999990  0.0244790  9
[3.16 0.49 1.39 2.73]  Identity  0.000001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0194650  7
[3.16 0.49 1.39 2.73]  Program's Hessian  0.1  0.01  1  [1 0 -1 2]  -167.279999971364  0.0673550  41
[3.16 0.49 1.39 2.73]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999999102  0.0518930  30
[3.16 0.49 1.39 2.73]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999999986  0.0547630  29
[3.16 0.49 1.39 2.73]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999997643  0.0263000  9
[3.16 0.49 1.39 2.73]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0633330  25
[3.16 0.49 1.39 2.73]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0312680  10
[9.57 -4.85 -8.00 -1.42]  Identity  0.1  0.01  1  [0.35 -0.05 -0.92 2.12]  -167.272798242689  0.0848760  72
[9.57 -4.85 -8.00 -1.42]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999999861  0.0446150  31
[9.57 -4.85 -8.00 -1.42]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999953795  0.0294250  17
[9.57 -4.85 -8.00 -1.42]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999996731  0.0385920  15
[9.57 -4.85 -8.00 -1.42]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999715  0.0541070  22
[9.57 -4.85 -8.00 -1.42]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0781950  31
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.1  0.01  1  [1.03 0.02 -0.99 2.00]  -167.279959843584  0.0323530  23
22
(SR1 Method, continued)
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999991203  0.0652370  41
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999876384  0.0174150  9
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0651250  29
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999997  0.0410020  13
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0834600  25
23
Steepest Descent Method
x0  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  0.1  0.01  1  [-Inf NaN NaN NaN]  NaN  0.8100010  377
[0 0 0 0]  0.01  0.01  1  [0.22 -0.41 -1.28 1.78]  -167.276916941098  0.2438320  196
[0 0 0 0]  0.001  0.01  1  [0.89 -0.06 -1.05 1.96]  -167.279933780104  8.3233250  6416
[0 0 0 0]  0.0001  0.01  1  [0.99 -0.01 -1 2]  -167.279999267465  3.8602080  2417
[0 0 0 0]  0.00001  0.01  1  [1 0 -1 2]  -167.279999997389  22.7155470  12213
[0 0 0 0]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999942  18.2888800  8432
[1 0 -1 2]  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0011080  0
[1 0 -1 2]  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001110  0
[1 0 -1 2]  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000690  0
[1 0 -1 2]  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000660  0
[1 0 -1 2]  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000660  0
[1 0 -1 2]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000630  0
[0.81 0.91 0.13 0.91]  0.1  0.01  1  [-Inf NaN NaN NaN]  NaN  0.3082390  377
[0.81 0.91 0.13 0.91]  0.01  0.01  1  [1.16 0.19 -0.83 2.15]  -167.279100661797  1.0436020  984
[0.81 0.91 0.13 0.91]  0.001  0.01  1  [1.12 0.07 -0.95 2.04]  -167.279928943921  1.9716170  1514
[0.81 0.91 0.13 0.91]  0.0001  0.01  1  [1 0 -1 2]  -167.279999266127  2.4358950  1833
[0.81 0.91 0.13 0.91]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999681  1.9436270  999
[0.81 0.91 0.13 0.91]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999926  5.8643590  2553
[3.16 0.49 1.39 2.73]  0.1  0.01  1  [-Inf NaN NaN NaN]  NaN  0.3068300  377
[3.16 0.49 1.39 2.73]  0.01  0.01  1  [2.02 0.58 -0.59 2.32]  -167.274489679393  6.4725080  5980
[3.16 0.49 1.39 2.73]  0.001  0.01  1  [1.11 0.061 -0.96 2.03]  -167.279938709635  17.6293290  15514
[3.16 0.49 1.39 2.73]  0.0001  0.01  1  [1.01 0.01 -1 2]  -167.279999278511  8.2242300  5651
[3.16 0.49 1.39 2.73]  0.00001  0.01  1  [1 0 -1 2]  -167.279999997974  1.9071060  1631
[3.16 0.49 1.39 2.73]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999979  4.4262350  1784
[9.57 -4.85 -8.00 -1.42]  0.1  0.01  1  [-Inf Inf NaN NaN]  NaN  0.3026670  376
[9.57 -4.85 -8.00 -1.42]  0.01  0.01  1  [1.75 0.35 -0.77 2.16]  -167.277168000179  3.8968330  3948
[9.57 -4.85 -8.00 -1.42]  0.001  0.01  1  [1.12 0.07 -0.95 2.04]  -167.279928743320  5.0956410  3914
24
(Steepest Descent Method, continued)
[9.57 -4.85 -8.00 -1.42]  0.0001  0.01  1  [1 0 -1 2]  -167.279999264172  3.6328800  2661
[9.57 -4.85 -8.00 -1.42]  0.00001  0.01  1  [1 0 -1 2]  -167.279999998846  25.4060450  16377
[9.57 -4.85 -8.00 -1.42]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999938  4.1823240  C
25
BFGS Method
x0  Starting Hessian  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  Identity  0.1  0.01  1  [0.22 -0.41 -1.28 1.79]  -167.276910497404  0.0077950  5
[0 0 0 0]  Identity  0.01  0.01  1  [0.61 -0.31 -1.26 1.78]  -167.278342371136  0.0092760  5
[0 0 0 0]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999999944  0.0167320  6
[0 0 0 0]  Identity  0.0001  0.01  0.0001  [1 0 -1 2]  -167.279999999619  0.0260180  7
[0 0 0 0]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999982  0.0232830  6
[0 0 0 0]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0269040  6
[0 0 0 0]  Program's Hessian  0.1  0.01  1  [0.19 -0.41 -1.28 1.78]  -167.276642869312  0.0094350  7
[0 0 0 0]  Program's Hessian  0.01  0.01  1  [0.19 -0.42 -1.28 1.79]  -167.276717196021  0.0057380  4
[0 0 0 0]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999992937  0.0238710  9
[0 0 0 0]  Program's Hessian  0.0001  0.01  0.00000001  [NaN NaN NaN NaN]  NaN  0.0415680  13
[0 0 0 0]  Program's Hessian  0.00001  0.01  0.1  [1 0 -1 2]  -167.279999999938  0.0834650  22
[0 0 0 0]  Program's Hessian  0.000001  0.01  0.00000001  [NaN NaN NaN NaN]  NaN  1.4213690  2330
[1 0 -1 2]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0002520  0
[1 0 -1 2]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001150  0
[1 0 -1 2]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[1 0 -1 2]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000680  0
[1 0 -1 2]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000670  0
[1 0 -1 2]  Program's Hessian  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0003180  0
[1 0 -1 2]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001160  0
[1 0 -1 2]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[1 0 -1 2]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000680  0
[1 0 -1 2]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000680  0
[0.81 0.91 0.13 0.91]  Identity  0.1  0.01  1  [1.05 0.30 -0.69 2.29]  -167.275083269658  0.0100910  4
[0.81 0.91 0.13 0.91]  Identity  0.01  0.01  1  [1.24 0.14 -0.90 2.08]  -167.279686990669  0.0090960  5
26
(BFGS Method, continued)
[0.81 0.91 0.13 0.91]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999999376  0.0185580  7
[0.81 0.91 0.13 0.91]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999860  0.0183590  6
[0.81 0.91 0.13 0.91]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0260800  8
[0.81 0.91 0.13 0.91]  Identity  0.000001  0.01  0.01  [1 0 -1 2]  -167.279999999999  0.1026530  43
[0.81 0.91 0.13 0.91]  Program's Hessian  0.1  0.01  1  [1.02 0.32 -0.66 2.33]  -167.273452580260  0.0098400  7
[0.81 0.91 0.13 0.91]  Program's Hessian  0.01  0.01  1  [1.28 0.16 -0.89 2.09]  -167.279598023217  0.0122180  7
[0.81 0.91 0.13 0.91]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999992578  0.0266280  10
[0.81 0.91 0.13 0.91]  Program's Hessian  0.0001  0.01  0.000001  [NaN NaN NaN NaN]  NaN  1.0398700  1601
[0.81 0.91 0.13 0.91]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0368260  10
[0.81 0.91 0.13 0.91]  Program's Hessian  0.000001  0.01  0.000000001  [NaN NaN NaN NaN]  NaN  0.0878040  78
[3.16 0.49 1.39 2.73]  Identity  0.1  0.01  1  [2.95 0.95 -0.37 2.47]  -167.260921728458  0.0148570  7
[3.16 0.49 1.39 2.73]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999971584  0.0201850  8
[3.16 0.49 1.39 2.73]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999998218  0.0196480  7
[3.16 0.49 1.39 2.73]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999995  0.0273770  9
[3.16 0.49 1.39 2.73]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999995  0.0163080  5
[3.16 0.49 1.39 2.73]  Identity  0.000001  0.01  0.0001  [1 0 -1 2]  -167.279999999999  0.0242500  5
[3.16 0.49 1.39 2.73]  Program's Hessian  0.1  0.01  1  [3 1.49 0.19 2.99]  -167.244594503201  0.0097640  7
[3.16 0.49 1.39 2.73]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999963524  0.0209800  10
[3.16 0.49 1.39 2.73]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999915565  0.0278820  10
[3.16 0.49 1.39 2.73]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999946  0.0292690  9
[3.16 0.49 1.39 2.73]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0413620  13
[3.16 0.49 1.39 2.73]  Program's Hessian  0.000001  0.01  1  [NaN NaN NaN NaN]  NaN  32.2398260  11663
[9.57 -4.85 -8.00 -1.42]  Identity  0.1  0.01  1  [2.00 0.58 -0.58 2.33]  -167.274520285135  0.0099090  6
[9.57 -4.85 -8.00 -1.42]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999526260  0.0190540  8
[9.57 -4.85 -8.00 -1.42]  Identity  0.001  0.01  0.1  [1 0 -1 2]  -167.279999999709  0.0216050  7
[9.57 -4.85 -8.00 -1.42]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0256380  8
[9.57 -4.85 -8.00 -1.42]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0312400  9
[9.57 -4.85 -8.00 -1.42]  Identity  0.000001  0.01  0.1  [1 0 -1 2]  -167.279999999999  0.0293100  7
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.1  0.01  1  [2.16 0.66 -0.53 2.37]  -167.272880184292  0.0146060  10
27
(BFGS Method, continued)
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.01  0.01  1  [2.16 0.66 -0.53 2.37]  -167.272880718121  0.0105100  6
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999999362  0.0412020  13
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999987  0.0335710  11
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0322950  10
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  84.1332250  30847
28
For k = 10,000,000
All tolerances at 0.01
T set to 1 (adjusted as necessary to get convergence)

Newton's Method
Starting Point  xmin  fmin  Iterations  Elapsed Time (s)
[0 0 0 0]  [-0.024771532289335 0.310732926395592 -0.788760177923239 0.529801104907597]  -133.560228943395  39  0.03467
[1 0 -1 2]  [-0.024771837343072 0.310733028722944 -0.788760301760888 0.529800846242057]  -133.560228943387  143  0.253257
[0.81 0.91 0.13 0.91]  [-0.024771522265604 0.310732911352949 -0.788760170844310 0.529801124775638]  -133.560228943395  173  0.183464
[3.16 0.49 1.39 2.73]  [-0.024771527241936 0.310732906803061 -0.788760174684506 0.529801121482055]  -133.560228943395  240  0.243349
[9.57 -4.85 -8.00 -1.42]  [-0.024771536889997 0.310732912478593 -0.788760169890084 0.529801124846407]  -133.560228943395  250  0.310806

BFGS ID
[0 0 0 0]  [-0.024770871284853 0.310733636611044 -0.788758902260001 0.529802618520184]  -133.560228943197  111  0.112444
[1 0 -1 2]  Refused to return an answer
[0.81 0.91 0.13 0.91]  Refused to return an answer
[3.16 0.49 1.39 2.73]  Refused to return an answer
[9.57 -4.85 -8.00 -1.42]  Refused to return an answer

BFGS Hess
[0 0 0 0]  Refused to return an answer
[1 0 -1 2]  Refused to return an answer
[0.81 0.91 0.13 0.91]  Refused to return an answer
[3.16 0.49 1.39 2.73]  Refused to return an answer
[9.57 -4.85 -8.00 -1.42]  Refused to return an answer

SR1 ID
[0 0 0 0]  [-0.024771567761620 0.310733441325913 -0.788761623923610 0.529802126005156]  -133.560304375650  378  1.013921
[1 0 -1 2]  [-0.024771544747263 0.310733182847815 -0.788761385804056 0.529802633470826]  -133.560304375634  819  2.392423
[0.81 0.91 0.13 0.91]  [-0.024777470366075 0.310725229879482 -0.788760807092803 0.529807865053035]  -133.560304368405  619  1.81805
[3.16 0.49 1.39 2.73]  Refused to return an answer
[9.57 -4.85 -8.00 -1.42]  [-0.024770231258035 0.310730257079946 -0.788765912375838 0.529797671405605]  -133.560304373593  821  1.490982
29
SR1 Hess
[0 0 0 0]  [-0.024773909306389 0.310738355421821 -0.788757020809805 0.529805987358458]  -133.560304372956  1976  1.343549
[1 0 -1 2]  [-0.024775941428827 0.310732519348041 -0.788760367888054 0.529804332231798]  -133.560304374565  1734  2.006957
[0.81 0.91 0.13 0.91]  [-0.024769720656432 0.310730507404563 -0.788765053684288 0.529798826909580]  -133.560304374229  2254  2.432939
[3.16 0.49 1.39 2.73]  [-0.024767341442569 0.310733094291290 -0.788764366771071 0.529798443639664]  -133.560304374047  2184  2.63281
[9.57 -4.85 -8.00 -1.42]  Refused to return an answer
30
MATLAB

Objective Function File: f.m

function val = f(x)
% This m-file is the objective function for our unconstrained
% nonlinear program.

% Definitions.
c = [5.04; -59.4; 146.4; -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1); x(2); x(3); x(4)];

val = c' * xs + 0.5 * xs' * hessian * xs;
end

Gradient Function File: gradf.m

function grad = gradf(x)
% This is the gradient function of our objective function, f.m.

% Definitions.
c = [5.04 -59.4 146.4 -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1) x(2) x(3) x(4)];

grad = c + xs * hessian;
end

Hessian Function File: hessf.m

function hessian = hessf(x)
% The hessian of our objective function, f.m.
% Note that the hessian is independent of x.
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
end

Objective Function File, Penalty Method: fpen.m

function val = fpen(x,k)
% The objective function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is the same for fpen, gradfpen and hessfpen.

% Definitions.
k = 10000000;
c = [5.04 -59.4 146.4 -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1); x(2); x(3); x(4)];
constraint = 1 - (x(1))^2 - (x(2))^2 - (x(3))^2 - (x(4))^2;

% Function
val = c * xs + 0.5 * xs' * hessian * xs + (k/2)*(constraint)^2;
end

Gradient Function File, Penalty Method: gradfpen.m

function grad = gradfpen(x,k)
% The gradient function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is the same for fpen, gradfpen and hessfpen.

% Definitions.
k = 10000000;
g = ((x(1))^2 + (x(2))^2 + (x(3))^2 + (x(4))^2 - 1);
c = [5.04 -59.4 146.4 -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1) x(2) x(3) x(4)];

% Function
grad = c + xs * hessian + [2*k*x(1)*g; 2*k*x(2)*g; 2*k*x(3)*g; 2*k*x(4)*g]';
end

Hessian Function File, Penalty Method: hessfpen.m

function hessian = hessfpen(x,k)
% This is the hessian of the L2 penalty method for our program.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is the same for fpen, gradfpen and hessfpen.

% Definitions.
k = 10000000;
h11 = 2*k*(3*(x(1))^2 + (x(2))^2 + (x(3))^2 + (x(4))^2 - 1);
h22 = 2*k*((x(1))^2 + 3*(x(2))^2 + (x(3))^2 + (x(4))^2 - 1);
h33 = 2*k*((x(1))^2 + (x(2))^2 + 3*(x(3))^2 + (x(4))^2 - 1);
h44 = 2*k*((x(1))^2 + (x(2))^2 + (x(3))^2 + 3*(x(4))^2 - 1);
unchessian = [ 0.16  -1.2   2.4  -1.4;
              -1.2   12   -27    16.8;
               2.4  -27    64.8 -42;
              -1.4   16.8 -42    28];

% Function
hessian = unchessian + [h11 4*k*x(1)*x(2) 4*k*x(1)*x(3) 4*k*x(1)*x(4);...
    4*k*x(1)*x(2) h22 4*k*x(2)*x(3) 4*k*x(4)*x(2);...
    4*k*x(1)*x(3) 4*k*x(3)*x(2) h33 4*k*x(3)*x(4);...
    4*k*x(1)*x(4) 4*k*x(4)*x(2) 4*k*x(3)*x(4) h44];
end

Constraint File: MATLAB Optimisation Tool (xtx.m)

function [c, ceq] = xtx(x)
% The constraint as required for the MATLAB Optimization Tool.
% c is the set of nonlinear inequality constraints. Empty in our case.
c = [];
% ceq is the set of nonlinear equality constraints.
ceq = x(1)^2 + x(2)^2 + x(3)^2 + x(4)^2 - 1;
end
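For completeness, the constraint file above is in the form MATLAB's Optimization Toolbox expects for nonlinear constraints. A direct call to fmincon using it together with f.m might look as follows; the solver settings shown are illustrative and are not necessarily the exact options used through the Optimization Tool in this project.

% Illustrative use of the constraint file xtx.m with fmincon.
x0 = [0 0 0 0];                                          % starting iterate
options = optimset('Algorithm', 'interior-point', 'TolFun', 1e-8);
[xmin, fmin] = fmincon(@f, x0, [], [], [], [], [], [], @xtx, options);

Alternatively, the KKT conditions mentioned in the discussion provide an analytical check: stationarity requires c + Hx + 2λx = 0, that is x = -(H + 2λI)^(-1) c, with the multiplier λ chosen so that the feasibility condition x'x = 1 is satisfied.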
33
SR1 Quasi-Newton Method: NewtonMethod_SR1.m

% INPUT:
%
%   f           - the multivariable function to minimise (a separate
%                 user-defined MATLAB function m-file)
%
%   gradf       - function which returns the gradient vector of f evaluated
%                 at x (also a separate user-defined MATLAB function m-file)
%
%   x0          - the starting iterate
%
%   tolerance1  - tolerance for stopping criterion of algorithm
%
%   tolerance2  - tolerance for stopping criterion of line minimisation
%                 (e.g. in golden section search)
%
%   H0          - a matrix used as the first approximation to the hessian.
%                 Updated as the algorithm progresses
%
%   T           - parameter used by the "improved algorithm for finding an
%                 upper bound for the minimum" along each given descent
%                 direction
%
% OUTPUT:
%
%   xminEstimate - estimate of the minimum
%
%   fminEstimate - the value of f at xminEstimate
%
%   iteration    - the number of iterations performed

function [xminEstimate, fminEstimate, iteration] = NewtonMethod_SR1(f,...
    gradf, x0, H0, tolerance1, tolerance2, T)

tic                       % starts timer
k = 0;                    % initialize iteration counter
iteration_number = 0;     % initialise count
xk = x0;                  % row vector
xk_old = x0;              % row vector
H_old = H0;               % square matrix

while ( norm(feval(gradf, xk)) >= tolerance1 )

    iteration_number = iteration_number + 1;

    H_old = H_old / max(max(H_old));   % Correction if det H_old gets
                                       % too large or small

    dk = transpose(-H_old*transpose(feval(gradf, xk)));   % gives dk as
                                                          % a row vector

    % minimise f with respect to t in the direction dk, which involves
    % two steps:

    % (1) find upper and lower bound, [a,b], for the stepsize t using
    %     the "improved procedure" presented in the lecture notes
    [a, b] = multiVariableHalfOpen(f, xk, dk, T);

    % (2) use golden section algorithm (suitably modified for
    %     functions of more than one variable) to estimate the
    %     stepsize t in [a,b] which minimises f in the direction dk
    %     starting at xk
    [tmin, fmin] = multiVariableGoldenSectionSearch(f, a, b, tolerance2,...
        xk, dk);
    % note: we do not actually need fmin, but we do need tmin

    % update the iteration counter and the current iterate
    k = k + 1;
    xk = xk + tmin*dk;
    xk_new = xk_old + tmin*dk;

    % update the hessian approximation (SR1 update); the step is taken
    % relative to the previous iterate xk_old
    gk = (feval(gradf, xk_new) - feval(gradf, xk_old))';   % column vector
    s = (xk_new - xk_old)' - (H_old * gk);
    st = s';
    H_new = H_old + (s * st) / (st * gk);

    % keep track of the old values
    xk_old = xk_new;
    H_old = H_new;

end

% assign output values
toc
xminEstimate = xk;
fminEstimate = feval(f, xminEstimate)
iteration = iteration_number
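As a usage note, the routine above can be called from the command window once the function files and the two line-search helpers it relies on (multiVariableHalfOpen.m and multiVariableGoldenSectionSearch.m, which are referenced in the code but not reproduced here) are on the MATLAB path. The call below uses the same tolerance and T values as the data tables.

% Example call: SR1 from the origin, with the identity as the initial
% hessian approximation, tolerances of 0.01 and T = 1 (as in the data tables).
x0 = [0 0 0 0];
[xmin, fmin, iters] = NewtonMethod_SR1(@f, @gradf, x0, eye(4), 0.01, 0.01, 1);

% For the constrained (L2 penalty) problem, the penalty versions of the
% function files are passed instead:
% [xmin, fmin, iters] = NewtonMethod_SR1(@fpen, @gradfpen, x0, eye(4), 0.01, 0.01, 1);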