MAST30013 Techniques in Operations Research
Newton's Method: A Comparative Analysis of Algorithmic Convergence Efficiency
T.Lee thlee@student.unimelb.edu.au
J.Rigby j.rigby@student.unimelb.edu.au
L.Russell lrussell@student.unimelb.edu.au
Department of Mathematics and Statistics
University of Melbourne
Summary:
Objective
By considering a specific problem, the aim of this project is to provide an example of the
implementation of traditional Newtonian methods in multivariate minimisation applications. The
effectiveness of three methods is examined: Newton's Method, the Broyden-Fletcher-Goldfarb-Shanno
(BFGS) method and the Symmetric Rank 1 (SR1) method. By analysing the performance of these
algorithms in minimising a quadratic and convex function, a recommendation will be given as to the
best method to apply to such a case.
These methods cannot be applied to all multivariate functions and for a given problem a technique
may 'succeed' where others 'fail'. This success may manifest as convergence at a faster rate or in the
form of algorithmic robustness. Investigation into which conditions facilitate the successful and
efficient application of the chosen algorithms will be undertaken.
Minimization algorithms require varying degrees of further information in addition to the objective
function. Newtonian methods, in a relative sense, require more information than other common
methods like the Steepest Descent Method (SDM). The advantages and disadvantages of this further
requisite information will be examined.
Findings and Conclusions
For the specific quadratic and convex non-linear program outlined subsequently in the introduction,
it was found that Newton’s Method performed best for both the constrained and unconstrained
problem. It was computationally quicker, usually required fewer iterations to solve the problems and
achieved greater convergence accuracy than the BFGS, SR1 and SDM methods, consistent with the
outlined theory.
The ‘ideal’ nature of the specific case required further evaluation of the other methods without
consideration given to Newton's method. For a constrained problem the SR1 method achieved non-trivial
convergence success, partly attributable to the algorithm's flexibility in not always choosing a
descent direction (that is, positive definiteness of the approximated hessian is not imposed).
Recommendations
Whilst theoretical convergence rates are greater for Newton's method than for quasi-Newton methods,
and greater for quasi-Newton methods in turn than for the SDM, the applicability of certain
algorithms is highly dependent on the specifics of a problem. For more complex general cases the
advantages of using quasi-Newtonian methods become apparent, as hessian inversion becomes more
computationally taxing.
Analysis suggests that when seeking to minimize quadratic and convex nonlinear programs,
Newton's Method appears to perform better than any of the other tested methods.
Introduction:
The Objective Function
This project investigates the advantages and disadvantages of using three variations of Newton's
Method and contrasts their convergence efficiencies against one another as well as the Steepest
Descent Method for the general case and the specified problem:
$$\min f(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + \tfrac{1}{2}\,\mathbf{x}^T H\,\mathbf{x}$$

where

$$\mathbf{c} = [5.04,\ -59.4,\ 146.4,\ -96.6]^T, \qquad
H = \begin{bmatrix} 0.16 & -1.2 & 2.4 & -1.4 \\ -1.2 & 12.0 & -27.0 & 16.8 \\ 2.4 & -27.0 & 64.8 & -42.0 \\ -1.4 & 16.8 & -42.0 & 28.0 \end{bmatrix}$$
A constrained case, where the objective function is subject to the constraint x^T x = 1, is
considered and analysed with Newtonian methods implementing an L2 penalty program. The results are
contrasted against output from the MATLAB Optimisation Tool.
In the following sections this report will detail a method of analysing the constrained and
unconstrained objective function and compare algorithmic output to analytical solutions. Results will
be contrasted with theory and meaningful conclusions made where grounded by empirical evidence.
Any results requiring further substantiation will be discussed subsequently.
Newton's Method
Newton's Method (also known as the Newton-Raphson Method) is a method for minimising an
unconstrained multivariate function. Given a starting point, this method approximates the objective
function with a second order Taylor polynomial and proceeds to minimise this approximation by
moving in the 'Newton direction'. The output is subsequently used as the new starting point and the
process is iteratively repeated (Chong and Zak, 2008).
Newton's Method seeks to increase the rate of convergence by using second order information of
the function it is operating on. This second order information is the hessian, denoted ∇²f.
The 'Newton direction' mentioned above is defined to be:

$$\mathbf{d}^k := -\nabla^2 f(\mathbf{x}^k)^{-1}\,\nabla f(\mathbf{x}^k)$$

where x^k denotes a particular iterate point of the algorithm. It is defined this way such that if
the second order approximation to a given function held exactly, then it would be minimised in one
step with a step size of 1.
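As an illustration only (the implementation actually used to generate the results of this report
appears in the appendices), a minimal MATLAB sketch of this iteration with a fixed step size of 1
might look as follows, assuming the gradf.m and hessf.m files listed in the appendices:

% Minimal sketch of the Newton iteration with a fixed unit step
% (the full algorithm used in this report performs a line search instead).
xk = [0 0 0 0];                        % starting point x0 (row vector)
tolerance1 = 0.01;                     % stopping tolerance on the gradient
while norm(gradf(xk)) > tolerance1
    dk = -(hessf(xk) \ gradf(xk)')';   % Newton direction: -hess(x)^(-1) * grad(x)
    xk = xk + dk;                      % take a full step in the Newton direction
end

For the quadratic objective above the second order model is exact, so this loop terminates after a
single step; the algorithms analysed in this report instead choose the step length with a line search.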
BFGS and SR1
Minimisation via Newton's Method requires the calculation of the gradient and hessian matrices. In
addition, the hessian must also be inverted. This inversion can be quite computationally expensive
(standard techniques are known to be O(n³)) (Hauser, 2012) and may not be defined. Additionally,
the Newton direction is not necessarily guaranteed to be a descent direction if the hessian is not
positive definite. These two potential problems may affect the quality of any implementation of
Newton’s method and give rise to the need for quasi-Newton methods.
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Symmetric Rank 1 (SR1) Quasi-Newton methods
have been formulated specifically in order to bypass these concerns by approximating the hessian
from successively calculated gradient vectors (Farzin and Wah, 2012). Both methods attempt to
satisfy the secant equation at each iteration:
$$\mathbf{x}^{k+1} - \mathbf{x}^k = H^{k+1}\left(\nabla f(\mathbf{x}^{k+1}) - \nabla f(\mathbf{x}^k)\right)$$
During each iteration the approximated hessian is 'updated' in a manner dependent on the method:
BFGS Update
$$H^{k+1} = H^k + \frac{1 + \langle\mathbf{r}^k, \mathbf{g}^k\rangle}{\langle\mathbf{s}^k, \mathbf{g}^k\rangle}\,\mathbf{s}^k(\mathbf{s}^k)^T - \left[\mathbf{s}^k(\mathbf{r}^k)^T + \mathbf{r}^k(\mathbf{s}^k)^T\right]$$

where

$$\mathbf{s}^k = \mathbf{x}^{k+1} - \mathbf{x}^k, \qquad \mathbf{g}^k = \nabla f(\mathbf{x}^{k+1}) - \nabla f(\mathbf{x}^k), \qquad \mathbf{r}^k = \frac{H^k\mathbf{g}^k}{\langle\mathbf{s}^k, \mathbf{g}^k\rangle}$$
H^{k+1} is an approximation to the inverse of the hessian. The BFGS update always satisfies the secant
equation and maintains positive definiteness of the hessian approximation (if initialized as such).
Additionally, the BFGS update satisfies the useful symmetry property (H^{k+1})^T = H^{k+1}. It can also
be shown that H^{k+1} differs from its predecessor by a rank-2 matrix (Nocedal and Wright, 1999).
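As a guide only, the update above translates almost directly into MATLAB. The sketch below is not the
appendix implementation; the function name is hypothetical, xk and xk1 are the current and next
iterates stored as row vectors, Hk is the current inverse-hessian approximation, and gradf.m is the
gradient file listed in the appendices.

function Hnew = bfgsUpdateSketch(Hk, xk, xk1)
% One BFGS update of the inverse-hessian approximation Hk (a sketch of the
% formula above; s, g and r are column vectors).
s = (xk1 - xk)';                  % s_k = x_{k+1} - x_k
g = (gradf(xk1) - gradf(xk))';    % g_k = grad f(x_{k+1}) - grad f(x_k)
r = (Hk * g) / (s' * g);          % r_k = H_k g_k / <s_k, g_k>
Hnew = Hk + ((1 + r'*g) / (s'*g)) * (s*s') - (s*r' + r*s');
end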
SR1 Update
The SR1 update is a simpler rank-1 update that also maintains the symmetry of the hessian and
seeks to (but does not always) satisfy the secant equation. (Nocedal and Wright, 1999)
$$H^{k+1} = H^k + \frac{(\Delta\mathbf{x}^k - H^k\mathbf{y}^k)(\Delta\mathbf{x}^k - H^k\mathbf{y}^k)^T}{(\Delta\mathbf{x}^k - H^k\mathbf{y}^k)^T\,\mathbf{y}^k}$$

where

$$\mathbf{y}^k = \nabla f(\mathbf{x}^k + \Delta\mathbf{x}^k) - \nabla f(\mathbf{x}^k).$$
As with the BFGS method, H^{k+1} is an approximation to the inverse of the hessian. This update does
not guarantee that the updated matrix is positive definite and consequently does not ensure that
following iterations always move in descent directions. In practice, the approximated hessians
generated by the SR1 method exhibit faster convergence towards the true hessian inverse than those of
the BFGS method (Conn, Gould and Toint, 1991). A known drawback affecting the robustness of the SR1
method is that the denominator can vanish (Nocedal and Wright, 1999). Where this is the case,
algorithmic robustness can be increased simply by skipping the update at the troublesome iteration.
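A corresponding sketch of the SR1 update, including the simple skipping safeguard just described, is
given below; the function name and the 1e-8 threshold are illustrative choices rather than anything
taken from the appendix code.

function Hnew = sr1UpdateSketch(Hk, xk, xk1)
% One SR1 update of the inverse-hessian approximation Hk, skipped when the
% denominator is (numerically) zero; xk and xk1 are row-vector iterates.
dx = (xk1 - xk)';                  % step just taken (column vector)
y  = (gradf(xk1) - gradf(xk))';    % change in the gradient
v  = dx - Hk * y;                  % dx_k - H_k y_k
if abs(v' * y) > 1e-8 * norm(v) * norm(y)
    Hnew = Hk + (v * v') / (v' * y);   % symmetric rank-1 correction
else
    Hnew = Hk;                         % skip the update at this iteration
end
end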
Method:
Analysis Method
First a theoretical analysis of the function was completed. Then each of the methods of minimisation
was applied to the outlined objective function in both its constrained and unconstrained form using
MATLAB algorithms (see appendices: MATLAB). The unconstrained results from these three methods
have been compared to each other and to the Steepest Descent Method in order to draw
conclusions about the best method to apply. The constrained program results were compared to
results calculated from the MATLAB Optimisation Tool. The criteria analysed were time taken and
number of iterations taken to converge on the global minimum (within a tolerance value), accuracy of
x*, and algorithmic robustness.
Analysis of Unconstrained Objective Function
The properties of the considered problem need to be determined to draw meaningful conclusions
from output results. Knowledge about the behaviour of this function, specifically the type of
function, location and number of any stationary points, the class of stationary points and whether
they are local or global can be determined for this case.
Functions of the following form are defined as quadratic:

$$f(\mathbf{x}) = \alpha + \langle\mathbf{c}, \mathbf{x}\rangle + \tfrac{1}{2}\langle\mathbf{x}, B\mathbf{x}\rangle = \alpha + \mathbf{c}^T\mathbf{x} + \tfrac{1}{2}\,\mathbf{x}^T B\,\mathbf{x}$$

The specified function is of this form with α = 0, B = H and c as detailed above. This
identification means that:

$$\nabla f(\mathbf{x}) = \mathbf{c} + H\mathbf{x}, \qquad \nabla^2 f(\mathbf{x}) = H$$
A particular feature of quadratic functions is that they are convex if and only if their hessian is
positive semi-definite. Calculating the eigenvalues of H (via MATLAB) reveals:
𝜆1 = 0.0066657144469
𝜆2 = 0.0591221937911
𝜆3 = 1.4840596546051
𝜆4 = 103.4101524371569
Since each eigenvalue is positive, this tells us that H is positive definite (see appendices: Proofs).
Positive definite matrices are also positive semi-definite, and so the quadratic function is also
convex.
The function under investigation is quadratic and convex, and hence solvable via matrix algebra. For
a convex and at least C1 function, ∇f(x*) = 0 if and only if x* is a global minimum of f. Note that,
because the function is quadratic, it is at least C1. Therefore:

$$\nabla f(\mathbf{x}^*) = 0 \;\Longleftrightarrow\; H\mathbf{x}^* + \mathbf{c} = 0 \;\Longleftrightarrow\; H\mathbf{x}^* = -\mathbf{c} \;\Longleftrightarrow\; \mathbf{x}^* = -H^{-1}\mathbf{c}$$
For the specified function:

$$H^{-1} = \begin{bmatrix} 100 & 50 & 33.333 & 25 \\ 50 & 33.333 & 25 & 20 \\ 33.333 & 25 & 20 & 16.667 \\ 25 & 20 & 16.667 & 14.286 \end{bmatrix}$$

And recall:

$$\mathbf{c} = \begin{bmatrix} 5.04 \\ -59.4 \\ 146.4 \\ -96.6 \end{bmatrix}$$

This implies that:

$$\mathbf{x}^* = -H^{-1}\mathbf{c} = -\begin{bmatrix} 100 & 50 & 33.333 & 25 \\ 50 & 33.333 & 25 & 20 \\ 33.333 & 25 & 20 & 16.667 \\ 25 & 20 & 16.667 & 14.286 \end{bmatrix}\begin{bmatrix} 5.04 \\ -59.4 \\ 146.4 \\ -96.6 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 2 \end{bmatrix}$$
So x* = [1 0 -1 2]^T is the global minimum of the nonlinear function; there are no other minima of
this function. It is expected that the algorithms converge to this point. Evaluating the objective at
this point gives f(x*) = -167.28.
Note that the hessian matrix of the program, and hence its inverse, is a constant matrix; that is to
say, it does not change across the elements of ℝ⁴. Recall that the potential problems with Newton's
method were that the hessian may not be invertible or may not be positive definite. Neither of these
issues arises for this specified problem.
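These analytic results can be verified with a few lines of MATLAB (a quick check only, restating the
H and c from the introduction):

% Quick numerical check of the analytic results above.
H = [ 0.16  -1.2   2.4  -1.4;
     -1.2   12.0 -27.0  16.8;
      2.4  -27.0  64.8 -42.0;
     -1.4   16.8 -42.0  28.0];
c = [5.04; -59.4; 146.4; -96.6];
eig(H)                                   % all four eigenvalues are positive
xstar = -(H \ c)                         % solves H*x = -c, returning [1; 0; -1; 2]
fstar = c'*xstar + 0.5*xstar'*H*xstar    % evaluates to -167.28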
Implementation of Algorithms
Each algorithm under investigation has been written into MATLAB code and included in the
appendices. They each call on a common univariate line-search algorithm, the Golden Section Search,
one of the better methods in the class of robust interval-reducing methods (Arora, 2011). Similarly,
they each implement an algorithm for finding an upper bound on the location of the minimum on a
half-open interval, which doubles the incremented step size with each iteration.
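The exact routines used are listed in the appendices; purely as a guide to the idea, a generic golden
section search over a bracketed interval [a, b] might be sketched as follows, where phi is the
univariate function phi(alpha) = f(xk + alpha*dk) minimised along the current search direction (the
function name is illustrative):

function xmin = goldenSectionSketch(phi, a, b, tol)
% Generic golden section search for a minimum of phi on [a, b]
% (sketch only; the routine used for this report's results is in the appendices).
rho = (3 - sqrt(5)) / 2;                   % golden section ratio, ~0.382
c = a + rho*(b - a);   d = b - rho*(b - a);
fc = phi(c);           fd = phi(d);
while (b - a) > tol
    if fc < fd                             % minimum lies in [a, d]
        b = d;  d = c;  fd = fc;
        c = a + rho*(b - a);  fc = phi(c);
    else                                   % minimum lies in [c, b]
        a = c;  c = d;  fc = fd;
        d = b - rho*(b - a);  fd = phi(d);
    end
end
xmin = (a + b) / 2;                        % midpoint of the final interval
end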
Each algorithm has the following parameters that can be altered:
• x0: The starting point for the algorithm.

• Tolerance 1: The stopping criterion for the particular algorithm. In all cases, this is a check of
the magnitude of the gradient vector at a particular iteration point x^k against 0. If it is 'close
enough' to zero, the algorithm will end. 'Close enough' is defined as the value set for this
tolerance.

• Tolerance 2: The stopping criterion for the Golden Section Search as detailed above. This value
sets how large the interval estimate will be when the line search is complete.

• T: The parameter used in the Multi Variable Half Open Interval Search nested in each of the
algorithms. 2^(k-1) T is the increase to the upper bound during each iteration when trying to find an
interval on which the minimum of the approximation must exist.

• H0: The 'starting hessian', thought of as an approximation to the inverse hessian of the program.
It is only present in the BFGS and SR1 methods.
In this paper, 𝒙0, tolerance 1 and 𝐻0 values (where appropriate) have been varied and the effects
analysed. The effects of changing tolerance 2 and the value of T have not been analysed because they
do not relate directly (see above) to the methods under investigation.
It is expected that the effect of altering x0 will depend on the distance of x0 from the global
minimum: the closer x0 is to the minimum, the less time and the fewer iterations the algorithm is
expected to take to converge. Having a stricter tolerance (that is, bringing tolerance 1 closer to 0)
should result in an increased number of iterations and time taken.
The algorithm in question must get closer to the true global minimum in order to comply with the
more strict tolerance, and hence is expected to require more computational time. Additionally, the
effect of changing 𝐻0 will be expected to depend on how well 𝐻0 approximates the true hessian
inverse of the function. A better approximation (such as giving the algorithm the function’s true
hessian inverse to begin with) would be expected to reduce the amount of computation time
required for convergence. However, the idea behind the BFGS and SR1 methods is to avoid
calculating the inverse of the hessian directly. As such, the hessian inverse will not be used as an
H0, in order to simulate more realistic conditions under which these two methods might be implemented.
Constrained Case
In solving the constrained case, two approaches were taken. The first was to solve the nonlinear
program using the MATLAB Optimization Tool, specifically via the interior point and active set
algorithms. This gave the solution point as well as some data and intuition in regards to the time
taken and iterations required to solve such a constrained problem (alternate analytical approaches
could have been used to provide this reference value, see discussion). Since the algorithms
implemented by the MATLAB Optimization Tool are specifically designed to solve constrained
nonlinear programs, the expectation was that they would outperform the Newtonian algorithms
under investigation. The second method for solving the constrained case was via the L2 penalty
method. Converting the constrained nonlinear program into an unconstrained nonlinear program
allowed for the Newtonian algorithms to be implemented.
The specified constraint is:
$$\mathbf{x}^T\mathbf{x} = 1 \;\Longleftrightarrow\; x_1^2 + x_2^2 + x_3^2 + x_4^2 - 1 = 0$$
The L2 penalty method requires that this constraint be converted into a penalty term and added to
the objective function. So, rather than minimizing the original objective function with the above
constraint, the function to be minimised was instead:
$$P_k(\mathbf{x}) = \mathbf{c}^T\mathbf{x} + \tfrac{1}{2}\,\mathbf{x}^T H\,\mathbf{x} + \frac{k}{2}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2 - 1\right)^2$$

where c and H are defined as above, and k is the parameter of the penalty term.
The algorithms under investigation require the calculation of both the gradient function and the
hessian matrix. As shown above, in the unconstrained case, the hessian is a symmetric, positive
definite matrix, perfect for implementation of Newtonian methods. When the hessian matrix is
calculated in order to implement these algorithms for solving the constrained case, the positive
definiteness of the matrix is potentially lost. This is a possible cause for any non-convergence issues
arising from implementation of the Newtonian methods. For the equations for the gradient function
and hessian matrix see appendices: MATLAB.
The L2 penalty method analytically finds the minimum point as the limit x* = lim_{k→∞} x^k. When
solving for the minimum point numerically using the Newtonian algorithms, a small value of k was
chosen and then increased in order to simulate this limiting process. It was expected that, as the
value of k increased, the minimum point found by the algorithms would converge on the value of x*
found by the MATLAB Optimization Tool.
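Purely to illustrate this limiting process (the solvers actually used are those listed in the
appendices), a continuation loop over increasing k can be sketched in MATLAB using the built-in
fminunc from the Optimization Toolbox, warm-starting each solve from the previous solution; the
sequence of k values is an illustrative choice:

% Sketch of the L2 penalty limiting process: minimise P_k for increasing k,
% warm-starting each unconstrained solve from the previous solution.
H = [ 0.16  -1.2   2.4  -1.4;
     -1.2   12.0 -27.0  16.8;
      2.4  -27.0  64.8 -42.0;
     -1.4   16.8 -42.0  28.0];
c = [5.04; -59.4; 146.4; -96.6];
x = [0; 0; 0; 0];                          % initial point (column vector)
for k = 10.^(1:7)                          % k = 10, 100, ..., 10^7
    Pk = @(x) c'*x + 0.5*x'*H*x + (k/2)*(x'*x - 1)^2;
    x  = fminunc(Pk, x);                   % unconstrained minimisation of P_k
end
% x now approximates the constrained minimiser, close to [-0.025; 0.311; -0.789; 0.53]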
Theoretical Convergence
Under specific circumstances, the various methods exhibit differing rates of convergence.
• For an initial point sufficiently close to the minimum, if the hessian is positive definite, a
local/global minimum actually exists, the step size at each iteration satisfies the Armijo-Goldstein
and Wolfe conditions and f is C3, then the rate of convergence for Newton's Method is quadratic (see
appendices for a univariate proof).

• Quasi-Newton methods are known to exhibit superlinear convergence under certain circumstances:

  • The BFGS Method can be shown to converge to the global minimum at a superlinear rate if the
    starting hessian is positive definite and the objective function is twice differentiable and
    convex (Powell, 1976).

  • Likewise, the SR1 method exhibits superlinear convergence under the same conditions (Nocedal and
    Wright, 1999).

• As an aside, the Steepest Descent Method is known to converge at a linear rate. In addition, it is
not adversely affected, as the Newtonian methods are known to be, by horizontal asymptotes where
divergence is sometimes observed.
The correlation of results to these theoretical rates was examined. It is expected that Newton's
Method will converge the quickest. The quasi-Newtonian methods are expected to be the next
fastest methods to converge followed lastly by the Steepest Descent Method.
Results, Conclusions and Recommendations:
Results for Unconstrained Case
The performance of the chosen methods was analysed for the specified objective function by
choosing 10 starting points x0 (eight randomly generated, [0 0 0 0], and the known minimum at
[1 0 -1 2]).
Point 𝑥1 𝑥2 𝑥3 𝑥4
1 0 0 0 0
2 1 0 -1 2
3 0.81 0.91 0.13 0.91
4 3.16 0.49 1.39 2.73
5 9.57 -4.85 8.00 -1.42
6 14.66 -4.04 12.09 3.42
7 1.60 3.35 -16.46 18.81
8 -0.30 -7.47 -17.90 1.86
9 2.23 11.37 15.51 17.57
10 7.52 -10.79 -10.14 17.14
Table 1: List of starting points
*Starting points truncated to two decimal places.
Results shown here were generated with T and tolerances 1 and 2 all set to 0.01 (see appendices for
the full list of all results).
Disregarding data sets that failed to return a value for x*, and sets where the number of iterations
exceeded the average by more than 500%, the following table of average values was generated:
Newton's Method Steepest Descent BFGS (hessian) BFGS (identity) SR1 (hessian) SR1 (identity)
x(1) 0.99998 1.01376 1.06218 0.94505 1.00033 0.996722222
x(2) -0.00007 0.01406 0.03947 -0.02716 0.00152 0.002833333
x(3) -1.00019 -0.9882 -0.97035 -1.01809 -0.99881 -0.996133333
x(4) 1.99986 2.00992 2.02388 1.98639 2.00088 2.003922222
f(x*) -167.28 -167.2749 -167.27892 -167.27965 -167.28 -167.2799889
Elapsed Time (s) 0.0040865 0.1965647 0.0100982 0.014235375 0.0119615 0.0125613
Iterations 9 1010 9.2 5.75 24.6 24.1
Elapsed Time per Iteration (s) 0.000454056 0.000194619 0.00109763 0.002475717 0.00048624 0.000521216
Table 2: Summary of algorithms performances (averaged values given robust algorithm implementation)
The BFGS and SR1 methods were both run using H (as defined in the introduction) and the 4x4 identity
matrix as H0 inputs.
All algorithms converged on the global minimum at x* = [1 0 -1 2], f(x*) = -167.28, for all starting
points except in one instance of the SR1 method initialised with the identity matrix from point 7.
However, given the 'shallowness' of the function and the tolerances, outlying x* values were
occasionally generated. This 'shallowness' refers to, based on the results in the appendix, the
relatively small magnitude of the gradient vector at many points of the objective function close to
the minimum. To this end, the BFGS Method and Steepest Descent Method often stopped only part way to
the global minimum.
For a given set of parameters the BFGS method often converged in fewer iterations than Newton's
method, albeit with less precision. However, for this ideal quadratic objective function Newton's
Method was less prone to 'getting stuck' (see appendices data: BFGS (identity) had runs where the
iteration counts were 28905 and 126675) and always took the least time to converge. The accuracy of
the calculated x* was greatest for Newton's Method, followed by SR1, BFGS and Steepest
Descent in decreasing order of accuracy. As expected the quasi-Newtonian methods each converged
faster (from these points, by an order of magnitude) than the Steepest Descent method. The
average iteration of the Steepest Descent method, however, took less time to compute than the
other methods. A cause of this could be that all other methods require operations on a 4x4 matrix
such as an inversion of a hessian or an update of the inverse hessian approximation at each
iteration, whereas the Steepest Descent Method requires only calculation of a gradient vector.
Tolerance Variation (see appendices for tables of results)
With tolerance 2 held constant at 0.01 and T held constant at 1:
For Newton's Method, the algorithm converged quickest with tolerance 1 values set at 0.0001.
When given the identity matrix for the starting iteration, the efficiency of the SR1 method appeared
to generally increase as tolerance 1 became stricter. When given the hessian of the program to start
with, there was no discernible pattern in the effect of varying tolerance 1. The
BFGS method, when given either the identity or the program's hessian to start with, behaved as
intuitively expected and computational time increased with tightened tolerances. Hence the results
varied and were not always consistent with our expectations. Further investigation and more data
would allow for greater quantitative analysis (see discussion).
With regard to the BFGS method, on occasion the value of T was altered to get the algorithm to
converge. This issue did not arise when implementing Newton's Method or the SR1 Method. As T is only
used in the Multi Variable Half Open Interval Search portion of the algorithm, this points to a
possible incompatibility between that particular search routine and the BFGS method under certain
conditions. Finding an alternative method for doing this task (e.g. using a step size that meets the
Armijo-Goldstein and Wolfe conditions) would rectify this issue and may increase the robustness of
the BFGS method. A sketch of such a step-size rule is given below.
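For instance, a simple backtracking rule enforcing the Armijo (sufficient decrease) part of those
conditions could be sketched as follows; the constants c1 and beta are illustrative choices, xk and
dk are a row-vector iterate and descent direction, and f.m and gradf.m are as in the appendices. Note
that this sketch does not check the Wolfe curvature condition.

function alpha = armijoStepSketch(xk, dk)
% Backtracking step-length sketch enforcing the Armijo sufficient-decrease condition.
c1 = 1e-4;  beta = 0.5;  alpha = 1;
fk = f(xk);  slope = gradf(xk) * dk';      % directional derivative at xk along dk
while f(xk + alpha*dk) > fk + c1*alpha*slope
    alpha = beta * alpha;                  % shrink the step until Armijo holds
end
end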
Summary
In summary, each method found the global minimum to the desired accuracy in the vast majority of
cases; however, Newton's method was the most accurate and took the least computational time. The
convergence rates were mostly consistent with theory in that the quasi-Newtonian algorithms generally
converged more slowly than Newton's method and faster than the Steepest Descent Method.
Results for Constrained Case
Algorithmic performance given the original objective function constrained by x^T x = 1 was next
analysed. Firstly, the constrained case was solved using MATLAB's Optimization Toolbox:
Interior Point Algorithm Active Set Algorithm
x(1) -0.025 -0.025
x(2) 0.311 0.311
x(3) -0.789 -0.789
x(4) 0.53 0.53
f(x*) -133.56022058 -133.56022058
Average Iterations 22.2 33.6
Elapsed Time (s) 1.05 2.01
Elapsed Time per Iteration (s) 0.047297 0.059821
Table 3: MATLAB Optimisation Tool Results
The following values were obtained using the first five starting points listed in Table 1.
To solve this constrained problem using Newton's Method and its variants, the L2 penalty method
was used. In applying the algorithms the following results were returned, with the penalty parameter
k = 10,000,000:
Newton's Method BFGS (Hessian) BFGS (Identity)* SR1 (Hessian) SR1 (Identity)
x(1) -0.02477 N/A -0.02477 -0.02477 -0.02477
x(2) 0.31073 N/A 0.31073 0.31073 0.31073
x(3) -0.78876 N/A -0.78876 -0.78876 -0.78876
x(4) 0.52980 N/A 0.52980 0.52980 0.52980
f(x*) -133.56022894 N/A -133.56022894 -133.5603044 -133.5603044
Elapsed Time (s) 0.2051092 N/A 0.112444 2.104064 1.678844
Iterations 169 N/A 111 2037 659
Elapsed Time per Iteration (s) 0.001214 N/A 0.001013 0.001033 0.002546
Table 4: Newtonian Methods Constrained Problem Results
*BFGS (identity) only returned results for the [0 0 0 0] starting point. As such, its results are not
averaged. The rest of the results are averaged data returned for the five starting points.
The BFGS algorithm's L2 implementation (see appendices: MATLAB) was especially fragile given this
constraint. It did not return any results when given the program's hessian for the starting iteration
and only found the minimum once when given the identity as the starting hessian. In the one case
where it did find a minimum, the BFGS method found the same minimum as the other two algorithms,
closely matching the MATLAB optimization output.
Slightly more robust was the SR1 method. It managed to find the minimum from more starting
points than the BFGS method did, although not from all starting points. This slightly increased
robustness did come at a cost, however, with the SR1 method requiring a very large number of
iterations and a longer timeframe in which to operate, making it more computationally expensive to
use. The SR1 update does not impose positive definiteness on the updated hessian, and this fact may
have contributed to the algorithm's increased success rate when compared to the BFGS method.
In general, the relatively reduced robustness of the BFGS and SR1 implementations, given the
specified problem, possibly stems from the fact that they are both quasi-Newton methods derived from
the Secant Method. As quasi-Newton methods are designed to avoid the computationally expensive
hessian inversion, the hessian is instead approximated using finite differences of the function
gradient, and data is interpolated with iterations (Indiana University, 2012). As quasi-Newton
methods are multivariate generalisations of the secant method, the same problem exists for both
methods; namely, that if the initial values used are not close enough to x*, the methods may fail to
converge entirely (Ohio University, 2012).
In contrast, Newton's Method performed exceptionally well. It found the minimum from all starting
points and did so relatively quickly in terms of both time and number of iterations.
Compared to the MATLAB optimization algorithms, all of the Newtonian algorithms took more
iterations to converge, as expected. Surprisingly, Newton's Method was able to outperform
MATLAB’s optimization algorithms with regard to speed.
Hessian Analysis (both programs)
For both the constrained and unconstrained cases, the BFGS Method converged in fewer iterations and
more accurately when it was started with the identity matrix as the inverse hessian approximation. It
was also more robust when starting with the identity, always converging in the
unconstrained case and at least finding the global minimum once in the constrained case. The
method was quicker in terms of elapsed time when given the program's hessian to start with. It is
therefore recommended that the identity be used as 𝐻0 when minimising a function such as this via
the BFGS Method.
In the unconstrained case, the difference made by the choice of H0 as the starting inverse hessian
approximation for the SR1 Method was negligible. There is no significant difference in the time
taken, accuracy or average number of iterations required to warrant recommending one particular
𝐻0 over the other. In the constrained case, whilst there may be no difference between the results in
terms of accuracy, there is a more pronounced disparity between iterations and time taken. Starting
with the function's hessian caused the SR1 Method to take nearly three times as many iterations and
almost a 20% longer timeframe. Therefore, using the identity matrix for 𝐻0 for this constrained case
is a much better alternative.
As was noted earlier, the hessian of this nonlinear function is a constant 4x4 matrix regardless of the
algorithm's current x^k. This means that it is not very computationally taxing to compute the hessian
and it only needs to be computed once. This deals with the problem of the hessian inversion being
computationally expensive. Since it is known that the hessian is positive definite, so too is the
inverse of the hessian. Thus, the Newton Direction will always be a descent direction. Whilst this is
only true for this function (and functions of similar forms), it means that Newton's Method behaved
very well in this particular problem.
Summary
Given an ideal function such as this, that is to say a quadratic and convex nonlinear program, based
on the above analysis, Newton's Method outperformed both of its variants (BFGS and SR1) and the
Steepest Descent Method. With the problem formulated using the L2 penalty method it is the best
algorithm to use for such a program.
Discussion:
The objective function analysed by this project was particularly suited to minimization via Newton’s
Method. For other programs, especially non-convex and non-quadratic ones, the results obtained by
this paper may not hold. The BFGS and SR1 method were formulated precisely because the relative
effectiveness of Newton’s method diminishes with increasing complexity.
In addition, the methodology implemented one class of many available algorithms which specifically
used the Golden Section Search in conjunction with a particular open interval search algorithm. A
variety of methods could have also been used to determine an appropriate step size to move during
each iteration. For example, step sizes satisfying the Armijo-Goldstein and Wolfe conditions would
be an appropriate choice. Hence, a whole family of dissimilar results could have been generated
from the same starting points using different algorithms which could just as easily be considered
Newtonian.
The analysis was very origin-centric in that the starting points were all within a relatively similar
distance from [0 0 0 0]^T, and hence from the global minima [1 0 -1 2]^T and
[-0.025 0.311 -0.789 0.53]^T. Analysis from starting iterates further from the minima should yield
results consistent with those generated by this report; further investigation is needed.
As discussed in the previous section, in such a nonlinear program as this, the inverse of the hessian
needs to be calculated only once. As long as it is known that the hessian does not change for any
point in ℝ⁴, and inversion of that constant hessian matrix is computationally feasible, the coding
for Newton's Method used here could be adjusted to remove the evaluation and inversion of the hessian
at each iteration. Such a change would result in fewer calculations per iteration, speeding up the
algorithm. The results returned for this particular case would be even better if such an adjustment
were made. The drawback of doing so is that the adjusted method could only be applied to cases where
the hessian is a constant matrix, severely restricting its applicability.
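A sketch of that adjustment is given below (illustrative only, not the code used for the results in
this report): because the hessian is constant and positive definite it can be factorised once outside
the loop, and the factorisation reused to form the Newton direction at every iteration; hessf.m and
gradf.m are as in the appendices, and the starting point and tolerance values are example choices.

% Factor the constant hessian once, then reuse the factorisation each iteration.
R  = chol(hessf(zeros(1,4)));             % H = R'*R, computed a single time
x0 = [0 0 0 0];  tolerance1 = 0.01;       % example starting point and tolerance
xk = x0;
while norm(gradf(xk)) > tolerance1
    dk = -(R \ (R' \ gradf(xk)'))';       % Newton direction via the cached factors
    xk = xk + dk;                         % unit step (a line search could be added)
end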
For the constrained case an analytical solution using the KKT method would have been possible, albeit
complicated and not solvable by simple linear algebra operations due to the quadratic nature of the
constraint. If this project had gone down this path instead of utilising MATLAB's optimization tools,
an exact value of x* could have been used as a point of reference.
The shortcomings of the BFGS algorithm's implementation of the L2 penalty method require further
analysis and perhaps troubleshooting.
Finally, the analysis of varying the tolerances of the algorithms used by this report could have been
furthered with more systematically obtained data. It would be expected that for a general case
decreasing the ‘strictness’ of the major tolerance would decrease computational time taken (this
was not always the case: see results). By way of contrast, varying the tolerances of the open interval
search and golden section search would have been expected to exhibit different effects for different
starting points and for different problems. To optimize an algorithm a balance must be struck
between accuracy and time taken to generate an appropriate step length. Hence, ideal tolerance
values exist for different algorithms, different starting points and for each iteration. Further
investigation may reveal common properties of these ideal tolerances given the algorithms used.
References:
Arora, J. (2011). Introduction to Optimum Design [electronic resource]. p.42 Burlington Elsevier
Science.
Chong, E. Zak, S. (2008). An Introduction To Optimization 3rd Edition. pp. 155-156. John Wiley and
Sons.
Conn, A., Gould, N. and Toint, P. (1991). "Convergence of quasi-Newton matrices generated by the
symmetric rank one update". Mathematical Programming (Springer Berlin/ Heidelberg) 50 (1): pp.
177–195.
Farzin, K. and Wah, L. (2012). On the performance of a new symmetric rank-one method with restart
for solving unconstrained optimization problems. Computers and Mathematics with Applications,
Volume 64, Issue 6, September 2012, pp. 2141-2152,
http://www.sciencedirect.com/science/article/pii/S089812211200449X
Hauser, K. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton Methods. p. 5.
Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553-hauserk/newtons_method.pdf
Indiana University. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton methods.
Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553-hauserk/newtons_method.pdf
Indian Institute of Technology. (2002). Convergence of Newton-Raphson method. Available online:
http://ecourses.vtu.ac.in/nptel/courses/Webcourse-contents/IIT-KANPUR/Numerical%20Analysis/numerical-analysis/Rathish-kumar/ratish-1/f3node7.html
Nocedal, J. and Wright, S.J. (1999). Numerical Optimization, pp. 220, 144.
Ohio University (2012). Lecture 6: Secant Methods. Available online:
http://www.math.ohiou.edu/courses/math3600/lecture6.pdf
Powell, M. (1976). 'Some global convergence properties of a variable metric algorithm for
minimization without exact line searches', Nonlinear Programming, Vol 4, Society for Industrial and
Applied Mathematics, p. 53.
Appendices:
Proofs
Positive eigenvalues imply invertibility of a matrix:

Define the polynomial p(t) = (t − λ1)(t − λ2)⋯(t − λn). Its constant term is (−1)^n λ1 λ2 ⋯ λn.

Let p(t) = det(tI − A), where A is a square matrix. Then p(0) = det(−A) = (−1)^n det(A), and so

det(A) = λ1 λ2 ⋯ λn.

If λ1, λ2, …, λn > 0 then det(A) ≠ 0. Therefore A is invertible.
Newton's method converges quadratically for the univariate case:

Let x_i be a root of f(x) = 0, and let x_n be an estimate of x_i with |x_i − x_n| = ε < 1.

By Taylor series expansion,

0 = f(x_i) = f(x_n + ε) = f(x_n) + f′(x_n)(x_i − x_n) + 0.5 f″(ξ)(x_i − x_n)²

for some ξ between x_i and x_n. For Newton's method, −f′(x_n)(x_{n+1} − x_n) = f(x_n). Therefore

0 = f′(x_n)(x_i − x_{n+1}) + 0.5 f″(ξ)(x_i − x_n)².

Since (x_i − x_n) and (x_i − x_{n+1}) are the error terms for successive iterations,

(x_i − x_{n+1}) ∝ (x_i − x_n)².

Q.E.D.

(Indian Institute of Technology, 2002)
Data
All tolerance values 0.01:
Newton's Method:
x* f(x*) Elapsed Time (s) Iterations
1.0000 -0.0000 -1.0000 2.0000 -167.28 0.009944 10
1 0 -1 2 -167.28 0.000134 0
1.0000 -0.0001 -1.0001 2.0001 -167.28 0.004207 9
0.9999 -0.0000 -1.0002 2.0000 -167.28 0.001905 9
1.0002 -0.0001 -1.0002 1.9999 -167.28 0.00208 10
0.9999 0.0000 -1.0001 2.0000 -167.28 0.005061 11
1.0000 0.0001 -1.0004 1.9995 -167.28 0.003064 10
1.0000 0.0001 -0.9999 2.0000 -167.28 0.005054 11
0.9999 -0.0008 -1.0011 1.9990 -167.28 0.004321 9
0.9999 0.0001 -0.9999 2.0001 -167.28 0.005095 11
Steepest Descent:
x* f(x*) Elapsed Time (s) Iterations
0.2126 -0.4083 -1.2812 1.7836 -167.2769 0.00655 34
1 0 -1 2 -167.28 0.000116 0
1.1539 0.1993 -0.8189 2.1600 -167.2789 0.038846 106
2.1659 0.6656 -0.5248 2.3718 -167.2728 0.170743 798
1.7844 0.3563 -0.7776 2.1590 -167.2768 0.077666 418
2.1659 0.6648 -0.5257 2.3709 -167.2728 0.481597 2600
-0.1767 -0.6709 -1.4786 1.6256 -167.2727 0.274945 1399
-0.1749 -0.6699 -1.4779 1.6262 -167.2727 0.273263 1332
2.1767 0.6709 -0.5213 2.3744 -167.2727 0.303484 1575
-0.1702 -0.6672 -1.4760 1.6277 -167.2727 0.338437 1838
BFGS (hessian for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
0.1915 -0.4173 -1.2823 1.7865 -167.2767 0.004697 6
1 0 -1 2 -167.28 0.000144 0
1.2740 0.1562 -0.8887 2.0870 -167.2796 0.002729 7
0.9981 0.0014 -0.9982 2.0018 -167.28 0.004855 10
2.1585 0.6613 -0.5280 2.3692 -167.2729 0.006476 7
1.0040 -0.0019 -1.0030 1.9967 -167.28 0.02334 11
0.9987 0.0005 -0.9991 2.0011 -167.28 0.016815 14
0.9946 -0.0028 -1.0020 1.9983 -167.28 0.016214 13
1.0029 -0.0024 -1.0021 1.9982 -167.28 0.010847 10
0.9995 -0.0003 -1.0001 2.0000 -167.28 0.014865 14
BFGS (identity for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
0.2028 -0.4142 -1.2815 1.7865 -167.2768 0.001249 3
1 0 -1 2 -167.28 0.000297 0
1.2434 0.1361 -0.9034 2.0753 -167.2797 0.005207 5
1.0024 0.0016 -0.9992 2.0004 -167.28 0.003858 7
1.0118 0.0074 -0.9948 2.0039 -167.28 3.954961 28905
0.9991 -0.0006 -1.0005 1.9996 -167.28 0.079164 9
0.9976 -0.0041 -1.0046 1.9955 -167.28 0.004674 7
0.9953 -0.0024 -1.0012 1.9992 -167.28 0.008941 7
1.0007 0.0009 -0.9994 2.0004 -167.28 0.010493 8
0.9974 0.0037 -0.9963 2.0031 -167.28 16.554056 126675
SR1 (hessian for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
1.0001 0.0001 -0.9999 1.9999 -167.28 0.011665 14
1 0 -1 2 -167.28 0.000299 0
1.0000 0.0000 -1.0000 2.0000 -167.28 0.007054 21
0.9985 -0.0008 -1.0003 2.0000 -167.28 0.013603 20
1.0173 0.0085 -0.9961 2.0015 -167.28 0.015117 19
0.9996 0.0115 -0.9892 2.0093 -167.28 0.007496 19
1.0000 0.0000 -1.0000 2.0000 -167.28 0.012073 37
0.9812 -0.0119 -1.0090 1.9927 -167.28 0.028734 52
1.0066 0.0056 -0.9956 2.0038 -167.28 0.013331 39
1.0000 0.0022 -0.9980 2.0016 -167.28 0.010243 25
SR1 (identity for starting iterate):
x* f(x*) Elapsed Time (s) Iterations
1.0014 -0.0007 -0.9997 2.0009 -167.28 0.023934 17
1 0 -1 2 -167.28 0.000314 0
1.0004 0.0001 -1.0001 1.9997 -167.28 0.015986 25
1.0053 0.0051 -0.9963 2.0025 -167.28 0.009545 30
0.9998 -0.0005 -1.0002 2.0000 -167.28 0.012679 37
1.0000 -0.0000 -1.0000 2.0000 -167.28 0.01236 34
0.9723 0.0241 -0.9692 2.0305 -167.2799 0.003297 9
NaN NaN NaN NaN NaN 0.006176 7
0.9991 -0.0003 -1.0002 1.9998 -167.28 0.013616 41
0.9922 -0.0023 -0.9995 2.0019 -167.28 0.027706 41
Newton's Method
x0 Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] 0.1 0.01 1 [ NaN -Inf -Inf -Inf] NaN 0.3685380 353
[0 0 0 0] 0.01 0.01 1 [1 0 -1 2] -167.279999996088 0.0294290 4
[0 0 0 0] 0.001 0.01 1 [1 0 -1 2] -167.279999999938 0.0160210 3
[0 0 0 0] 0.0001 0.01 1 [1 0 -1 2] -167.279999999976 0.0058770 2
[0 0 0 0] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0072560 2
[0 0 0 0] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0104300 3
[1 0 -1 2] 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0003140 0
[1 0 -1 2] 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001230 0
[1 0 -1 2] 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000790 0
[1 0 -1 2] 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000750 0
[1 0 -1 2] 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000740 0
[1 0 -1 2] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[0.81 0.91 0.13 0.91] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3632520 353
[0.81 0.91 0.13 0.91] 0.01 0.01 1 [1 0 -1 2] -167.279999998378 0.0060940 4
[0.81 0.91 0.13 0.91] 0.001 0.01 1 [1 0 -1 2] -167.279999999974 0.0052550 3
[0.81 0.91 0.13 0.91] 0.0001 0.01 1 [1 0 -1 2] -167.279999999990 0.0043530 2
[0.81 0.91 0.13 0.91] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0052190 2
[0.81 0.91 0.13 0.91] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0058110 3
[3.16 0.49 1.39 2.73] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3621590 353
[3.16 0.49 1.39 2.73] 0.01 0.01 1 [1 0 -1 2] -167.279999997557 0.0058490 4
[3.16 0.49 1.39 2.73] 0.001 0.01 1 [1 0 -1 2] -167.279999999961 0.0052710 3
[3.16 0.49 1.39 2.73] 0.0001 0.01 1 [1 0 -1 2] -167.279999999985 0.0043780 2
[3.16 0.49 1.39 2.73] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0051530 2
[3.16 0.49 1.39 2.73] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0116540 2
[9.57 -4.85 -8.00 -1.42] 0.1 0.01 1 [NaN NaN NaN NaN] NaN 0.3284300 352
[9.57 -4.85 -8.00 -1.42] 0.01 0.01 1 [1 0 -1 2] -167.279999995272 0.0051650 4
[9.57 -4.85 -8.00 -1.42] 0.001 0.01 1 [1 0 -1 2] -167.279999999925 0.0045340 3
[9.57 -4.85 -8.00 -1.42] 0.0001 0.01 1 [1 0 -1 2] -167.279999999971 0.0037930 2
[9.57 -4.85 -8.00 -1.42] 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0045140 2
[9.57 -4.85 -8.00 -1.42] 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0072210 3
SR1 Method
x0 Starting Hessian Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] Identity 0.1 0.01 1 [1 0 -1 2] -167.279999128240 0.1688580 89
[0 0 0 0] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999575407 0.0692100 39
[0 0 0 0] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999853 0.0587820 26
[0 0 0 0] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.1066900 38
[0 0 0 0] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999973 0.0380570 10
[0 0 0 0] Identity 0.000001 0.01 1 [NaN NaN NaN NaN] NaN 0.0307350 9
[0 0 0 0] Program's Hessian 0.1 0.01 1 [1.42 0.12 -0.94 2.05] -167.278456189735 0.0620180 31
[0 0 0 0] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999919554 0.0949530 57
[0 0 0 0] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999987586 0.0400010 21
[0 0 0 0] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999990 0.0298450 13
[0 0 0 0] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.1016380 38
[0 0 0 0] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.1077790 37
[1 0 -1 2] Identity 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0001780 0
[1 0 -1 2] Identity 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0000800 0
[1 0 -1 2] Identity 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[1 0 -1 2] Identity 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0
[1 0 -1 2] Program's Hessian 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0000880 0
[1 0 -1 2] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0000760 0
[1 0 -1 2] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000750 0
[1 0 -1 2] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000710 0
[1 0 -1 2] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000710 0
[1 0 -1 2] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0
[0.81 0.91 0.13 0.91] Identity 0.1 0.01 1 [1.40 0.52 -0.52 2.43] -167.272369332640 0.1391750 94
[0.81 0.91 0.13 0.91] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999927619 0.0512890 30
[0.81 0.91 0.13 0.91] Identity 0.001 0.01 1 [NaN NaN NaN NaN] NaN 0.0337580 17
[0.81 0.91 0.13 0.91] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0577020 26
[0.81 0.91 0.13 0.91] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999955 0.0280640 10
[0.81 0.91 0.13 0.91] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0415910 13
[0.81 0.91 0.13 0.91] Program's Hessian 0.1 0.01 1 [1.03 .01 -0.99 2.00] -167.279983904904 0.2144600 156
[0.81 0.91 0.13 0.91] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999620384 0.0843110 46
[0.81 0.91 0.13 0.91] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999999547 0.0196120 10
[0.81 0.91 0.13 0.91] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999991 0.0290150 13
[0.81 0.91 0.13 0.91] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999997 0.0357070 13
[0.81 0.91 0.13 0.91] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0538590 17
[3.16 0.49 1.39 2.73] Identity 0.1 0.01 1 [1 0 -1 2] -167.279998954778 0.1396510 89
[3.16 0.49 1.39 2.73] Identity 0.01 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0152440 9
[3.16 0.49 1.39 2.73] Identity 0.001 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0126580 6
[3.16 0.49 1.39 2.73] Identity 0.0001 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0322300 13
[3.16 0.49 1.39 2.73] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999990 0.0244790 9
[3.16 0.49 1.39 2.73] Identity 0.000001 0.01 1 [NaN, NaN, NaN, NaN] NaN 0.0194650 7
[3.16 0.49 1.39 2.73] Program's Hessian 0.1 0.01 1 [1 0 -1 2] -167.279999971364 0.0673550 41
[3.16 0.49 1.39 2.73] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999999102 0.0518930 30
[3.16 0.49 1.39 2.73] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999999986 0.0547630 29
[3.16 0.49 1.39 2.73] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999997643 0.0263000 9
[3.16 0.49 1.39 2.73] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0633330 25
[3.16 0.49 1.39 2.73] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0312680 10
[9.57 -4.85 -8.00 -1.42] Identity 0.1 0.01 1 [0.35 -0.05 -0.92 2.12] -167.272798242689 0.0848760 72
[9.57 -4.85 -8.00 -1.42] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999999861 0.0446150 31
[9.57 -4.85 -8.00 -1.42] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999953795 0.0294250 17
[9.57 -4.85 -8.00 -1.42] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999996731 0.0385920 15
[9.57 -4.85 -8.00 -1.42] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999715 0.0541070 22
[9.57 -4.85 -8.00 -1.42] Identity 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0781950 31
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.1 0.01 1 [1.03 0.02 -0.99 2.00] -167.279959843584 0.0323530 23
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999991203 0.0652370 41
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999876384 0.0174150 9
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0651250 29
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999997 0.0410020 13
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0834600 25
Steepest Descent Method
x0 Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.8100010 377
[0 0 0 0] 0.01 0.01 1 [0.22 -0.41 -1.28 1.78] -167.276916941098 0.2438320 196
[0 0 0 0] 0.001 0.01 1 [0.89 -0.06 -1.05 1.96] -167.279933780104 8.3233250 6416
[0 0 0 0] 0.0001 0.01 1 [0.99 -0.01 -1 2] -167.279999267465 3.8602080 2417
[0 0 0 0] 0.00001 0.01 1 [1 0 -1 2] -167.279999997389 22.7155470 12213
[0 0 0 0] 0.000001 0.01 1 [1 0 -1 2] -167.279999999942 18.2888800 8432
[1 0 -1 2] 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0011080 0
[1 0 -1 2] 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001110 0
[1 0 -1 2] 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000690 0
[1 0 -1 2] 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000660 0
[1 0 -1 2] 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000660 0
[1 0 -1 2] 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000630 0
[0.81 0.91 0.13 0.91] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.3082390 377
[0.81 0.91 0.13 0.91] 0.01 0.01 1 [1.16 0.19 -0.83 2.15] -167.279100661797 1.0436020 984
[0.81 0.91 0.13 0.91] 0.001 0.01 1 [1.12 0.07 -0.95 2.04] -167.279928943921 1.9716170 1514
[0.81 0.91 0.13 0.91] 0.0001 0.01 1 [1 0 -1 2] -167.279999266127 2.4358950 1833
[0.81 0.91 0.13 0.91] 0.00001 0.01 1 [1 0 -1 2] -167.279999999681 1.9436270 999
[0.81 0.91 0.13 0.91] 0.000001 0.01 1 [1 0 -1 2] -167.279999999926 5.8643590 2553
[3.16 0.49 1.39 2.73] 0.1 0.01 1 [-Inf NaN NaN NaN] NaN 0.3068300 377
[3.16 0.49 1.39 2.73] 0.01 0.01 1 [2.02 0.58 -0.59 2.32] -167.274489679393 6.4725080 5980
[3.16 0.49 1.39 2.73] 0.001 0.01 1 [1.11 0.061 -0.96 2.03] -167.279938709635 17.6293290 15514
[3.16 0.49 1.39 2.73] 0.0001 0.01 1 [1.01 0.01 -1 2] -167.279999278511 8.2242300 5651
[3.16 0.49 1.39 2.73] 0.00001 0.01 1 [1 0 -1 2] -167.279999997974 1.9071060 1631
[3.16 0.49 1.39 2.73] 0.000001 0.01 1 [1 0 -1 2] -167.279999999979 4.4262350 1784
[9.57 -4.85 -8.00 -1.42] 0.1 0.01 1 [-Inf Inf NaN NaN] NaN 0.3026670 376
[9.57 -4.85 -8.00 -1.42] 0.01 0.01 1 [1.75 0.35 -0.77 2.16] -167.277168000179 3.8968330 3948
[9.57 -4.85 -8.00 -1.42] 0.001 0.01 1 [1.12 0.07 -0.95 2.04] -167.279928743320 5.0956410 3914
[9.57 -4.85 -8.00 -1.42] 0.0001 0.01 1 [1 0 -1 2] -167.279999264172 3.6328800 2661
[9.57 -4.85 -8.00 -1.42] 0.00001 0.01 1 [1 0 -1 2] -167.279999998846 25.4060450 16377
[9.57 -4.85 -8.00 -1.42] 0.000001 0.01 1 [1 0 -1 2] -167.279999999938 4.1823240 C
BFGS Method
x0 Starting Hessian Tolerance1 Tolerance2 T x* fmin Elapsed Time (seconds) Iterations
[0 0 0 0] Identity 0.1 0.01 1 [0.22 -0.41 -1.28 1.79] -167.276910497404 0.0077950 5
[0 0 0 0] Identity 0.01 0.01 1 [0.61 -0.31 -1.26 1.78] -167.278342371136 0.0092760 5
[0 0 0 0] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999944 0.0167320 6
[0 0 0 0] Identity 0.0001 0.01 0.0001 [1 0 -1 2] -167.279999999619 0.0260180 7
[0 0 0 0] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999982 0.0232830 6
[0 0 0 0] Identity 0.000001 0.01 1 [1 0 -1 2] -167.279999999999 0.0269040 6
[0 0 0 0] Program's Hessian 0.1 0.01 1 [0.19 -0.41 -1.28 1.78] -167.276642869312 0.0094350 7
[0 0 0 0] Program's Hessian 0.01 0.01 1 [0.19 -0.42 -1.28 1.79] -167.276717196021 0.0057380 4
[0 0 0 0] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999992937 0.0238710 9
[0 0 0 0] Program's Hessian 0.0001 0.01 0.00000001 [NaN, NaN, NaN, NaN] NaN 0.0415680 13
[0 0 0 0] Program's Hessian 0.00001 0.01 0.1 [1 0 -1 2] -167.279999999938 0.0834650 22
[0 0 0 0] Program's Hessian 0.000001 0.01 0.00000001 [NaN NaN NaN NaN] NaN 1.4213690 2330
[1 0 -1 2] Identity 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0002520 0
[1 0 -1 2] Identity 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001150 0
[1 0 -1 2] Identity 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[1 0 -1 2] Identity 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000680 0
[1 0 -1 2] Identity 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000670 0
[1 0 -1 2] Program's Hessian 0.1 0.01 1 [1 0 -1 2] -167.280000000000 0.0003180 0
[1 0 -1 2] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.280000000000 0.0001160 0
[1 0 -1 2] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000720 0
[1 0 -1 2] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000700 0
[1 0 -1 2] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000680 0
[1 0 -1 2] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 0.0000680 0
[0.81 0.91 0.13 0.91] Identity 0.1 0.01 1 [1.05 0.30 -0.69 2.29] -167.275083269658 0.0100910 4
[0.81 0.91 0.13 0.91] Identity 0.01 0.01 1 [1.24 0.14 -0.90 2.08] -167.279686990669 0.0090960 5
[0.81 0.91 0.13 0.91] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999999376 0.0185580 7
[0.81 0.91 0.13 0.91] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999860 0.0183590 6
[0.81 0.91 0.13 0.91] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0260800 8
[0.81 0.91 0.13 0.91] Identity 0.000001 0.01 0.01 [1 0 -1 2] -167.279999999999 0.1026530 43
[0.81 0.91 0.13 0.91] Program's Hessian 0.1 0.01 1 [1.02 0.32 -0.66 2.33] -167.273452580260 0.0098400 7
[0.81 0.91 0.13 0.91] Program's Hessian 0.01 0.01 1 [1.28 0.16 -0.89 2.09] -167.279598023217 0.0122180 7
[0.81 0.91 0.13 0.91] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999992578 0.0266280 10
[0.81 0.91 0.13 0.91] Program's Hessian 0.0001 0.01 0.000001 [NaN NaN NaN NaN] NaN 1.0398700 1601
[0.81 0.91 0.13 0.91] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0368260 10
[0.81 0.91 0.13 0.91] Program's Hessian 0.000001 0.01 0.000000001 [NaN NaN NaN NaN] NaN 0.0878040 78
[3.16 0.49 1.39 2.73] Identity 0.1 0.01 1 [ 2.95 0.95 -0.37 2.47] -167.260921728458 0.0148570 7
[3.16 0.49 1.39 2.73] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999971584 0.0201850 8
[3.16 0.49 1.39 2.73] Identity 0.001 0.01 1 [1 0 -1 2] -167.279999998218 0.0196480 7
[3.16 0.49 1.39 2.73] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999995 0.0273770 9
[3.16 0.49 1.39 2.73] Identity 0.00001 0.01 1 [1 0 -1 2] -167.279999999995 0.0163080 5
[3.16 0.49 1.39 2.73] Identity 0.000001 0.01 0.0001 [1 0 -1 2] -167.279999999999 0.0242500 5
[3.16 0.49 1.39 2.73] Program's Hessian 0.1 0.01 1 [3 1.49 0.19 2.99] -167.244594503201 0.0097640 7
[3.16 0.49 1.39 2.73] Program's Hessian 0.01 0.01 1 [1 0 -1 2] -167.279999963524 0.0209800 10
[3.16 0.49 1.39 2.73] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999915565 0.0278820 10
[3.16 0.49 1.39 2.73] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999946 0.0292690 9
[3.16 0.49 1.39 2.73] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.279999999999 0.0413620 13
[3.16 0.49 1.39 2.73] Program's Hessian 0.000001 0.01 1 [NaN NaN NaN NaN] NaN 32.2398260 11663
[9.57 -4.85 -8.00 -1.42] Identity 0.1 0.01 1 [ 2.00 0.58 -0.58 2.33] -167.274520285135 0.0099090 6
[9.57 -4.85 -8.00 -1.42] Identity 0.01 0.01 1 [1 0 -1 2] -167.279999526260 0.0190540 8
[9.57 -4.85 -8.00 -1.42] Identity 0.001 0.01 0.1 [1 0 -1 2] -167.279999999709 0.0216050 7
[9.57 -4.85 -8.00 -1.42] Identity 0.0001 0.01 1 [1 0 -1 2] -167.279999999999 0.0256380 8
[9.57 -4.85 -8.00 -1.42] Identity 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0312400 9
[9.57 -4.85 -8.00 -1.42] Identity 0.000001 0.01 0.1 [1 0 -1 2] -167.279999999999 0.0293100 7
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.1 0.01 1 [2.16 0.66 -0.53 2.37] -167.272880184292 0.0146060 10
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.01 0.01 1 [2.16 0.66 -0.53 2.37] -167.272880718121 0.0105100 6
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.001 0.01 1 [1 0 -1 2] -167.279999999362 0.0412020 13
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.0001 0.01 1 [1 0 -1 2] -167.279999999987 0.0335710 11
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.00001 0.01 1 [1 0 -1 2] -167.280000000000 0.0322950 10
[9.57 -4.85 -8.00 -1.42] Program's Hessian 0.000001 0.01 1 [1 0 -1 2] -167.280000000000 84.1332250 30847
Newton's Method for k = 10,000,000; all tolerances at 0.01; T set to 1 (adjusted as necessary to get convergence)
Starting Point xmin fmin Iterations Elapsed Time (s)
[0 0 0 0] [-0.024771532289335 0.310732926395592 -0.788760177923239 0.529801104907597] -133.560228943395 39 0.03467
[1 0 -1 2] [-0.024771837343072 0.310733028722944 -0.788760301760888 0.529800846242057] -133.560228943387 143 0.253257
[0.81 0.91 0.13 0.91] [-0.024771522265604 0.310732911352949 -0.788760170844310 0.529801124775638] -133.560228943395 173 0.183464
[3.16 0.49 1.39 2.73] [-0.024771527241936 0.310732906803061 -0.788760174684506 0.529801121482055] -133.560228943395 240 0.243349
[9.57 -4.85 -8.00 -1.42] [-0.024771536889997 0.310732912478593 -0.788760169890084 0.529801124846407] -133.560228943395 250 0.310806
BFGS ID
[0 0 0 0] [-0.024770871284853 0.310733636611044 -0.788758902260001 0.529802618520184] -133.560228943197 111 0.112444
[1 0 -1 2] Refused to return an answer
[0.81 0.91 0.13 0.91] Refused to return an answer
[3.16 0.49 1.39 2.73] Refused to return an answer
[9.57 -4.85 -8.00 -1.42] Refused to return an answer
BFGS Hess
[0 0 0 0] Refused to return an answer
[1 0 -1 2] Refused to return an answer
[0.81 0.91 0.13 0.91] Refused to return an answer
[3.16 0.49 1.39 2.73] Refused to return an answer
[9.57 -4.85 -8.00 -1.42] Refused to return an answer
SR1 ID
[0 0 0 0] [-0.024771567761620 0.310733441325913 -0.788761623923610 0.529802126005156] -133.560304375650 378 1.013921
[1 0 -1 2] [-0.024771544747263 0.310733182847815 -0.788761385804056 0.529802633470826] -133.560304375634 819 2.392423
[0.81 0.91 0.13 0.91] [-0.024777470366075 0.310725229879482 -0.788760807092803 0.529807865053035] -133.560304368405 619 1.81805
[3.16 0.49 1.39 2.73] Refused to return an answer
[9.57 -4.85 -8.00 -1.42] [-0.024770231258035 0.310730257079946 -0.788765912375838 0.529797671405605] -133.560304373593 821 1.490982
SR1 Hess
[0 0 0 0] [-0.024773909306389 0.310738355421821 -0.788757020809805 0.529805987358458] -133.560304372956 1976 1.343549
[1 0 -1 2] [-0.024775941428827 0.310732519348041 -0.788760367888054 0.529804332231798] -133.560304374565 1734 2.006957
[0.81 0.91 0.13 0.91] [-0.024769720656432 0.310730507404563 -0.788765053684288 0.529798826909580] -133.560304374229 2254 2.432939
[3.16 0.49 1.39 2.73] [-0.024767341442569 0.310733094291290 -0.788764366771071 0.529798443639664] -133.560304374047 2184 2.63281
[9.57 -4.85 -8.00 -1.42] Refused to return an answer
MATLAB
Objective Function File: f.m
function val = f(x)
% This m-file is the objective function for our unconstrained
% nonlinear program.
% Definitions.
c = [5.04; -59.4; 146.4; -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1); x(2); x(3); x(4)];
val = c' * xs + 0.5 * xs' * hessian * xs;
end
Gradient Function File: gradf.m
function grad = gradf(x)
% This is the gradient function of our objective function, f.m.
% Definitions.
c = [5.04 -59.4 146.4 -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1) x(2) x(3) x(4)];
grad = c + xs * hessian;
end
Hessian Function File: hessf.m
function hessian = hessf(x)
% The hessian of our objective function, f.m.
% Note that the hessian is independent of x.
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
end
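As a quick sanity check of the three files above, the analytical minimiser x* = -H^(-1)c derived in the introduction can be reproduced numerically. The following script is a minimal illustrative sketch (the script name is arbitrary and it is not part of the original submission); it assumes f.m, gradf.m and hessf.m are on the MATLAB path.
% checkUnconstrained.m (illustrative sketch only)
% Reproduce the analytical minimiser x* = -inv(H)*c and check it against
% the objective and gradient files above.
c = [5.04; -59.4; 146.4; -96.6];
H = hessf([0 0 0 0]);                      % the hessian is constant, so any x will do
xstar = -(H \ c);                          % solve H*x = -c without forming inv(H)
fprintf('x* = [%g %g %g %g]\n', xstar);                 % expected: [1 0 -1 2]
fprintf('f(x*) = %.4f\n', f(xstar));                    % expected: -167.28
fprintf('||gradf(x*)|| = %.2e\n', norm(gradf(xstar)));  % expected: close to 0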
Objective Function File, Penalty Method: fpen.m
function val = fpen(x)
% The objective function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is set to the same value in fpen, gradfpen and hessfpen.
% Definitions
k = 10000000;     % penalty parameter
c = [5.04 -59.4 146.4 -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1); x(2); x(3); x(4)];
constraint = 1-(x(1))^2-(x(2))^2-(x(3))^2-(x(4))^2;
% Function
val = c * xs + 0.5 * xs' * hessian * xs + (k/2)*(constraint)^2;
end
Gradient Function File, Penalty Method: gradfpen.m
function grad = gradfpen(x)
% The gradient function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is set to the same value in fpen, gradfpen and hessfpen.
% Definitions
k = 10000000;     % penalty parameter
g = ((x(1))^2+(x(2))^2+(x(3))^2+(x(4))^2-1);
c = [5.04 -59.4 146.4 -96.6];
hessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
xs = [x(1) x(2) x(3) x(4)];
% Function
grad = c + xs * hessian + [2*k*x(1)*g; 2*k*x(2)*g; 2*k*x(3)*g; 2*k*x(4)*g]';
end
Hessian Function File, Penalty Method: hessfpen.m
function hessian = hessfpen(x)
% This is the hessian of the L2 penalty method for our program.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is set to the same value in fpen, gradfpen and hessfpen.
% Definitions
k = 10000000;     % penalty parameter
h11 = 2*k*(3*(x(1))^2 + (x(2))^2 + (x(3))^2 + (x(4))^2-1);
h22 = 2*k*((x(1))^2 + 3*(x(2))^2 + (x(3))^2 + (x(4))^2-1);
h33 = 2*k*((x(1))^2 + (x(2))^2 + 3*(x(3))^2 + (x(4))^2-1);
h44 = 2*k*((x(1))^2 + (x(2))^2 + (x(3))^2 + 3*(x(4))^2-1);
unchessian = [
0.16 -1.2 2.4 -1.4;
-1.2 12 -27 16.8;
2.4 -27 64.8 -42;
-1.4 16.8 -42 28];
% Function
hessian = unchessian + [h11 4*k*x(1)*x(2) 4*k*x(1)*x(3) 4*k*x(1)*x(4);...
4*k*x(1)*x(2) h22 4*k*x(2)*x(3) 4*k*x(4)*x(2);...
4*k*x(1)*x(3) 4*k*x(3)*x(2) h33 4*k*x(3)*x(4);...
4*k*x(1)*x(4) 4*k*x(4)*x(2) 4*k*x(3)*x(4) h44];
end
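Because the penalty gradient is easy to mistype, a central-difference comparison against fpen provides a cheap correctness check. The snippet below is an illustrative sketch only (the script name and the test point are arbitrary and not part of the original submission); it assumes the three files above are saved as fpen.m, gradfpen.m and hessfpen.m.
% checkPenaltyGrad.m (illustrative sketch only)
% Compare gradfpen against a central-difference approximation of fpen.
x0 = [0.5 -0.2 0.3 0.1];                  % arbitrary test point
h = 1e-6;                                 % finite-difference step
numgrad = zeros(1,4);
for i = 1:4
    e = zeros(1,4); e(i) = h;
    numgrad(i) = (fpen(x0 + e) - fpen(x0 - e)) / (2*h);
end
ag = gradfpen(x0);
fprintf('max relative gradient error = %.3e\n', max(abs(ag - numgrad)) / max(abs(ag)));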
Constraint File: MATLAB Optimisation Tool
function [c, ceq] = xtx(x)
% The constraint as required for the MATLAB Optimization Tool.
% c is the set of nonlinear inequality constraints. Empty in our case.
c = [];
% ceq is the set of nonlinear equality constraints.
ceq = x(1)^2 + x(2)^2 + x(3)^2 + x(4)^2 - 1;
end
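For reference, the file above is in the form fmincon expects for its nonlcon argument; the MATLAB Optimization Tool drives fmincon internally. A minimal command-line equivalent of the interior point run reported in the results section might look like the sketch below (optimoptions requires a reasonably recent MATLAB release, and the exact tolerances set through the GUI are not reproduced here).
% Illustrative command-line equivalent of the Optimization Tool set-up.
x0 = [0 0 0 0];
opts = optimoptions('fmincon', 'Algorithm', 'interior-point', 'Display', 'iter');
[xmin, fmin, exitflag, output] = fmincon(@f, x0, [], [], [], [], [], [], @xtx, opts);
fprintf('fmin = %.8f after %d iterations\n', fmin, output.iterations);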
33
SR1 Quasi-Newton Method: NewtonMethod_SR1.m
% INPUT:
%
% f - the multivariable function to minimise (a separate
% user-defined MATLAB function m-file)
%
% gradf - function which returns the gradient vector of f evaluated
% at x (also a separate user-defined MATLAB function
% m-file)
%
% x0 - the starting iterate
%
% tolerance1 - tolerance for stopping criterion of algorithm
%
% tolerance2 - tolerance for stopping criterion of line minimisation (eg:
% in golden section search)
%
% H0 - a matrix used as the first approximation to the hessian.
% Updated as the algorithm progresses
%
% T - parameter used by the "improved algorithm for
% finding an upper bound for the minimum" along
% each given descent direction
%
% OUTPUT:
%
% xminEstimate - estimate of the minimum
%
% fminEstimate - the value of f at xminEstimate
function [xminEstimate, fminEstimate, iteration] = NewtonMethod_SR1(f,...
gradf, x0,H0,tolerance1, tolerance2, T)
tic %starts timer
k = 0; % initialize iteration counter
iteration_number=0; %initialise count
xk = x0; % row vector
xk_old=x0; % row vector
H_old=H0; % square matrix
while ( norm(feval(gradf, xk)) >= tolerance1 )
iteration_number = iteration_number + 1;
H_old = H_old / max(max(H_old)); % Rescale the approximation so that its
% entries do not become too large or small
dk = transpose(-H_old*transpose(feval(gradf, xk))); % gives dk as
% a row vector
% minimise f with respect to t in the direction dk, which involves
% two steps:
% (1) find upper and lower bound, [a,b], for the stepsize t using
% the "improved procedure" presented in the lecture notes
[a, b] = multiVariableHalfOpen(f, xk, dk, T);
% (2) use golden section algorithm (suitably modified for
% functions of more than one variable) to estimate the
% stepsize t in [a,b] which minimises f in the direction dk
% starting at xk
[tmin, fmin] = multiVariableGoldenSectionSearch(f, a, b, tolerance2,...
xk, dk);
% note: we do not actually need fmin, but we do need tmin
% update the iteration counter and the current iterate
k = k + 1;
xk = xk + tmin*dk;
xk_new = xk_old + tmin*dk; % same point as xk; kept separate for the update below
% update the inverse hessian approximation via the SR1 formula
gk = (feval(gradf, xk_new) - feval(gradf, xk_old))'; % column vector (y_k)
s = (xk_new - xk_old)' - (H_old * gk); % (delta x - H*y) as a column vector
st = s';
H_new = H_old + (s * st) / (st * gk);
% keep track of the old values
xk_old=xk_new;
H_old=H_new;
end
% assign output values
toc
xminEstimate = xk;
fminEstimate = feval(f,xminEstimate)
iteration = iteration_number
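A typical driver for the routine above, applied to the L2-penalised problem with the parameter values used for the constrained results, is sketched below. It is illustrative only and assumes the helper routines multiVariableHalfOpen.m and multiVariableGoldenSectionSearch.m referenced in the listing are on the MATLAB path.
% Illustrative driver for the SR1 routine on the penalised (constrained) problem.
x0   = [0 0 0 0];        % starting iterate (row vector)
H0   = eye(4);           % identity as the initial inverse hessian approximation
tol1 = 0.01;             % stopping tolerance on the gradient norm
tol2 = 0.01;             % golden section search tolerance
T    = 1;                % half-open interval search parameter
[xmin, fmin, iters] = NewtonMethod_SR1(@fpen, @gradfpen, x0, H0, tol1, tol2, T);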
More Related Content

What's hot

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Modeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element MethodModeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element Methodjodoua
 
Dimensional analysis
Dimensional analysisDimensional analysis
Dimensional analysisRonak Parmar
 
Fundamentals of Finite Difference Methods
Fundamentals of Finite Difference MethodsFundamentals of Finite Difference Methods
Fundamentals of Finite Difference Methods1rj
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...IJDKP
 
Bayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distributionBayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distributionPremier Publishers
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...Adam Fausett
 
Dimensional analysis - Part 1
Dimensional analysis - Part 1 Dimensional analysis - Part 1
Dimensional analysis - Part 1 Ramesh B R
 
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov ProcessesEstimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov ProcessesMudassir Javed
 

What's hot (20)

Dimensional analysis
Dimensional analysisDimensional analysis
Dimensional analysis
 
Fem 1
Fem 1Fem 1
Fem 1
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Modeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element MethodModeling of Granular Mixing using Markov Chains and the Discrete Element Method
Modeling of Granular Mixing using Markov Chains and the Discrete Element Method
 
Dimensional analysis
Dimensional analysisDimensional analysis
Dimensional analysis
 
Fundamentals of Finite Difference Methods
Fundamentals of Finite Difference MethodsFundamentals of Finite Difference Methods
Fundamentals of Finite Difference Methods
 
DIMENSIONAL ANALYSIS (Lecture notes 08)
DIMENSIONAL ANALYSIS (Lecture notes 08)DIMENSIONAL ANALYSIS (Lecture notes 08)
DIMENSIONAL ANALYSIS (Lecture notes 08)
 
Q130402109113
Q130402109113Q130402109113
Q130402109113
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
 
Fem lecture
Fem lectureFem lecture
Fem lecture
 
Dimesional Analysis
Dimesional Analysis Dimesional Analysis
Dimesional Analysis
 
Bayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distributionBayesian approximation techniques of Topp-Leone distribution
Bayesian approximation techniques of Topp-Leone distribution
 
Lecture notes 01
Lecture notes 01Lecture notes 01
Lecture notes 01
 
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
An_Accelerated_Nearest_Neighbor_Search_Method_for_the_K-Means_Clustering_Algo...
 
Equilibrium method
Equilibrium methodEquilibrium method
Equilibrium method
 
Dimensional analysis - Part 1
Dimensional analysis - Part 1 Dimensional analysis - Part 1
Dimensional analysis - Part 1
 
Using Partitioned Design Matrices in Analyzing Nested-Factorial Experiments
Using Partitioned Design Matrices in Analyzing Nested-Factorial ExperimentsUsing Partitioned Design Matrices in Analyzing Nested-Factorial Experiments
Using Partitioned Design Matrices in Analyzing Nested-Factorial Experiments
 
Cu24631635
Cu24631635Cu24631635
Cu24631635
 
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov ProcessesEstimating Reconstruction Error due to Jitter of Gaussian Markov Processes
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes
 
N41049093
N41049093N41049093
N41049093
 

Viewers also liked

From gradient decent to CDN
From gradient decent to CDNFrom gradient decent to CDN
From gradient decent to CDNxiaoerxiaoer
 
Timelines at scale
Timelines at scaleTimelines at scale
Timelines at scaleViet Nt
 
Cxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interfaceCxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interfaceRyuichi YAMAMOTO
 
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015Susanna-Assunta Sansone
 
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes Ogilvy
 
solar power kits UNIVPO
solar power kits UNIVPOsolar power kits UNIVPO
solar power kits UNIVPOMark Robinson
 
Using ontologies to do integrative systems biology
Using ontologies to do integrative systems biologyUsing ontologies to do integrative systems biology
Using ontologies to do integrative systems biologyChris Evelo
 
Nuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuerteNuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuertecdcomputadora12
 
Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011Fullerton Securities
 
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINESCIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINESThierry Debels
 
Curriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas EnglishCurriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas Englishariel salinas
 
Rare Phenomena On Earth
Rare Phenomena On EarthRare Phenomena On Earth
Rare Phenomena On Earthixigo.com
 
نصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدسنصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدسabuthamer
 
أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!hamzakzook
 

Viewers also liked (16)

From gradient decent to CDN
From gradient decent to CDNFrom gradient decent to CDN
From gradient decent to CDN
 
Timelines at scale
Timelines at scaleTimelines at scale
Timelines at scale
 
Cxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interfaceCxx.jl の紹介 The Julia C++ interface
Cxx.jl の紹介 The Julia C++ interface
 
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015
 
Agile Financial Times Apr09
Agile Financial Times Apr09Agile Financial Times Apr09
Agile Financial Times Apr09
 
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
Day 1 Recap at #CannesLions 2013 / #OgilvyCannes
 
solar power kits UNIVPO
solar power kits UNIVPOsolar power kits UNIVPO
solar power kits UNIVPO
 
Using ontologies to do integrative systems biology
Using ontologies to do integrative systems biologyUsing ontologies to do integrative systems biology
Using ontologies to do integrative systems biology
 
Nuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuerteNuevas tecnologías digitales que van a pisar bien fuerte
Nuevas tecnologías digitales que van a pisar bien fuerte
 
Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011Daily Newsletter: 15th April, 2011
Daily Newsletter: 15th April, 2011
 
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINESCIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
CIA: FLYING SAUCERS OVER BELGIAN CONGO URANIUM MINES
 
Curriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas EnglishCurriculum Vitae Ariel Salinas English
Curriculum Vitae Ariel Salinas English
 
Rare Phenomena On Earth
Rare Phenomena On EarthRare Phenomena On Earth
Rare Phenomena On Earth
 
نصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدسنصوص متضاربه في الكتاب المقدس
نصوص متضاربه في الكتاب المقدس
 
أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!أغرب حفلة زواج جماعي من نوعه !!!
أغرب حفلة زواج جماعي من نوعه !!!
 
Selecting & Installing WordPress Themes
Selecting & Installing WordPress ThemesSelecting & Installing WordPress Themes
Selecting & Installing WordPress Themes
 

Similar to MAST30013 Techniques in Operations Research

A New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptxA New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptxMasoudIbrahim3
 
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...Stephen Faucher
 
83662164 case-study-1
83662164 case-study-183662164 case-study-1
83662164 case-study-1homeworkping3
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...SSA KPI
 
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear EquationsNumerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear Equationsinventionjournals
 
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...Xin-She Yang
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...IJITCA Journal
 
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-SpacesA Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-SpacesZubin Bhuyan
 
A Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment ProblemA Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment ProblemSheila Sinclair
 
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMSA BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMSShannon Green
 
Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Willy Marroquin (WillyDevNET)
 
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...Project KRIT
 
Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...IJECEIAES
 

Similar to MAST30013 Techniques in Operations Research (20)

A New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptxA New SR1 Formula for Solving Nonlinear Optimization.pptx
A New SR1 Formula for Solving Nonlinear Optimization.pptx
 
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
A Comparison Of Iterative Methods For The Solution Of Non-Linear Systems Of E...
 
83662164 case-study-1
83662164 case-study-183662164 case-study-1
83662164 case-study-1
 
S4101116121
S4101116121S4101116121
S4101116121
 
The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...The Sample Average Approximation Method for Stochastic Programs with Integer ...
The Sample Average Approximation Method for Stochastic Programs with Integer ...
 
10.1.1.34.7361
10.1.1.34.736110.1.1.34.7361
10.1.1.34.7361
 
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear EquationsNumerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
 
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
Applications and Analysis of Bio-Inspired Eagle Strategy for Engineering Opti...
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
 
panel regression.pptx
panel regression.pptxpanel regression.pptx
panel regression.pptx
 
Ds33717725
Ds33717725Ds33717725
Ds33717725
 
Ds33717725
Ds33717725Ds33717725
Ds33717725
 
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-SpacesA Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
A Fast and Inexpensive Particle Swarm Optimization for Drifting Problem-Spaces
 
A Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment ProblemA Connectionist Approach To The Quadratic Assignment Problem
A Connectionist Approach To The Quadratic Assignment Problem
 
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMSA BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
A BRIEF SURVEY OF METHODS FOR SOLVING NONLINEAR LEAST-SQUARES PROBLEMS
 
Finite Element Methods
Finite Element  MethodsFinite Element  Methods
Finite Element Methods
 
Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...Scalable trust-region method for deep reinforcement learning using Kronecker-...
Scalable trust-region method for deep reinforcement learning using Kronecker-...
 
App8
App8App8
App8
 
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
M.G.Goman, A.V.Khramtsovsky (1997) - Global Stability Analysis of Nonlinear A...
 
Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...Population based optimization algorithms improvement using the predictive par...
Population based optimization algorithms improvement using the predictive par...
 

MAST30013 Techniques in Operations Research

  • 1. 1 MAST30013 Techniques in Operations Research Newton's Method: A Comparative Analysis of Algorithmic Convergence Efficiency T.Lee thlee@student.unimelb.edu.au J.Rigby j.rigby@student.unimelb.edu.au L.Russell lrussell@student.unimelb.edu.au Department of Mathematics and Statistics University of Melbourne
  • 2. 2 Summary: Objective By considering a specific problem the aim of this project is to provide an example of the implementation of traditional Newtonian Methods in multivariate minimization applications. The effectiveness of three methods: Newton’s, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Symmetric Rank 1 (SR1) are to be examined. By analysing the performance of these algorithms in minimising a quadratic and convex function, a recommendation will be given as to the best method to apply to such a case. These methods cannot be applied to all multivariate functions and for a given problem a technique may 'succeed' where others 'fail'. This success may manifest as convergence at a faster rate or in the form of algorithmic robustness. Investigation into which conditions facilitate the successful and efficient application of the chosen algorithms will be undertaken. Minimization algorithms require varying degrees of further information in addition to the objective function. Newtonian methods, in a relative sense, require more information than other common methods like the Steepest Descent Method (SDM). The advantages and disadvantages of this further requisite information will be examined. Findings and Conclusions For the specific quadratic and convex non-linear program outlined subsequently in the introduction, it was found that Newton’s Method performed best for both the constrained and unconstrained problem. It was computationally quicker, usually requiring less iterations to solve the problems and achieved greater convergence accuracy than the BFGS, SR1 and SDM methods, consistent with the outlined theory. The ‘ideal’ nature of the specific case required further evaluation of the other methods without consideration given to Newton’s method. For a constrained problem the SR1 method achieved non- trivial convergence success partly attributed to the algorithms flexibility in not always choosing a descent direction (that is, positive definiteness of the approximated hessian is not imposed). Recommendations Whilst theoretical convergence rates are greater for Newton’s method than for Quasi-Newton methods and the SDM in turn, the applicability of certain algorithms is highly dependent on the specifics of a problem. For more complex general cases the advantages of using quasi-Newtonian methods become apparent as hessian inversion becomes more computationally taxing. Analysis suggests that when seeking to minimize quadratic and convex nonlinear programs, Newton's Method appears to perform better than any of the other tested methods.
  • 3. 3 Introduction: The Objective Function This project investigates the advantages and disadvantages of using three variations of Newton's Method and contrasts their convergence efficiencies against one another as well as the Steepest Descent Method for the general case and the specified problem: min 𝑓(𝒙) = 𝒄 𝑇 𝒙 + 1 2 𝒙 𝑇 𝐻𝒙 where 𝒄 = [5.04, −59.4, 146.4, 96.6] 𝑇 , 𝐻 = [ 0.16 −1.2 2.4 −1.4 −1.2 12.0 −27.0 16.8 2.4 −27.0 64.8 −42.0 −1.4 16.8 −42.0 28.0 ] A constrained case, where the objective function is subject to the constraint 𝒙 𝑇 𝒙 = 1, is considered and analysed with Newtonian methods implementing an L2 penalty program. The results are contrasted against output from the MATLAB Optimisation Tool. In the following sections this report will detail a method of analysing the constrained and unconstrained objective function and compare algorithmic output to analytical solutions. Results will be contrasted with theory and meaningful conclusions made where grounded by empirical evidence. Any results requiring further substantiation will be discussed subsequently. Newton's Method Newton's Method (also known as the Newton-Raphson Method) is a method for minimising an unconstrained multivariate function. Given a starting point, this method approximates the objective function with a second order Taylor polynomial and proceeds to minimise this approximation by moving in the 'Newton direction'. The output is subsequently used as the new starting point and the process is iteratively repeated (Chong and Zak, 2008). Newton's Method seeks to increase the rate of convergence by using second order information of the function it is operating on. This second order information is the hessian function, denoted 𝛻2 𝑓. The 'Newton direction' mentioned above is defined to be: 𝒅 𝑘 ≔ −𝛻2 𝑓(𝒙 𝑘 ) −1 𝛻𝑓(𝒙 𝑘 ) where 𝒙 𝑘 denotes a particular iterate point of the algorithm. It is defined this way such that if the second order approximation to a given function held exactly, then it would be minimised in one step with a step size of 1. BFGS and SR1 Minimisation via Newton's Method requires the calculation of the gradient and hessian matrices. In addition, the hessian must also be inverted. This inversion can be quite computationally expensive (standard techniques are known to be 𝑂(𝑛3 ) )(Hauser, 2012) and may not be defined. Additionally,
  • 4. 4 the Newton direction is not necessarily guaranteed to be a descent direction if the hessian is not positive definite. These two potential problems may affect the quality of any implementation of Newton’s method and give rise to the need for quasi-Newton methods. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Symmetric Rank 1 (SR1) Quasi-Newton methods have been formulated specifically in order to bypass these concerns by approximating the hessian from successively calculated gradient vectors (Farzin and Wah, 2012). Both methods attempt to satisfy the secant equation at each iteration: 𝒙 𝑘+1 − 𝒙 𝑘 = 𝐻 𝑘+1(∇𝑓(𝒙 𝑘+1 ) − ∇𝑓(𝒙 𝑘 )) During each iteration the approximated hessian is said to ‘updated’ in a manner dependent on the method: BFGS Update 𝐻 𝑘+1 = 𝐻 𝑘 + 1 + 〈𝒓 𝑘 , 𝒈 𝑘〉 〈𝒔 𝑘, 𝒈 𝑘〉 𝒔 𝑘 (𝒔 𝑘 ) 𝑇 − [𝒔 𝑘 (𝒓 𝑘 ) 𝑇 + 𝒓 𝑘 (𝒔 𝑘 ) 𝑇 ] where 𝒔 𝑘 = 𝒙 𝑘+1 − 𝒙 𝑘 , 𝒈 𝑘 = 𝛁f(𝒙k+1 ) − 𝛁f(𝒙 𝑘 ), 𝒓 𝑘 = 𝐻 𝑘 𝒈 𝑘 〈𝒔 𝑘, 𝒈 𝑘〉 𝐻 𝑘+1 is an approximation to the inverse of the hessian. The BFGS update always satisfies the secant equation and maintains positive definiteness of the hessian approximation (if initialized as such). Additionally the BFGS update satisfies the useful symmetry property 𝐻 𝑘+1 = 𝐻 𝑘+1 𝑇 . It can also be shown that 𝐻 𝑘+1 differs from its predecessor by a rank-2 matrix (Nocedal and Wright, 1999). SR1 Update The SR1 update is a simpler rank-1 update that also maintains the symmetry of the hessian and seeks to (but does not always) satisfy the secant equation. (Nocedal and Wright, 1999) 𝐻 𝑘+1 = 𝐻 𝑘 + (∆ 𝒙 𝑘 − 𝐻 𝑘 𝒚 𝑘)(∆ 𝒙 𝑘 − 𝐻 𝑘 𝒚 𝑘) 𝑇 (∆ 𝒙 𝑘 − 𝐻 𝑘 𝒚 𝑘) 𝑇 𝒚 𝑘 where 𝒚 𝑘 = ∇𝑓( 𝒙 𝑘 + ∆ 𝒙 𝑘) − ∇𝑓( 𝒙 𝑘). As with the BFGS method, 𝐻 𝑘+1 is an approximation to the inverse of the hessian. This update does not guarantee that the update be positive definite and subsequently does not ensure that following iterations always move in descent directions. In practice, the approximated hessians generated by the SR1 method exhibit faster converge towards the true hessian inverse than the BFGS method (Conn, Gould and Toint, 1991). A known drawback affecting the robustness of the SR1 method is that the denominator can vanish (Nocedal & Wright 1999). Where this is the case algorithmic robustness can be increased simply by skipping the updating process of troublesome iteration.
  • 5. 5 Method: Analysis Method First a theoretical analysis of the function was completed. Then each of the methods of minimisation was applied to the outlined objective function in both its constrained and unconstrained form using MATLAB algorithms (see appendices: MATLAB). The unconstrained results from these three methods have been compared to each other and to the Steepest Descent Method in order to draw conclusions about the best method to apply. The constrained program results were compared to results calculated from the MATLAB Optimisation Tool. The criteria analysed were, time taken and number of iterations taken to converge on global minimum (within a tolerance value), accuracy of 𝒙∗ and algorithmic robustness. Analysis of Unconstrained Objective Function The properties of the considered problem need to be determined to draw meaningful conclusions from output results. Knowledge about the behaviour of this function, specifically the type of function, location and number of any stationary points, the class of stationary points and whether they are local or global can be determined for this case. Functions of the following form are defined as quadratic: 𝑓(𝑥) = 𝛼 + 〈𝒄, 𝒙〉 + 1 2 〈𝒙, 𝐵𝒙〉 = 𝛼 + 𝒄 𝑇 𝒙 + 1 2 𝒙 𝑇 𝐵𝒙 The specified function is of this form with 𝛼 = 0, 𝐵 = 𝐻 and 𝒄 = 𝑐 as detailed above. This identification means that: 𝛻𝑓(𝒙) = 𝒄 + 𝐻𝒙 𝛻2 𝑓(𝒙) = 𝐻 A particular feature of quadratic functions is that they are convex if and only if their hessian is positive semi-definite. Calculating the eigenvalues of H (via MATLAB) reveals: 𝜆1 = 0.0066657144469 𝜆2 = 0.0591221937911 𝜆3 = 1.4840596546051 𝜆4 = 103.4101524371569 Since each eigenvalue is positive, this tells us that H is positive definite (see appendices: Proofs). Positive definite matrices are also positive semi-definite, and so the quadratic function is also convex. The function under investigation is quadratic and convex, hence solvable via matrix algebra. For a convex and at least 𝐶1 function, 𝛻𝑓(𝒙∗) = 0 if and only if 𝒙∗ is a global minimum of f. Note that by the fact that the function is quadratic, it must be at least 𝐶1 . Therefore: 𝛻𝑓(𝒙) = 0 ⟺ 𝐻𝒙∗ + 𝒄 = 0
  • 6. 6 ⟺ 𝐻𝒙∗ = −𝒄 ⟺ 𝒙∗ = −𝐻−1 𝒄 For the specified function: 𝐻−1 = [ 100 50 33.333 25 50 33.333 25 20 33.333 25 20 16.667 25 20 16.667 14.286 ] And recall: 𝒄 = [ 5.04 −59.4 146.4 −96.6 ] This implies that: 𝒙∗ = − [ 100 50 33.333 25 50 33.333 25 20 33.333 25 20 16.667 25 20 16.667 14.286 ] ∗ [ 5.04 −59.4 146.4 −96.6 ] = [ 1 0 −1 2 ] So 𝒙∗ = [1 0 −1 2] 𝑇 is the global minimum to the nonlinear function. There will be no other minimums of this function. It is expected that the algorithms converge to this point. Thus, the result in calculating f(x*) = -167.28. Note that the inverse of the hessian matrix for the program is a constant matrix, that is to say that it does not change for all elements of ℝ4 . Recall that one of the potential problems with Newton's method was that the hessian may not be invertible or positive definite. Both of these cases are not true for this specified problem. Implementation of Algorithms Each algorithm under investigation has been written into MATLAB code and included in the appendices. They each call on a common univariate line-search algorithm, the Golden Section Search, one of the better methods in the class of robust interval reducing methods.(Arora, 2011) Similarly, they implement an algorithm for finding an upper bound on the location of the minimum on a half-open interval which doubles the incremented step size with each iteration. Each algorithm has the following parameters that can be altered:  𝑥0  The starting point for the algorithm.  Tolerance 1  The stopping criteria for the particular algorithm. In all cases, this is a check of the magnitude of the gradient vector at a particular iteration point 𝒙 𝑘 against 0. If it is ‘close enough’ to zero, the algorithm will end. ‘Close enough’ is defined as the value set for this tolerance.
  • 7. 7  Tolerance 2  The stopping criteria for the Golden Section Search as detailed above. This value sets how large the interval estimate will be when the line search is complete.  T  The parameter used the Multi Variable Half Open Interval Search nested in each of the algorithms. 2(𝑘−1) 𝑇 is the increase to the upper bound during each iteration when trying to find an interval on which the minimum of the approximation must exist.  𝐻0  This is the ‘starting hessian’, thought of as an approximation to the inverse hessian of the program. It is only present in the BFGS and SR1 methods. In this paper, 𝒙0, tolerance 1 and 𝐻0 values (where appropriate) have been varied and the effects analysed. The effect of changing tolerance 2 and the value of T have not been analysed because they do not relate directly (see above) to the methods under investigation. It is expected that the result of altering 𝒙0 will depend on the distance of 𝒙0 from the global minimum. The expectation is that the closer 𝒙0 is to the minimum the less time and number of iterations the algorithm will be expected to take to converge. Having a more strict tolerance (that is, bringing tolerance 1 closer to 0) should result in an increased number of iterations and time taken. The algorithm in question must get closer to the true global minimum in order to comply with the more strict tolerance, and hence is expected to require more computational time. Additionally, the effect of changing 𝐻0 will be expected to depend on how well 𝐻0 approximates the true hessian inverse of the function. A better approximation (such as giving the algorithm the function’s true hessian inverse to begin with) would be expected to reduce the amount of computation time required for convergence. However, the idea behind the BFGS and SR1 methods is to avoid calculating the inverse of the hessian directly. As such, the hessian inverse will not be used as a 𝐻0 in order to simulate more realistic conditions under which these two methods might be implemented. Constrained Case In solving the constrained case, two approaches were taken. The first was to solve the nonlinear program using the MATLAB Optimization Tool, specifically via the interior point and active set algorithms. This gave the solution point as well as some data and intuition in regards to the time taken and iterations required to solve such a constrained problem (alternate analytical approaches could have been used to provide this reference value, see discussion). Since the algorithms implemented by the MATLAB Optimization Tool are specifically designed to solve constrained nonlinear programs, the expectation was that they would outperform the Newtonian algorithms under investigation. The second method for solving the constrained case was via the L2 penalty method. Converting the constrained nonlinear program into an unconstrained nonlinear program allowed for the Newtonian algorithms to be implemented. The specified constraint is: 𝒙 𝑇 𝒙 = 1 ⟺ 𝑥1 2 + 𝑥2 2 + 𝑥3 2 + 𝑥4 2 − 1 = 0
  • 8. 8 The L2 penalty method requires that this constraint be converted into a penalty term and added to the objective function. So, rather than minimizing the original objective function with the above constraint, the function to be minimised was instead: 𝑃𝑘(𝑥) = 𝒄 𝑇 𝒙 + 1 2 𝒙 𝑇 𝐻𝒙 + 𝑘 2 (𝑥1 2 + 𝑥2 2 + 𝑥3 2 + 𝑥4 2 − 1)2 where 𝒄 and 𝐻 are defined as above, and 𝑘 being the parameter of the penalty term. The algorithms under investigation require the calculation of both the gradient function and the hessian matrix. As shown above, in the unconstrained case, the hessian is a symmetric, positive definite matrix, perfect for implementation of Newtonian methods. When the hessian matrix is calculated in order to implement these algorithms for solving the constrained case, the positive definiteness of the matrix is potentially lost. This is a possible cause for any non-convergence issues arising from implementation of the Newtonian methods. For the equations for the gradient function and hessian matrix see appendices: MATLAB. The L2 penalty method analytically finds the minimum point by evaluating 𝒙∗ = lim 𝑘→∞ 𝒙 𝑘 . When solving for the minimum point numerically using the Newtonian algorithms, a small value of 𝑘 was chosen and increased in order to simulate this limiting process. It was expected that as the value of 𝑘 increased the minimum point the algorithms found converged on the value of 𝑥∗ found by the MATLAB Optimization tool. Theoretical Convergence Under specific circumstances, the various methods exhibit differing rates of convergence.  For an initial point sufficiently close to the minimum, if the hessian is positive definite, a local/global minimum actually exists, the step size at each iteration satisfies the Armijo- Goldstein and Wolfe conditions and f is C3 , then the rate of convergence for Newton's Method is quadratic. (see appendices for univariate proof).  Quasi-Newton methods are known to exhibit superlinear convergence under certain circumstances:  The BFGS Method can be shown to converge to the global minimum at a superlinear rate if the starting hessian is positive definite and the objective function is twice differentiable and convex. (Powell, 1976)  Likewise, the SR1 method, exhibits superlinear convergence under the same conditions. (Nocedal and Wright, 1999).  As an aside, the Steepest Descent Method is known to converge at a linear rate. In addition, it is not adversely affected, as the Newtonian methods are known to be, by horizontal asymptotes where divergence is sometimes observed. The correlation of results to these theoretical rates was examined. It is expected that Newton's Method will converge the quickest. The quasi Newtonian methods are expected to be the next fastest methods to converge followed lastly by the Steepest Descent Method.
  • 9. 9 Results, Conclusions and Recommendations: Results for Unconstrained Case The performance of the chosen methods was analysed for the specified objective function by choosing 10 starting points, 𝒙0 (8 randomly generated, [0 0 0 0] and the known minimum at [1 0 -1 2]). Point 𝑥1 𝑥2 𝑥3 𝑥4 1 0 0 0 0 2 1 0 -1 2 3 0.81 0.91 0.13 0.91 4 3.16 0.49 1.39 2.73 5 9.57 -4.85 8.00 -1.42 6 14.66 -4.04 12.09 3.42 7 1.60 3.35 -16.46 18.81 8 -0.30 -7.47 -17.90 1.86 9 2.23 11.37 15.51 17.57 10 7.52 -10.79 -10.14 17.14 Table 1: List of starting points *Starting points truncated to two decimal places. Results shown here were generated with tolerance values of 0.01 for T and tolerances 1,2 (see appendices for full list of all results). Disregarding data sets that failed to return a value for x* and sets where the number of iterations exceeded the average by >500% the following table of average values was generated: Newton's Method Steepest Descent BFGS (hessian) BFGS (identity) SR1 (hessian) SR1 (identity) x(1) 0.99998 1.01376 1.06218 0.94505 1.00033 0.996722222 x(2) -0.00007 0.01406 0.03947 -0.02716 0.00152 0.002833333 x(3) -1.00019 -0.9882 -0.97035 -1.01809 -0.99881 -0.996133333 x(4) 1.99986 2.00992 2.02388 1.98639 2.00088 2.003922222 f(x*) -167.28 -167.2749 -167.27892 -167.27965 -167.28 -167.2799889 Elapsed Time (s) 0.0040865 0.1965647 0.0100982 0.014235375 0.0119615 0.0125613 Iterations 9 1010 9.2 5.75 24.6 24.1 Elapsed Time per Iteration (s) 0.000454056 0.000194619 0.00109763 0.002475717 0.00048624 0.000521216 Table 2: Summary of algorithms performances (averaged values given robust algorithm implementation) The BFGS and SR1 methods were both ran using 𝐻 (as defined in the introduction) and the 4x4 identity as 𝐻0 inputs. All algorithms converged on the global minimum at 𝒙∗ = [1 0 − 1 2]; 𝑓(𝒙∗) = −167.28 for all staring point except in one instance of the SR1 method initialsed with the identity matrix from point 7. However given the ‘shallowness’ of the function and the tolerances, outlying 𝒙∗ values were occasionally generated. This ‘shallowness’ refers to, based on the results in the appendix, the relatively small magnitude of the gradient vector at many points of the objective function close to
  • 10. 10 the minimum. To this end, the BFGS Method and Steepest Descent Method often stopped only part way to the global minimum. For a given set of parameters it was often the case that the BFGS method regularly converged faster than Newton’s method albeit with less precision. However, for this ideal quadratic objective function Newton’s Method was less prone to ‘getting stuck’ (see appendices data: BFGS (identity) had runs where iteration values were 28905 and 126675) and always took the least time to converge. The accuracy of calculated 𝒙∗ was greatest for Newton’s Method followed by SR1, BFGS and Steepest Descent in decreasing order of accuracy. As expected the quasi-Newtonian methods each converged faster (from these points, by an order of magnitude) than the Steepest Descent method. The average iteration of the Steepest Descent method, however, took less time to compute than the other methods. A cause of this could be that all other methods require operations on a 4x4 matrix such as an inversion of a hessian or an update of the inverse hessian approximation at each iteration, whereas the Steepest Descent Method requires only calculation of a gradient vector. Tolerance Variation (see appendices for tables of results) With tolerance 2 held constant at 0.01 and T held constant at 1: For Newton's Method, the algorithm converged quickest with tolerance 1 values set at 0.0001. When given the identity matrix for the starting iteration the efficiency of the SR1 method appeared to generally increase as the tolerance 1 became stricter. When given the hessian of the program to start with, there was no real discernible pattern as to what effect varying the tolerance 1 had. The BFGS method, when given either the identity or the program's hessian to start with, behaved as intuitively expected and computational time increased with tightened tolerances. Hence the results varied and were not always consistent with our expectations. Further investigation and more data would allow for greater quantitative analysis (see discussion). With regard to the BFGS method, on occasion the value of T was altered to get the algorithm to converge. This issue did not arise when implementing Newton’s Method or with the SR1 Method. As T is only used in the Multi Variable Half Open Section Search portion of the algorithm, this points to a possible incompatibility between this particular algorithm and the BFGS method under certain conditions. Finding an alternative method for doing this task (e.g. using a step size that meets the Armijo-Goldstein and Wolff conditions) would rectify this issue and may increase the robustness of the BFGS method. Summary In summary, each method found the global minimum to the desired accuracy in a vast majority of cases; however, Newton’s method was the most accurate method and took less computational time. The convergence rates were mostly-consistent with theory in that the quasi-Newtonian algorithms generally converged slower than Newton’s method and faster than the Steepest Descent Method. Results for Constrained Case Algorithmic performance given the original objective function constrained by 𝑥 𝑇 𝑥 = 1 was next analysed. Firstly, the constrained case was solved using MATLAB’s Optimization Toolbox:
  • 11. 11 Interior Point Algorithm Active Set Algorithm x(1) -0.025 -0.025 x(2) 0.311 0.311 x(3) -0.789 -0.789 x(4) 0.53 0.53 f(x*) -133.56022058 -133.56022058 Average Iterations 22.2 33.6 Elapsed Time (s) 1.05 2.01 Elapsed Time per Iteration (s) 0.047297 0.059821 Table 3: MATLAB Optimisation Tool Results The following values were obtained using the first five starting points listed in Table 1. To solve this constrained problem using Newton's Method and its variants, the L2 penalty method was used. In applying the algorithms the following results were returned, with the penalty term = 10,000,000 : Newton's Method BFGS (Hessian) BFGS (Identity)* SR1 (Hessian) SR1 (Identity) x(1) -0.02477 N/A -0.02477 -0.02477 -0.02477 x(2) 0.31073 N/A 0.31073 0.31073 0.31073 x(3) -0.78876 N/A -0.78876 -0.78876 -0.78876 x(4) 0.52980 N/A 0.52980 0.52980 0.52980 f(x*) -133.56022894 N/A -133.56022894 -133.5603044 -133.5603044 Elapsed Time (s) 0.2051092 N/A 0.112444 2.104064 1.678844 Iterations 169 N/A 111 2037 659 Elapsed Time per Iteration (s) 0.001214 N/A 0.001013 0.001033 0.002546 Table 4: Newtonian Methods Constrained Problem Results *BFGS (identity) only returned results for the [0 0 0 0] starting point. As such, its results are not averaged. The rest of the results are averaged data returned for the five starting points. The BFGS algorithm’s (see appendices: MATLAB) L2 implementation was especially fragile given this constraint. It did not return any results when given the program's hessian for the staring iteration and only found the minimum once when given the identity as a starting hessian. In that one case where it found a minimum, the BFGS method found the same minimum as the other two algorithms, closely matching MATLAB optimization output. Slightly more robust was the SR1 method. It managed to find the minimum from more starting points than the BFGS method did, although not from all starting points. This slightly increased robustness did come at a cost however, with the SR1 method requiring a very large number of iterations and a longer timeframe in which to operate, making it more computationally expensive to use. The SR1 update does not impose positive definiteness on the updated hessian and this fact may have contributed at the algorithms increased success rate when compared to the BFGS method. In general, the relative reduced robustness of the BFGS and SR1 implementations, given the specified problem, possibly stems from the fact that they are both quasi-Newton Methods derived from the Secant Method. As quasi-Newton Methods are designed to avoid the computationally expensive hessian inversion, the hessian is instead approximated using finite differences of the function gradient, and data is interpolated with iterations (Indiana University, 2012). As Quasi-
  • 12. 12 Newton methods are multivariate generalisations of the secant method, the same problem exists for both methods – namely, that if the initial values used are not close enough to x*, the methods may fail to converge entirely. (Ohio University, 2012) In contrast, Newton's Method performed exceptionally well. It found the minimum from all starting points and did so relatively quickly in terms of both time and number of iterations. Compared to the MATLAB optimization algorithms, all of the Newtonian algorithms took more iterations to converge, as expected. Surprisingly, Newton's Method was able to outperform MATLAB’s optimization algorithms with regard to speed. Hessian Analysis (both programs) For both the constrained and unconstrained cases, the BFGS Method converged in less iterations and more accurately when it was started with the identity matrix as the inverse hessian approximation. It was also more robust when starting with the identity, always converging in the unconstrained case and at least finding the global minimum once in the constrained case. The method was quicker in terms of elapsed time when given the program's hessian to start with. It is therefore recommended that the identity be used as 𝐻0 when minimising a function such as this via the BFGS Method. In the unconstrained case, the difference in choosing 𝐻0 as the starting inverse hessian approximation for the SR1 Method was negligible. There is no significant difference in the time taken, accuracy or average number or iterations required to warrant recommending one particular 𝐻0 over the other. In the constrained case, whilst there may be no difference between the results in terms of accuracy, there is a more pronounced disparity between iterations and time taken. Starting with the function's hessian caused the SR1 Method to take nearly three times as many iterations and almost a 20% longer timeframe. Therefore, using the identity matrix for 𝐻0 for this constrained case is a much better alternative. As was noted earlier, the hessian of this nonlinear function is a constant 4x4 matrix regardless of the algorithm's current 𝒙 𝑘 . This means that it is not very computationally taxing to compute the hessian and it only needs to be computed once. This deals with the problem of the hessian inversion being computationally expensive. Since it is known that the hessian is positive definite, so too is the inverse of the hessian. Thus, the Newton Direction will always be a descent direction. Whilst this is only true for this function (and functions of similar forms), it means that Newton's Method behaved very well in this particular problem. Summary Given an ideal function such as this, that is to say a quadratic and convex nonlinear program, based on the above analysis, Newton's Method outperformed both of its variants (BFGS and SR1) and the Steepest Descent Method. With the problem formulated using the L2 penalty method it is the best algorithm to use for such a program.
  • 13. 13 Discussion: The objective function analysed by this project was particularly suited to minimization via Newton’s Method. For other programs, especially non-convex and non-quadratic ones, the results obtained by this paper may not hold. The BFGS and SR1 method were formulated precisely because the relative effectiveness of Newton’s method diminishes with increasing complexity. In addition, the methodology implemented one class of many available algorithms which specifically used the Golden Section Search in conjunction with a particular open interval search algorithm. A variety of methods could have also been used to determine an appropriate step size to move during each iteration. For example, step sizes satisfying the Armijo-Goldstein and Wolfe conditions would be an appropriate choice. Hence, a whole family of dissimilar results could have been generated from the same starting points using different algorithms which could just as easily be considered Newtonian. The analysis was very origin-centric in that the starting points were all within a relatively similar distance from [0 0 0 0] 𝑇 and hence [1 0 −1 2] 𝑇 and [−0.025 0.311 − 0.789 0.53] 𝑇 , the global minimums. Analysis from starting iterations further from the minimums should yield results consistent with those generated by this report; further investigation is needed. As discussed in the previous section, in such a nonlinear program as this, the inverse of the hessian needs to only be calculated once. As long as it is known the hessian does not change for any point in ℝ4 and inversion of that constant hessian matrix is computationally feasible, the coding for Newton's Method used here could be adjusted to remove the evaluation and inversion of the hessian at each iteration. Such a change would result in less calculations per iteration speeding up the algorithm. The results returned for this particular case would be even better if such an adjustment was made. The drawback of doing so is that the adjusted method could only be applied to cases where the hessian is a constant matrix, severely restricting its applicability. For the constrained case an analytical solve of using the KKT method would have been possible albeit complicated and not solvable by simple linear algebra operations due to the quadratic nature of the constraint. If this project had gone down this path instead of utilizing MATLAB’s optimization tools an exact value of 𝒙∗ could have been used as a point of reference. The shortcomings of the BFGS algorithm’s implementation of the L2 penalty method requires further analysis and perhaps troubleshooting. Finally, the analysis of varying the tolerances of the algorithms used by this report could have been furthered with more systematically obtained data. It would be expected that for a general case decreasing the ‘strictness’ of the major tolerance would decrease computational time taken (this was not always the case: see results). By way of contrast, varying the tolerances of the open interval search and golden section search would have been expected to exhibit different effects for different starting points and for different problems. To optimize an algorithm a balance must be struck between accuracy and time taken to generate an appropriate step length. Hence, ideal tolerance values exist for different algorithm, different starting points and for each iteration. Further investigation may reveal common properties of these ideal tolerances given the algorithms used.
  • 14. 14 References: Arora, J. (2011). Introduction to Optimum Design [electronic resource]. p.42 Burlington Elsevier Science. Chong, E. Zak, S. (2008). An Introduction To Optimization 3rd Edition. pp. 155-156. John Wiley and Sons. Conn, A., Gould, N. and Toint, P. (1991). "Convergence of quasi-Newton matrices generated by the symmetric rank one update". Mathematical Programming (Springer Berlin/ Heidelberg) 50 (1): pp. 177–195. Farzin, K. and Wah, L. (2012). On the performance of a new symmetric rank-one method with restart for solving unconstrained optimization problems. Computers and Mathematics with Applications, Volume 64, Issue 6, September 2012, pp. 2141-2152, http://www.sciencedirect.com/science/article/pii/S089812211200449X Hauser, K. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton Methods. p. 5. Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553- hauserk/newtons_method.pdf Indiana University. (2012). Lecture 6: Multivariate Newton's Method and Quasi-Newton methods. Available online: http://homes.soic.indiana.edu/classes/spring2012/csci/b553- hauserk/newtons_method.pdf Indian Institute of Technology. (2002). Convergence of Newton-Raphson method. Available online: http://ecourses.vtu.ac.in/nptel/courses/Webcourse-contents/IIT- KANPUR/Numerical%20Analysis/numerical-analysis/Rathish-kumar/ratish-1/f3node7.html Nocedal, J. and Wright, S.J. (1999). Numerical Optimization, pp. 220, 144. Ohio University (2012). Lecture 6: Secant Methods. Available online: http://www.math.ohiou.edu/courses/math3600/lecture6.pdf Powell, M. (1976). 'Superlinear convergence Some global convergence properties of a variable metric algorithm for minimization without exact line searches', Nonlinear Programming, Vol 4, Society for Industrial and Applied Mathematics, p. 53
15
Appendices:

Proofs

Positive eigenvalues imply invertibility of a matrix:
Let A be a square n x n matrix with eigenvalues λ1, λ2, ..., λn and define the polynomial
p(t) = det(tI - A) = (t - λ1)(t - λ2)...(t - λn).
The constant term of the factored form is (-1)^n λ1 λ2 ... λn, while evaluating the determinant form at t = 0 gives
p(0) = det(-A) = (-1)^n det(A).
Hence det(A) = λ1 λ2 ... λn. If λ1, λ2, ..., λn > 0 then det(A) ≠ 0, and therefore A is invertible.

Newton's Method converges quadratically for the univariate case:
Let x_i be a root of f(x) = 0 and let x_n be an estimate of x_i with |x_i - x_n| = ε < 1. By Taylor series expansion,
0 = f(x_i) = f(x_n) + f'(x_n)(x_i - x_n) + 0.5 f''(ξ)(x_i - x_n)^2
for some ξ between x_i and x_n. Newton's Method defines the next iterate through
-f'(x_n)(x_(n+1) - x_n) = f(x_n).
Substituting this into the expansion gives
0 = f'(x_n)(x_i - x_(n+1)) + 0.5 f''(ξ)(x_i - x_n)^2.
Since (x_i - x_n) and (x_i - x_(n+1)) are the error terms for successive iterations,
(x_i - x_(n+1)) ∝ (x_i - x_n)^2.
Q.E.D. (Indian Institute of Technology, 2002)
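Purely as a numerical illustration of the quadratic convergence result above (and not part of the project's code), the short script below applies the univariate Newton iteration to f(x) = x^2 - 2, whose positive root is sqrt(2), and prints the error at each step; the error is roughly squared from one iteration to the next.

% Numerical illustration of quadratic convergence for univariate Newton's Method.
f  = @(x) x.^2 - 2;      % root at sqrt(2)
df = @(x) 2*x;
x  = 2;                  % starting estimate
for n = 1:5
    x = x - f(x)/df(x);  % Newton step
    fprintf('n = %d, error = %.3e\n', n, abs(x - sqrt(2)));
end
% The printed errors shrink roughly as error(n+1) ~ C * error(n)^2,
% consistent with the proportionality derived above.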
16
Data
All tolerance values 0.01:

Newton's Method:
x*  f(x*)  Elapsed Time (s)  Iterations
1.0000 -0.0000 -1.0000 2.0000  -167.28  0.009944  10
1 0 -1 2  -167.28  0.000134  0
1.0000 -0.0001 -1.0001 2.0001  -167.28  0.004207  9
0.9999 -0.0000 -1.0002 2.0000  -167.28  0.001905  9
1.0002 -0.0001 -1.0002 1.9999  -167.28  0.00208  10
0.9999 0.0000 -1.0001 2.0000  -167.28  0.005061  11
1.0000 0.0001 -1.0004 1.9995  -167.28  0.003064  10
1.0000 0.0001 -0.9999 2.0000  -167.28  0.005054  11
0.9999 -0.0008 -1.0011 1.9990  -167.28  0.004321  9
0.9999 0.0001 -0.9999 2.0001  -167.28  0.005095  11

Steepest Descent:
x*  f(x*)  Elapsed Time (s)  Iterations
0.2126 -0.4083 -1.2812 1.7836  -167.2769  0.00655  34
1 0 -1 2  -167.28  0.000116  0
1.1539 0.1993 -0.8189 2.1600  -167.2789  0.038846  106
2.1659 0.6656 -0.5248 2.3718  -167.2728  0.170743  798
1.7844 0.3563 -0.7776 2.1590  -167.2768  0.077666  418
2.1659 0.6648 -0.5257 2.3709  -167.2728  0.481597  2600
-0.1767 -0.6709 -1.4786 1.6256  -167.2727  0.274945  1399
-0.1749 -0.6699 -1.4779 1.6262  -167.2727  0.273263  1332
2.1767 0.6709 -0.5213 2.3744  -167.2727  0.303484  1575
-0.1702 -0.6672 -1.4760 1.6277  -167.2727  0.338437  1838

BFGS (hessian for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
0.1915 -0.4173 -1.2823 1.7865  -167.2767  0.004697  6
1 0 -1 2  -167.28  0.000144  0
1.2740 0.1562 -0.8887 2.0870  -167.2796  0.002729  7
0.9981 0.0014 -0.9982 2.0018  -167.28  0.004855  10
2.1585 0.6613 -0.5280 2.3692  -167.2729  0.006476  7
1.0040 -0.0019 -1.0030 1.9967  -167.28  0.02334  11
0.9987 0.0005 -0.9991 2.0011  -167.28  0.016815  14
0.9946 -0.0028 -1.0020 1.9983  -167.28  0.016214  13
1.0029 -0.0024 -1.0021 1.9982  -167.28  0.010847  10
0.9995 -0.0003 -1.0001 2.0000  -167.28  0.014865  14
17
BFGS (identity for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
0.2028 -0.4142 -1.2815 1.7865  -167.2768  0.001249  3
1 0 -1 2  -167.28  0.000297  0
1.2434 0.1361 -0.9034 2.0753  -167.2797  0.005207  5
1.0024 0.0016 -0.9992 2.0004  -167.28  0.003858  7
1.0118 0.0074 -0.9948 2.0039  -167.28  3.954961  28905
0.9991 -0.0006 -1.0005 1.9996  -167.28  0.079164  9
0.9976 -0.0041 -1.0046 1.9955  -167.28  0.004674  7
0.9953 -0.0024 -1.0012 1.9992  -167.28  0.008941  7
1.0007 0.0009 -0.9994 2.0004  -167.28  0.010493  8
0.9974 0.0037 -0.9963 2.0031  -167.28  16.554056  126675

SR1 (hessian for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
1.0001 0.0001 -0.9999 1.9999  -167.28  0.011665  14
1 0 -1 2  -167.28  0.000299  0
1.0000 0.0000 -1.0000 2.0000  -167.28  0.007054  21
0.9985 -0.0008 -1.0003 2.0000  -167.28  0.013603  20
1.0173 0.0085 -0.9961 2.0015  -167.28  0.015117  19
0.9996 0.0115 -0.9892 2.0093  -167.28  0.007496  19
1.0000 0.0000 -1.0000 2.0000  -167.28  0.012073  37
0.9812 -0.0119 -1.0090 1.9927  -167.28  0.028734  52
1.0066 0.0056 -0.9956 2.0038  -167.28  0.013331  39
1.0000 0.0022 -0.9980 2.0016  -167.28  0.010243  25

SR1 (identity for starting iterate):
x*  f(x*)  Elapsed Time (s)  Iterations
1.0014 -0.0007 -0.9997 2.0009  -167.28  0.023934  17
1 0 -1 2  -167.28  0.000314  0
1.0004 0.0001 -1.0001 1.9997  -167.28  0.015986  25
1.0053 0.0051 -0.9963 2.0025  -167.28  0.009545  30
0.9998 -0.0005 -1.0002 2.0000  -167.28  0.012679  37
1.0000 -0.0000 -1.0000 2.0000  -167.28  0.01236  34
0.9723 0.0241 -0.9692 2.0305  -167.2799  0.003297  9
NaN NaN NaN NaN  NaN  0.006176  7
0.9991 -0.0003 -1.0002 1.9998  -167.28  0.013616  41
0.9922 -0.0023 -0.9995 2.0019  -167.28  0.027706  41
18
Newton's Method
x0  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  0.1  0.01  1  [NaN -Inf -Inf -Inf]  NaN  0.3685380  353
[0 0 0 0]  0.01  0.01  1  [1 0 -1 2]  -167.279999996088  0.0294290  4
[0 0 0 0]  0.001  0.01  1  [1 0 -1 2]  -167.279999999938  0.0160210  3
[0 0 0 0]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999976  0.0058770  2
[0 0 0 0]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0072560  2
[0 0 0 0]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0104300  3
[1 0 -1 2]  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0003140  0
[1 0 -1 2]  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001230  0
[1 0 -1 2]  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000790  0
[1 0 -1 2]  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000750  0
[1 0 -1 2]  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000740  0
[1 0 -1 2]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[0.81 0.91 0.13 0.91]  0.1  0.01  1  [NaN NaN NaN NaN]  NaN  0.3632520  353
[0.81 0.91 0.13 0.91]  0.01  0.01  1  [1 0 -1 2]  -167.279999998378  0.0060940  4
[0.81 0.91 0.13 0.91]  0.001  0.01  1  [1 0 -1 2]  -167.279999999974  0.0052550  3
[0.81 0.91 0.13 0.91]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999990  0.0043530  2
[0.81 0.91 0.13 0.91]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0052190  2
[0.81 0.91 0.13 0.91]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0058110  3
[3.16 0.49 1.39 2.73]  0.1  0.01  1  [NaN NaN NaN NaN]  NaN  0.3621590  353
[3.16 0.49 1.39 2.73]  0.01  0.01  1  [1 0 -1 2]  -167.279999997557  0.0058490  4
[3.16 0.49 1.39 2.73]  0.001  0.01  1  [1 0 -1 2]  -167.279999999961  0.0052710  3
[3.16 0.49 1.39 2.73]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999985  0.0043780  2
19
(Newton's Method, continued)
[3.16 0.49 1.39 2.73]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0051530  2
[3.16 0.49 1.39 2.73]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0116540  2
[9.57 -4.85 -8.00 -1.42]  0.1  0.01  1  [NaN NaN NaN NaN]  NaN  0.3284300  352
[9.57 -4.85 -8.00 -1.42]  0.01  0.01  1  [1 0 -1 2]  -167.279999995272  0.0051650  4
[9.57 -4.85 -8.00 -1.42]  0.001  0.01  1  [1 0 -1 2]  -167.279999999925  0.0045340  3
[9.57 -4.85 -8.00 -1.42]  0.0001  0.01  1  [1 0 -1 2]  -167.279999999971  0.0037930  2
[9.57 -4.85 -8.00 -1.42]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0045140  2
[9.57 -4.85 -8.00 -1.42]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0072210  3
20
SR1 Method
x0  Starting Hessian  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.279999128240  0.1688580  89
[0 0 0 0]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999575407  0.0692100  39
[0 0 0 0]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999999853  0.0587820  26
[0 0 0 0]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.1066900  38
[0 0 0 0]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999973  0.0380570  10
[0 0 0 0]  Identity  0.000001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0307350  9
[0 0 0 0]  Program's Hessian  0.1  0.01  1  [1.42 0.12 -0.94 2.05]  -167.278456189735  0.0620180  31
[0 0 0 0]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999919554  0.0949530  57
[0 0 0 0]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999987586  0.0400010  21
[0 0 0 0]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999990  0.0298450  13
[0 0 0 0]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.1016380  38
[0 0 0 0]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.1077790  37
[1 0 -1 2]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001780  0
[1 0 -1 2]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000800  0
[1 0 -1 2]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[1 0 -1 2]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000690  0
[1 0 -1 2]  Program's Hessian  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000880  0
[1 0 -1 2]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000760  0
[1 0 -1 2]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000750  0
[1 0 -1 2]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000710  0
[1 0 -1 2]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000710  0
[1 0 -1 2]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000690  0
[0.81 0.91 0.13 0.91]  Identity  0.1  0.01  1  [1.40 0.52 -0.52 2.43]  -167.272369332640  0.1391750  94
[0.81 0.91 0.13 0.91]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999927619  0.0512890  30
21
(SR1 Method, continued)
[0.81 0.91 0.13 0.91]  Identity  0.001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0337580  17
[0.81 0.91 0.13 0.91]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0577020  26
[0.81 0.91 0.13 0.91]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999955  0.0280640  10
[0.81 0.91 0.13 0.91]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0415910  13
[0.81 0.91 0.13 0.91]  Program's Hessian  0.1  0.01  1  [1.03 0.01 -0.99 2.00]  -167.279983904904  0.2144600  156
[0.81 0.91 0.13 0.91]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999620384  0.0843110  46
[0.81 0.91 0.13 0.91]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999999547  0.0196120  10
[0.81 0.91 0.13 0.91]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999991  0.0290150  13
[0.81 0.91 0.13 0.91]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999997  0.0357070  13
[0.81 0.91 0.13 0.91]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0538590  17
[3.16 0.49 1.39 2.73]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.279998954778  0.1396510  89
[3.16 0.49 1.39 2.73]  Identity  0.01  0.01  1  [NaN NaN NaN NaN]  NaN  0.0152440  9
[3.16 0.49 1.39 2.73]  Identity  0.001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0126580  6
[3.16 0.49 1.39 2.73]  Identity  0.0001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0322300  13
[3.16 0.49 1.39 2.73]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999990  0.0244790  9
[3.16 0.49 1.39 2.73]  Identity  0.000001  0.01  1  [NaN NaN NaN NaN]  NaN  0.0194650  7
[3.16 0.49 1.39 2.73]  Program's Hessian  0.1  0.01  1  [1 0 -1 2]  -167.279999971364  0.0673550  41
[3.16 0.49 1.39 2.73]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999999102  0.0518930  30
[3.16 0.49 1.39 2.73]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999999986  0.0547630  29
[3.16 0.49 1.39 2.73]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999997643  0.0263000  9
[3.16 0.49 1.39 2.73]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0633330  25
[3.16 0.49 1.39 2.73]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0312680  10
[9.57 -4.85 -8.00 -1.42]  Identity  0.1  0.01  1  [0.35 -0.05 -0.92 2.12]  -167.272798242689  0.0848760  72
[9.57 -4.85 -8.00 -1.42]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999999861  0.0446150  31
[9.57 -4.85 -8.00 -1.42]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999953795  0.0294250  17
[9.57 -4.85 -8.00 -1.42]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999996731  0.0385920  15
[9.57 -4.85 -8.00 -1.42]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999715  0.0541070  22
[9.57 -4.85 -8.00 -1.42]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0781950  31
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.1  0.01  1  [1.03 0.02 -0.99 2.00]  -167.279959843584  0.0323530  23
22
(SR1 Method, continued)
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999991203  0.0652370  41
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999876384  0.0174150  9
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0651250  29
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999997  0.0410020  13
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0834600  25
23
Steepest Descent Method
x0  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  0.1  0.01  1  [-Inf NaN NaN NaN]  NaN  0.8100010  377
[0 0 0 0]  0.01  0.01  1  [0.22 -0.41 -1.28 1.78]  -167.276916941098  0.2438320  196
[0 0 0 0]  0.001  0.01  1  [0.89 -0.06 -1.05 1.96]  -167.279933780104  8.3233250  6416
[0 0 0 0]  0.0001  0.01  1  [0.99 -0.01 -1 2]  -167.279999267465  3.8602080  2417
[0 0 0 0]  0.00001  0.01  1  [1 0 -1 2]  -167.279999997389  22.7155470  12213
[0 0 0 0]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999942  18.2888800  8432
[1 0 -1 2]  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0011080  0
[1 0 -1 2]  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001110  0
[1 0 -1 2]  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000690  0
[1 0 -1 2]  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000660  0
[1 0 -1 2]  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000660  0
[1 0 -1 2]  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000630  0
[0.81 0.91 0.13 0.91]  0.1  0.01  1  [-Inf NaN NaN NaN]  NaN  0.3082390  377
[0.81 0.91 0.13 0.91]  0.01  0.01  1  [1.16 0.19 -0.83 2.15]  -167.279100661797  1.0436020  984
[0.81 0.91 0.13 0.91]  0.001  0.01  1  [1.12 0.07 -0.95 2.04]  -167.279928943921  1.9716170  1514
[0.81 0.91 0.13 0.91]  0.0001  0.01  1  [1 0 -1 2]  -167.279999266127  2.4358950  1833
[0.81 0.91 0.13 0.91]  0.00001  0.01  1  [1 0 -1 2]  -167.279999999681  1.9436270  999
[0.81 0.91 0.13 0.91]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999926  5.8643590  2553
[3.16 0.49 1.39 2.73]  0.1  0.01  1  [-Inf NaN NaN NaN]  NaN  0.3068300  377
[3.16 0.49 1.39 2.73]  0.01  0.01  1  [2.02 0.58 -0.59 2.32]  -167.274489679393  6.4725080  5980
[3.16 0.49 1.39 2.73]  0.001  0.01  1  [1.11 0.061 -0.96 2.03]  -167.279938709635  17.6293290  15514
[3.16 0.49 1.39 2.73]  0.0001  0.01  1  [1.01 0.01 -1 2]  -167.279999278511  8.2242300  5651
[3.16 0.49 1.39 2.73]  0.00001  0.01  1  [1 0 -1 2]  -167.279999997974  1.9071060  1631
[3.16 0.49 1.39 2.73]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999979  4.4262350  1784
[9.57 -4.85 -8.00 -1.42]  0.1  0.01  1  [-Inf Inf NaN NaN]  NaN  0.3026670  376
[9.57 -4.85 -8.00 -1.42]  0.01  0.01  1  [1.75 0.35 -0.77 2.16]  -167.277168000179  3.8968330  3948
[9.57 -4.85 -8.00 -1.42]  0.001  0.01  1  [1.12 0.07 -0.95 2.04]  -167.279928743320  5.0956410  3914
24
(Steepest Descent Method, continued)
[9.57 -4.85 -8.00 -1.42]  0.0001  0.01  1  [1 0 -1 2]  -167.279999264172  3.6328800  2661
[9.57 -4.85 -8.00 -1.42]  0.00001  0.01  1  [1 0 -1 2]  -167.279999998846  25.4060450  16377
[9.57 -4.85 -8.00 -1.42]  0.000001  0.01  1  [1 0 -1 2]  -167.279999999938  4.1823240  C
25
BFGS Method
x0  Starting Hessian  Tolerance1  Tolerance2  T  x*  fmin  Elapsed Time (seconds)  Iterations
[0 0 0 0]  Identity  0.1  0.01  1  [0.22 -0.41 -1.28 1.79]  -167.276910497404  0.0077950  5
[0 0 0 0]  Identity  0.01  0.01  1  [0.61 -0.31 -1.26 1.78]  -167.278342371136  0.0092760  5
[0 0 0 0]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999999944  0.0167320  6
[0 0 0 0]  Identity  0.0001  0.01  0.0001  [1 0 -1 2]  -167.279999999619  0.0260180  7
[0 0 0 0]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999982  0.0232830  6
[0 0 0 0]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0269040  6
[0 0 0 0]  Program's Hessian  0.1  0.01  1  [0.19 -0.41 -1.28 1.78]  -167.276642869312  0.0094350  7
[0 0 0 0]  Program's Hessian  0.01  0.01  1  [0.19 -0.42 -1.28 1.79]  -167.276717196021  0.0057380  4
[0 0 0 0]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999992937  0.0238710  9
[0 0 0 0]  Program's Hessian  0.0001  0.01  0.00000001  [NaN NaN NaN NaN]  NaN  0.0415680  13
[0 0 0 0]  Program's Hessian  0.00001  0.01  0.1  [1 0 -1 2]  -167.279999999938  0.0834650  22
[0 0 0 0]  Program's Hessian  0.000001  0.01  0.00000001  [NaN NaN NaN NaN]  NaN  1.4213690  2330
[1 0 -1 2]  Identity  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0002520  0
[1 0 -1 2]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001150  0
[1 0 -1 2]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[1 0 -1 2]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000680  0
[1 0 -1 2]  Identity  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000670  0
[1 0 -1 2]  Program's Hessian  0.1  0.01  1  [1 0 -1 2]  -167.280000000000  0.0003180  0
[1 0 -1 2]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.280000000000  0.0001160  0
[1 0 -1 2]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000720  0
[1 0 -1 2]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000700  0
[1 0 -1 2]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000680  0
[1 0 -1 2]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0000680  0
[0.81 0.91 0.13 0.91]  Identity  0.1  0.01  1  [1.05 0.30 -0.69 2.29]  -167.275083269658  0.0100910  4
[0.81 0.91 0.13 0.91]  Identity  0.01  0.01  1  [1.24 0.14 -0.90 2.08]  -167.279686990669  0.0090960  5
26
(BFGS Method, continued)
[0.81 0.91 0.13 0.91]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999999376  0.0185580  7
[0.81 0.91 0.13 0.91]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999860  0.0183590  6
[0.81 0.91 0.13 0.91]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0260800  8
[0.81 0.91 0.13 0.91]  Identity  0.000001  0.01  0.01  [1 0 -1 2]  -167.279999999999  0.1026530  43
[0.81 0.91 0.13 0.91]  Program's Hessian  0.1  0.01  1  [1.02 0.32 -0.66 2.33]  -167.273452580260  0.0098400  7
[0.81 0.91 0.13 0.91]  Program's Hessian  0.01  0.01  1  [1.28 0.16 -0.89 2.09]  -167.279598023217  0.0122180  7
[0.81 0.91 0.13 0.91]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999992578  0.0266280  10
[0.81 0.91 0.13 0.91]  Program's Hessian  0.0001  0.01  0.000001  [NaN NaN NaN NaN]  NaN  1.0398700  1601
[0.81 0.91 0.13 0.91]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0368260  10
[0.81 0.91 0.13 0.91]  Program's Hessian  0.000001  0.01  0.000000001  [NaN NaN NaN NaN]  NaN  0.0878040  78
[3.16 0.49 1.39 2.73]  Identity  0.1  0.01  1  [2.95 0.95 -0.37 2.47]  -167.260921728458  0.0148570  7
[3.16 0.49 1.39 2.73]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999971584  0.0201850  8
[3.16 0.49 1.39 2.73]  Identity  0.001  0.01  1  [1 0 -1 2]  -167.279999998218  0.0196480  7
[3.16 0.49 1.39 2.73]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999995  0.0273770  9
[3.16 0.49 1.39 2.73]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.279999999995  0.0163080  5
[3.16 0.49 1.39 2.73]  Identity  0.000001  0.01  0.0001  [1 0 -1 2]  -167.279999999999  0.0242500  5
[3.16 0.49 1.39 2.73]  Program's Hessian  0.1  0.01  1  [3 1.49 0.19 2.99]  -167.244594503201  0.0097640  7
[3.16 0.49 1.39 2.73]  Program's Hessian  0.01  0.01  1  [1 0 -1 2]  -167.279999963524  0.0209800  10
[3.16 0.49 1.39 2.73]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999915565  0.0278820  10
[3.16 0.49 1.39 2.73]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999946  0.0292690  9
[3.16 0.49 1.39 2.73]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0413620  13
[3.16 0.49 1.39 2.73]  Program's Hessian  0.000001  0.01  1  [NaN NaN NaN NaN]  NaN  32.2398260  11663
[9.57 -4.85 -8.00 -1.42]  Identity  0.1  0.01  1  [2.00 0.58 -0.58 2.33]  -167.274520285135  0.0099090  6
[9.57 -4.85 -8.00 -1.42]  Identity  0.01  0.01  1  [1 0 -1 2]  -167.279999526260  0.0190540  8
[9.57 -4.85 -8.00 -1.42]  Identity  0.001  0.01  0.1  [1 0 -1 2]  -167.279999999709  0.0216050  7
[9.57 -4.85 -8.00 -1.42]  Identity  0.0001  0.01  1  [1 0 -1 2]  -167.279999999999  0.0256380  8
[9.57 -4.85 -8.00 -1.42]  Identity  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0312400  9
[9.57 -4.85 -8.00 -1.42]  Identity  0.000001  0.01  0.1  [1 0 -1 2]  -167.279999999999  0.0293100  7
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.1  0.01  1  [2.16 0.66 -0.53 2.37]  -167.272880184292  0.0146060  10
27
(BFGS Method, continued)
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.01  0.01  1  [2.16 0.66 -0.53 2.37]  -167.272880718121  0.0105100  6
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.001  0.01  1  [1 0 -1 2]  -167.279999999362  0.0412020  13
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.0001  0.01  1  [1 0 -1 2]  -167.279999999987  0.0335710  11
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.00001  0.01  1  [1 0 -1 2]  -167.280000000000  0.0322950  10
[9.57 -4.85 -8.00 -1.42]  Program's Hessian  0.000001  0.01  1  [1 0 -1 2]  -167.280000000000  84.1332250  30847
28
For k = 10,000,000
All tolerances at 0.01
T set to 1 (adjusted as necessary to get convergence)

Newton's Method
Starting Point  xmin  fmin  Iterations  Elapsed Time (s)
[0 0 0 0]  [-0.024771532289335 0.310732926395592 -0.788760177923239 0.529801104907597]  -133.560228943395  39  0.03467
[1 0 -1 2]  [-0.024771837343072 0.310733028722944 -0.788760301760888 0.529800846242057]  -133.560228943387  143  0.253257
[0.81 0.91 0.13 0.91]  [-0.024771522265604 0.310732911352949 -0.788760170844310 0.529801124775638]  -133.560228943395  173  0.183464
[3.16 0.49 1.39 2.73]  [-0.024771527241936 0.310732906803061 -0.788760174684506 0.529801121482055]  -133.560228943395  240  0.243349
[9.57 -4.85 -8.00 -1.42]  [-0.024771536889997 0.310732912478593 -0.788760169890084 0.529801124846407]  -133.560228943395  250  0.310806

BFGS ID
[0 0 0 0]  [-0.024770871284853 0.310733636611044 -0.788758902260001 0.529802618520184]  -133.560228943197  111  0.112444
[1 0 -1 2]  Refused to return an answer
[0.81 0.91 0.13 0.91]  Refused to return an answer
[3.16 0.49 1.39 2.73]  Refused to return an answer
[9.57 -4.85 -8.00 -1.42]  Refused to return an answer

BFGS Hess
[0 0 0 0]  Refused to return an answer
[1 0 -1 2]  Refused to return an answer
[0.81 0.91 0.13 0.91]  Refused to return an answer
[3.16 0.49 1.39 2.73]  Refused to return an answer
[9.57 -4.85 -8.00 -1.42]  Refused to return an answer

SR1 ID
[0 0 0 0]  [-0.024771567761620 0.310733441325913 -0.788761623923610 0.529802126005156]  -133.560304375650  378  1.013921
[1 0 -1 2]  [-0.024771544747263 0.310733182847815 -0.788761385804056 0.529802633470826]  -133.560304375634  819  2.392423
[0.81 0.91 0.13 0.91]  [-0.024777470366075 0.310725229879482 -0.788760807092803 0.529807865053035]  -133.560304368405  619  1.81805
[3.16 0.49 1.39 2.73]  Refused to return an answer
[9.57 -4.85 -8.00 -1.42]  [-0.024770231258035 0.310730257079946 -0.788765912375838 0.529797671405605]  -133.560304373593  821  1.490982
29
SR1 Hess
[0 0 0 0]  [-0.024773909306389 0.310738355421821 -0.788757020809805 0.529805987358458]  -133.560304372956  1976  1.343549
[1 0 -1 2]  [-0.024775941428827 0.310732519348041 -0.788760367888054 0.529804332231798]  -133.560304374565  1734  2.006957
[0.81 0.91 0.13 0.91]  [-0.024769720656432 0.310730507404563 -0.788765053684288 0.529798826909580]  -133.560304374229  2254  2.432939
[3.16 0.49 1.39 2.73]  [-0.024767341442569 0.310733094291290 -0.788764366771071 0.529798443639664]  -133.560304374047  2184  2.63281
[9.57 -4.85 -8.00 -1.42]  Refused to return an answer
30
MATLAB

Objective Function File: f.m

function val = f(x)
% This m-file is the objective function for our unconstrained
% nonlinear program.

% Definitions.
c = [5.04; -59.4; 146.4; -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1); x(2); x(3); x(4)];

val = c' * xs + 0.5 * xs' * hessian * xs;
end

Gradient Function File: gradf.m

function grad = gradf(x)
% This is the gradient function of our objective function, f.m.

% Definitions.
c = [5.04 -59.4 146.4 -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1) x(2) x(3) x(4)];

grad = c + xs * hessian;
end

Hessian Function File: hessf.m

function hessian = hessf(x)
% The hessian of our objective function, f.m.
% Note that the hessian is independent of x.
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
end

Objective Function File, Penalty Method: fpen.m

function val = fpen(x,k)
% The objective function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is the same for fpen, gradfpen and hessfpen.

% Definitions.
k = 10000000;
c = [5.04 -59.4 146.4 -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1); x(2); x(3); x(4)];
constraint = 1 - (x(1))^2 - (x(2))^2 - (x(3))^2 - (x(4))^2;

% Function
val = c * xs + 0.5 * xs' * hessian * xs + (k/2)*(constraint)^2;
end

Gradient Function File, Penalty Method: gradfpen.m

function grad = gradfpen(x,k)
% The gradient function implemented with the L2 penalty method.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is the same for fpen, gradfpen and hessfpen.

% Definitions.
k = 10000000;
g = ((x(1))^2 + (x(2))^2 + (x(3))^2 + (x(4))^2 - 1);
c = [5.04 -59.4 146.4 -96.6];
hessian = [ 0.16  -1.2   2.4  -1.4;
           -1.2   12   -27    16.8;
            2.4  -27    64.8 -42;
           -1.4   16.8 -42    28];
xs = [x(1) x(2) x(3) x(4)];

% Function
grad = c + xs * hessian + [2*k*x(1)*g; 2*k*x(2)*g; 2*k*x(3)*g; 2*k*x(4)*g]';
end

Hessian Function File, Penalty Method: hessfpen.m

function hessian = hessfpen(x,k)
% This is the hessian of the L2 penalty method for our program.
% Evaluate with increasing values of k to simulate evaluating the limit
% as k approaches infinity.
% Ensure that k is the same for fpen, gradfpen and hessfpen.

% Definitions.
k = 10000000;
h11 = 2*k*(3*(x(1))^2 + (x(2))^2 + (x(3))^2 + (x(4))^2 - 1);
h22 = 2*k*((x(1))^2 + 3*(x(2))^2 + (x(3))^2 + (x(4))^2 - 1);
h33 = 2*k*((x(1))^2 + (x(2))^2 + 3*(x(3))^2 + (x(4))^2 - 1);
h44 = 2*k*((x(1))^2 + (x(2))^2 + (x(3))^2 + 3*(x(4))^2 - 1);
unchessian = [ 0.16  -1.2   2.4  -1.4;
              -1.2   12   -27    16.8;
               2.4  -27    64.8 -42;
              -1.4   16.8 -42    28];

% Function
hessian = unchessian + [h11 4*k*x(1)*x(2) 4*k*x(1)*x(3) 4*k*x(1)*x(4);...
    4*k*x(1)*x(2) h22 4*k*x(2)*x(3) 4*k*x(4)*x(2);...
    4*k*x(1)*x(3) 4*k*x(3)*x(2) h33 4*k*x(3)*x(4);...
    4*k*x(1)*x(4) 4*k*x(4)*x(2) 4*k*x(3)*x(4) h44];
end

Constraint File: MATLAB Optimisation Tool (xtx.m)

function [c, ceq] = xtx(x)
% The constraint as required for the MATLAB Optimization Tool.
% c is the set of nonlinear inequality constraints. Empty in our case.
c = [];
% ceq is the set of nonlinear equality constraints.
ceq = x(1)^2 + x(2)^2 + x(3)^2 + x(4)^2 - 1;
end
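For completeness, the constraint file above is in the form MATLAB's Optimization Toolbox expects for nonlinear constraints. A direct call to fmincon using it together with f.m might look as follows; the solver settings shown are illustrative and are not necessarily the exact options used through the Optimization Tool in this project.

% Illustrative use of the constraint file xtx.m with fmincon.
x0 = [0 0 0 0];                                          % starting iterate
options = optimset('Algorithm', 'interior-point', 'TolFun', 1e-8);
[xmin, fmin] = fmincon(@f, x0, [], [], [], [], [], [], @xtx, options);

Alternatively, the KKT conditions mentioned in the discussion provide an analytical check: stationarity requires c + Hx + 2λx = 0, that is x = -(H + 2λI)^(-1) c, with the multiplier λ chosen so that the feasibility condition x'x = 1 is satisfied.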
33
SR1 Quasi-Newton Method: NewtonMethod_SR1.m

% INPUT:
%
%   f           - the multivariable function to minimise (a separate
%                 user-defined MATLAB function m-file)
%
%   gradf       - function which returns the gradient vector of f evaluated
%                 at x (also a separate user-defined MATLAB function m-file)
%
%   x0          - the starting iterate
%
%   tolerance1  - tolerance for stopping criterion of algorithm
%
%   tolerance2  - tolerance for stopping criterion of line minimisation
%                 (e.g. in golden section search)
%
%   H0          - a matrix used as the first approximation to the hessian.
%                 Updated as the algorithm progresses
%
%   T           - parameter used by the "improved algorithm for finding an
%                 upper bound for the minimum" along each given descent
%                 direction
%
% OUTPUT:
%
%   xminEstimate - estimate of the minimum
%
%   fminEstimate - the value of f at xminEstimate
%
%   iteration    - the number of iterations performed

function [xminEstimate, fminEstimate, iteration] = NewtonMethod_SR1(f,...
    gradf, x0, H0, tolerance1, tolerance2, T)

tic                       % starts timer
k = 0;                    % initialize iteration counter
iteration_number = 0;     % initialise count
xk = x0;                  % row vector
xk_old = x0;              % row vector
H_old = H0;               % square matrix

while ( norm(feval(gradf, xk)) >= tolerance1 )

    iteration_number = iteration_number + 1;

    H_old = H_old / max(max(H_old));   % Correction if det H_old gets
                                       % too large or small

    dk = transpose(-H_old*transpose(feval(gradf, xk)));   % gives dk as
                                                          % a row vector

    % minimise f with respect to t in the direction dk, which involves
    % two steps:

    % (1) find upper and lower bound, [a,b], for the stepsize t using
    %     the "improved procedure" presented in the lecture notes
    [a, b] = multiVariableHalfOpen(f, xk, dk, T);

    % (2) use golden section algorithm (suitably modified for
    %     functions of more than one variable) to estimate the
    %     stepsize t in [a,b] which minimises f in the direction dk
    %     starting at xk
    [tmin, fmin] = multiVariableGoldenSectionSearch(f, a, b, tolerance2,...
        xk, dk);
    % note: we do not actually need fmin, but we do need tmin

    % update the iteration counter and the current iterate
    k = k + 1;
    xk = xk + tmin*dk;
    xk_new = xk_old + tmin*dk;

    % update the hessian approximation (SR1 update); the step is taken
    % relative to the previous iterate xk_old
    gk = (feval(gradf, xk_new) - feval(gradf, xk_old))';   % column vector
    s = (xk_new - xk_old)' - (H_old * gk);
    st = s';
    H_new = H_old + (s * st) / (st * gk);

    % keep track of the old values
    xk_old = xk_new;
    H_old = H_new;

end

% assign output values
toc
xminEstimate = xk;
fminEstimate = feval(f, xminEstimate)
iteration = iteration_number
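As a usage note, the routine above can be called from the command window once the function files and the two line-search helpers it relies on (multiVariableHalfOpen.m and multiVariableGoldenSectionSearch.m, which are referenced in the code but not reproduced here) are on the MATLAB path. The call below uses the same tolerance and T values as the data tables.

% Example call: SR1 from the origin, with the identity as the initial
% hessian approximation, tolerances of 0.01 and T = 1 (as in the data tables).
x0 = [0 0 0 0];
[xmin, fmin, iters] = NewtonMethod_SR1(@f, @gradf, x0, eye(4), 0.01, 0.01, 1);

% For the constrained (L2 penalty) problem, the penalty versions of the
% function files are passed instead:
% [xmin, fmin, iters] = NewtonMethod_SR1(@fpen, @gradfpen, x0, eye(4), 0.01, 0.01, 1);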