Lecture 13: Non-linear least squares and the Gauss-Newton method
Michael S. Floater
November 12, 2018
1 Non-linear least squares
A minimization problem that occurs frequently is the minimization of a function of the form
\[
f(x) = \frac{1}{2} \sum_{i=1}^{m} r_i(x)^2, \tag{1}
\]
where $x \in \mathbb{R}^n$ and $r_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$. Such a minimization problem comes from curve fitting by least squares, where the $r_i$ are the residuals.
1.1 Linear case
Suppose we are given data $(t_j, y_j)$, $j = 1, \ldots, m$, and we want to fit a straight line,
\[
p(t) = x_1 + x_2 t.
\]
Then we would like to find $x_1$ and $x_2$ that minimize
\[
\frac{1}{2} \sum_{i=1}^{m} (y_i - p(t_i))^2 = \frac{1}{2} \sum_{i=1}^{m} (y_i - x_1 - x_2 t_i)^2.
\]
This problem can be formulated as (1) with $n = 2$, where the residuals are
\[
r_i(x) = r_i(x_1, x_2) = y_i - p(t_i) = y_i - x_1 - x_2 t_i.
\]
More generally, we could fit a polynomial
\[
p(t) = \sum_{j=1}^{n} x_j t^{j-1},
\]
or even a linear combination of basis functions $\phi_1(t), \ldots, \phi_n(t)$,
\[
p(t) = \sum_{j=1}^{n} x_j \phi_j(t).
\]
These are again examples of (1), where
\[
r_i(x) = r_i(x_1, \ldots, x_n) = y_i - p(t_i).
\]
In all these cases, the problem is linear in the sense that the solution is found by solving a linear system of equations. This is because $f$ is quadratic in $x$. We can express $f$ as
\[
f(x) = \frac{1}{2} \|Ax - b\|^2,
\]
where $A \in \mathbb{R}^{m,n}$ is the collocation matrix (the Vandermonde matrix in the monomial case $\phi_j(t) = t^{j-1}$),
\[
A = \begin{bmatrix}
\phi_1(t_1) & \phi_2(t_1) & \cdots & \phi_n(t_1) \\
\phi_1(t_2) & \phi_2(t_2) & \cdots & \phi_n(t_2) \\
\vdots & \vdots & & \vdots \\
\phi_1(t_m) & \phi_2(t_m) & \cdots & \phi_n(t_m)
\end{bmatrix},
\]
$x = [x_1, x_2, \ldots, x_n]^T$ is the vector of coefficients of $p$, and $b = [y_1, y_2, \ldots, y_m]^T$ is the vector of data observations.
We have seen that we can then find $x$ from the QR decomposition of $A$ or from the normal equations, for example.
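As an illustration (not part of the original notes), here is a minimal Python sketch of the linear case: it fits $p(t) = x_1 + x_2 t$ by solving $\min_x \|Ax - b\|_2$ via a QR factorization. The data arrays are hypothetical.

```python
import numpy as np

# Hypothetical data sites t_j and observations y_j.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

# Vandermonde matrix for the basis {1, t}.
A = np.column_stack([np.ones_like(t), t])

# Solve min ||Ax - b||_2 via the thin QR factorization: R x = Q^T b.
Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ y)
print(x)  # coefficients [x1, x2] of the fitted line
```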
1.2 Non-linear case
It might be more appropriate to fit a curve $p(t)$ that does not depend linearly on its parameters $x_1, \ldots, x_n$. An example of this is the rational function
\[
p(t) = \frac{x_1 t}{x_2 + t}.
\]
Another is the exponential function
\[
p(t) = x_1 e^{x_2 t}.
\]
In both cases we would again like to find $x_1$ and $x_2$ to minimize
\[
\frac{1}{2} \sum_{i=1}^{m} (y_i - p(t_i))^2.
\]
As in the linear case, we can reformulate this as the minimization of $f$ in (1) with the residuals
\[
r_i(x) = r_i(x_1, x_2) = y_i - p(t_i).
\]
In these cases the problem is non-linear since $f$ is no longer a quadratic function (the residuals are no longer linear in the parameters $x_1, \ldots, x_n$). One approach to minimizing such an $f$ is to try Newton's method. Recall that Newton's method for minimizing $f$ is simply Newton's method for solving the system of $n$ equations $\nabla f(x) = 0$, which is the iteration
\[
x^{(k+1)} = x^{(k)} - \bigl(\nabla^2 f(x^{(k)})\bigr)^{-1} \nabla f(x^{(k)}). \tag{2}
\]
The advantages of Newton's method are:

1. If $f$ is quadratic, it converges in one step, i.e., $x^{(1)}$ is the global minimum of $f$ for any initial guess $x^{(0)}$.

2. For non-linear least squares it converges quadratically to a local minimum if the initial guess $x^{(0)}$ is close enough.
The disadvantage of Newton's method is its lack of robustness. For non-linear least squares it might not converge. One reason for this is that the search direction
\[
d^{(k)} = -\bigl(\nabla^2 f(x^{(k)})\bigr)^{-1} \nabla f(x^{(k)})
\]
might not even be a descent direction: there is no guarantee that it fulfills the descent condition
\[
\nabla f(x^{(k)})^T d^{(k)} < 0.
\]
One way to improve robustness is to use the Gauss-Newton method instead. The Gauss-Newton method is also simpler to implement.
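For concreteness, here is a minimal sketch (not from the notes) of iteration (2), assuming user-supplied callables grad_f and hess_f for the gradient and Hessian of $f$.

```python
import numpy as np

def newton_minimize(grad_f, hess_f, x0, tol=1e-12, max_iter=100):
    """Newton's method for minimization, iteration (2)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:  # stop when the gradient is small
            break
        # Newton step: solve the linear system rather than forming the inverse.
        x = x - np.linalg.solve(hess_f(x), g)
    return x
```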
2 Gauss-Newton method
The Gauss-Newton method is a simplification or approximation of the Newton method that applies to functions $f$ of the form (1). Differentiating (1) with respect to $x_j$ gives
\[
\frac{\partial f}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial r_i}{\partial x_j} r_i,
\]
and so the gradient of $f$ is
\[
\nabla f = J_r^T r,
\]
where $r = [r_1, \ldots, r_m]^T$ and $J_r \in \mathbb{R}^{m,n}$ is the Jacobian of $r$,
\[
J_r = \left[ \frac{\partial r_i}{\partial x_j} \right]_{i=1,\ldots,m,\ j=1,\ldots,n}.
\]
Differentiating again, with respect to $x_k$, gives
\[
\frac{\partial^2 f}{\partial x_j \partial x_k} = \sum_{i=1}^{m} \left( \frac{\partial r_i}{\partial x_j} \frac{\partial r_i}{\partial x_k} + r_i \frac{\partial^2 r_i}{\partial x_j \partial x_k} \right),
\]
and so the Hessian of $f$ is
\[
\nabla^2 f = J_r^T J_r + Q,
\]
where
\[
Q = \sum_{i=1}^{m} r_i \nabla^2 r_i.
\]
The Gauss-Newton method is the result of neglecting the term $Q$, i.e., making the approximation
\[
\nabla^2 f \approx J_r^T J_r. \tag{3}
\]
Thus the Gauss-Newton iteration is
\[
x^{(k+1)} = x^{(k)} - \bigl(J_r(x^{(k)})^T J_r(x^{(k)})\bigr)^{-1} J_r(x^{(k)})^T r(x^{(k)}).
\]
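In code, one Gauss-Newton step amounts to a linear least-squares solve. The sketch below (an illustration, not from the notes) assumes callables residual and jacobian returning $r(x)$ and $J_r(x)$; solving $\min_d \|J_r d + r\|_2$ with a QR-based solver is mathematically equivalent to the formula above but avoids forming $J_r^T J_r$ explicitly.

```python
import numpy as np

def gauss_newton_step(residual, jacobian, x):
    """One Gauss-Newton step: x_new = x - (J^T J)^{-1} J^T r."""
    r = residual(x)
    J = jacobian(x)
    # Solve the linear least-squares problem min_d ||J d + r||_2.
    d, *_ = np.linalg.lstsq(J, -r, rcond=None)
    return x + d
```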
In general the Gauss-Newton method will not converge quadratically, but if the elements of $Q$ are small as we approach a minimum, we can expect fast convergence. This will be the case if either the $r_i$ or their second-order partial derivatives $\partial^2 r_i / \partial x_j \partial x_k$ are small as we approach a minimum.
An advantage of this method is that it does not require computing the second-order partial derivatives of the functions $r_i$. Another is that the search direction,
\[
d^{(k)} = -\bigl(J_r(x^{(k)})^T J_r(x^{(k)})\bigr)^{-1} \nabla f(x^{(k)}),
\]
is always a descent direction (as long as $J_r(x^{(k)})$ has full rank). This is because $J_r^T J_r$ is positive semi-definite, and when $J_r(x^{(k)})$ has full rank it is positive definite, as is its inverse, which means that
\[
\nabla f(x^{(k)})^T d^{(k)} = -\nabla f(x^{(k)})^T \bigl(J_r(x^{(k)})^T J_r(x^{(k)})\bigr)^{-1} \nabla f(x^{(k)}) \le 0,
\]
with strict inequality whenever $\nabla f(x^{(k)}) \ne 0$. This suggests that the Gauss-Newton method will typically be more robust than Newton's method.
There is still no guarantee, however, that the Gauss-Newton method will converge in general. In practice, one would want to incorporate a step length $\alpha^{(k)}$ into the iteration:
\[
x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)},
\]
using some rule, such as the Armijo rule, to ensure descent at each iteration.
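A backtracking line search enforcing the Armijo condition might look like the following sketch (an illustration with conventional constants, not values from the notes); f and grad_f are assumed to be user-supplied callables.

```python
import numpy as np

def armijo_step(f, grad_f, x, d, alpha0=1.0, c=1e-4, max_halvings=30):
    """Backtracking line search: shrink alpha until
    f(x + alpha d) <= f(x) + c * alpha * grad_f(x)^T d."""
    alpha = alpha0
    fx = f(x)
    slope = grad_f(x) @ d  # directional derivative; negative for a descent direction
    for _ in range(max_halvings):
        if f(x + alpha * d) <= fx + c * alpha * slope:
            break
        alpha *= 0.5
    return x + alpha * d
```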
3 Example
In a biology experiment studying the relation between substrate concentration [S] and reaction rate in an enzyme-mediated reaction, the data in the following table were obtained.
i      1      2      3      4       5       6       7
[S]    0.038  0.194  0.425  0.626   1.253   2.500   3.740
rate   0.050  0.127  0.094  0.2122  0.2729  0.2665  0.3317
It is desired to find a curve (model function) of the form
\[
\text{rate} = \frac{V_{\max}[S]}{K_M + [S]}
\]
that best fits the data in the least-squares sense, with the parameters $V_{\max}$ and $K_M$ to be determined.
[Figure 1: Curve model. The fitted curve, with rate plotted against [S] over $0 \le [S] \le 4$.]
We can rewrite this problem as finding $x_1$ and $x_2$ such that
\[
p(t) = \frac{x_1 t}{x_2 + t}
\]
best fits the data $(t_i, y_i)$, $i = 1, 2, \ldots, 7$, of the table, where $t_i$ is the $i$-th concentration [S] and $y_i$ is the $i$-th rate. We will find $x_1$ and $x_2$ that minimize the sum of squares of the residuals
\[
r_i = y_i - \frac{x_1 t_i}{x_2 + t_i}, \qquad i = 1, \ldots, 7.
\]
The Jacobian $J_r$ of the vector of residuals $r_i$ with respect to the unknowns $x_1$ and $x_2$ is a $7 \times 2$ matrix whose $i$-th row has the entries
\[
\frac{\partial r_i}{\partial x_1} = -\frac{t_i}{x_2 + t_i}, \qquad
\frac{\partial r_i}{\partial x_2} = \frac{x_1 t_i}{(x_2 + t_i)^2}.
\]
Starting with the initial estimates $x_1 = 0.9$ and $x_2 = 0.2$, and using the stopping criterion
\[
\|\nabla f\|_2 \le 10^{-15}, \tag{4}
\]
the method converges in 14 iterations, yielding the solution $x_1 = 0.3618$, $x_2 = 0.5563$. The sum of squares of residuals decreased from the initial value of 1.445 to 0.0078. The plot in Figure 1 shows the curve determined by the model for the optimal parameters together with the observed data.
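The computation can be reproduced with a short script. The following sketch (an illustration, not the original code behind the notes) applies the Gauss-Newton iteration to the data of the table with stopping criterion (4).

```python
import numpy as np

t = np.array([0.038, 0.194, 0.425, 0.626, 1.253, 2.500, 3.740])     # [S]
y = np.array([0.050, 0.127, 0.094, 0.2122, 0.2729, 0.2665, 0.3317])  # rate

def residual(x):
    return y - x[0] * t / (x[1] + t)

def jacobian(x):
    return np.column_stack([-t / (x[1] + t),
                            x[0] * t / (x[1] + t) ** 2])

x = np.array([0.9, 0.2])  # initial estimates
for k in range(100):
    r, J = residual(x), jacobian(x)
    grad = J.T @ r  # gradient of f
    if np.linalg.norm(grad) <= 1e-15:  # stopping criterion (4)
        break
    d, *_ = np.linalg.lstsq(J, -r, rcond=None)  # Gauss-Newton direction
    x = x + d

print(x, residual(x) @ residual(x))  # parameters and sum of squared residuals
```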
We can alternatively try the (full) Newton method. We then also need the second-order partial derivatives,
\[
\frac{\partial^2 r_i}{\partial x_1^2} = 0, \qquad
\frac{\partial^2 r_i}{\partial x_1 \partial x_2} = \frac{t_i}{(x_2 + t_i)^2}, \qquad
\frac{\partial^2 r_i}{\partial x_2^2} = -\frac{2 x_1 t_i}{(x_2 + t_i)^3}.
\]
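As a sketch of how these derivatives enter the iteration (again an illustration, not code from the notes), the Hessian $\nabla^2 f = J_r^T J_r + Q$ can be assembled from the partials above; the arrays t and y are those of the previous sketch.

```python
import numpy as np

t = np.array([0.038, 0.194, 0.425, 0.626, 1.253, 2.500, 3.740])
y = np.array([0.050, 0.127, 0.094, 0.2122, 0.2729, 0.2665, 0.3317])

def newton_fit(x0, tol=1e-15, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = y - x[0] * t / (x[1] + t)
        J = np.column_stack([-t / (x[1] + t), x[0] * t / (x[1] + t) ** 2])
        grad = J.T @ r
        if np.linalg.norm(grad) <= tol:  # stopping criterion (4)
            break
        # Q = sum_i r_i * Hessian(r_i), from the second-order partials above.
        q12 = np.sum(r * t / (x[1] + t) ** 2)
        q22 = np.sum(r * (-2.0 * x[0] * t / (x[1] + t) ** 3))
        Q = np.array([[0.0, q12], [q12, q22]])
        x = x - np.linalg.solve(J.T @ J + Q, grad)  # full Newton step (2)
    return x
```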
Starting with the same initial estimates $x_1 = 0.9$ and $x_2 = 0.2$, Newton's method does not converge. However, if we change the initial estimates to $x_1 = 0.4$ and $x_2 = 0.6$, we find that both the Gauss-Newton and Newton methods converge. Moreover, using again the stopping criterion (4), the Gauss-Newton method needs 11 iterations while Newton's method needs only 5.