1. EE-M110 2006/7, EF L5&6 1/29, v2.0
Lectures 5 & 6:
Least Squares Parameter Estimation
[Figure: performance surface f(θ) plotted over the parameters θ1 and θ2]
Dr Martin Brown
Room: E1k, Control Systems Centre
Email: martin.brown@manchester.ac.uk
Telephone: 0161 306 4672
http://www.eee.manchester.ac.uk/intranet/pg/coursematerial/
Slide 2
L5&6: Resources
Core texts
• Ljung, Chapters 4&7
• Norton, Chapter 4
• On-line, Chapters 4&5
In these two lectures, we’re looking at basic discrete time
representations of linear, time invariant plants and
models and seeing how their parameters can be
estimated using the normal equations.
The key example is the first order, linear, stable RC
electrical circuit which we met last week, and which has
an exponential response.
Slide 3
L5&6: Learning Objectives
L5 Linear models and quadratic performance criterion
– ARX & ARMAX discrete-time, linear systems
– Predictive models, regression and exemplar data
– Residual signal
– Performance criterion
L6 Normal equations, interpretation and properties
– Quadratic cost functions
– Derive the normal equations for parameter estimation
– Examples
We’re not too concerned with system dynamics today; we’re
concentrating on the general form of least squares parameter
estimation
Slide 4
Introduction to Parametric System Identification
In a full, physical, linear model, the model’s structure and
coefficients can be determined from first principles
In most cases, we have to estimate/tune the parameters because
of an incomplete understanding about the full system (unknown
drag, …)
We can use exemplar data (input/output examples), {x(t), y(t)}, to
estimate the unknown parameters
Initially assume that the structure is known (unrealistic, but …), and
all that remains to be estimated are the parameter values.
[Diagram: plant (parameters θ) and model (estimate θ̂) driven by x(t) = u(t) + w(t); plant output y(t) with measurement noise v(t); model output ŷ(t); residual e(t) = y(t) − ŷ(t)]
Slide 5
Recursive Parameter Estimation Framework
where:
θ, θ̂(t−1) are the real and estimated parameter vectors, respectively
u(t) is the control input sequence
y(t), ŷ(t) are the real and estimated outputs, respectively
e(t) is a white noise sequence (output/measurement noise)
w(t) is the disturbance from measurable sources
[Diagram: controller produces u(t); plant (parameters θ) with disturbance w(t) and measurement noise v(t) produces y(t); model (estimate θ̂(t−1)) produces ŷ(t); the residual e(t) = y(t) − ŷ(t) drives the parameter update]
Slide 6
Basic Assumptions in System Identification
1) It is assumed that the unobservable disturbances can be
aggregated and represented by a single additive noise e(t).
There may also be input noise. Generally, it is assumed to be
zero-mean and Gaussian
2) The system is assumed to be linear with time-invariant
parameters, so θ is not time-varying. This is only
approximately true within certain limits
3) The input signal u(t) is assumed exactly known. Often there
is noise associated with reading/measuring it
4) The system noise e(t) is assumed to be uncorrelated with
the input process u(t). This is unlikely to be true, for instance,
due to feedback of y(t)
5) The input signals need to be sufficiently exciting: they need
to excite all relevant modes in the model for identification and
testing
Slide 7
Discrete-Time Transfer Function Models
On this course, we’re primarily concerned with discrete time signals and
systems.
Real-world physical, mechanical, electrical systems are continuous
Consider the CT resistor-capacitor circuit:

RC dy(t)/dt + y(t) = u(t)

Discretizing with a forward (Euler) difference, with sample period Δ, gives:

y(t) = (1 − Δ/RC) y(t−1) + (Δ/RC) u(t−1)

So let q^−1 denote the backward shift operator, q^−1 y(t) = y(t−1); then we have:

A(q) y(t) = B(q) u(t),   A(q) = 1 − (1 − Δ/RC) q^−1,   B(q) = (Δ/RC) q^−1

NB we can use the c2d() Matlab function to go from the continuous time
(transfer function, state space) domain to the discrete time, z-domain.
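The discretization can be sanity-checked numerically. A short pure-Python sketch (the course uses Matlab; the values RC = 1 and Δ = 0.5, giving Δ/RC = 0.5 as on Slide 18, are illustrative assumptions) compares the DT model's step response with the exact CT response:

```python
import math

# Illustrative values (assumed, not from the notes): RC = 1 s, sample period delta = 0.5 s
RC, delta = 1.0, 0.5
a1 = 1.0 - delta / RC   # coefficient on y(t-1), from A(q)
b1 = delta / RC         # coefficient on u(t-1), from B(q)

# DT model: y(t) = a1*y(t-1) + b1*u(t-1), unit step input, y(0) = 0
y, u = 0.0, 1.0
dt_response = []
for t in range(1, 6):
    y = a1 * y + b1 * u
    dt_response.append(round(y, 4))

# Exact CT step response y(t) = 1 - exp(-t/RC) at the sample instants
ct_response = [round(1 - math.exp(-t * delta / RC), 4) for t in range(1, 6)]

print(dt_response)  # [0.5, 0.75, 0.875, 0.9375, 0.9688]
print(ct_response)  # [0.3935, 0.6321, 0.7769, 0.8647, 0.9179]
```

The Euler model only approximates the exact exponential response; the match improves as Δ/RC shrinks.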
Slide 8
Transfer Function/ARX DT LTI Model
The previous model is an example of an AutoRegressive with eXogenous
input (ARX) model, which can be expressed more generally as:
Some comments about the form of this model.
1. The degree of the polynomials determines the complexity of the system’s
response and the number of parameters that have to be estimated. The
roots of A(q) determine system stability
2. a0=1, without loss of generality, so the model can be written as a predictive
model y(t) = y(t-1) + … + u(t-1) + …
3. b0=0, as it is assumed that an input cannot instantly affect the output, and
so there must be at least a delay of one time instant between u & y
(assumes a fast enough sample time, relative to the system dynamics).
4. Typically e ~ N(0, σ^2) – independent and identically distributed
5. Close relationship between the q-shift and z-transform
6. When n=0, this produces a finite impulse response
A(q) y(t) = B(q) u(t) + e(t)

A(q) = 1 − a1 q^−1 − … − an q^−n
B(q) = b1 q^−1 + … + bm q^−m
Slide 9
Linear Regression
The ARX system’s prediction model can be expressed as

ŷ(t|θ) = B(q) u(t) + (1 − A(q)) y(t)

• Here the model’s parameters are collected in the vector θ
• Treat the model as a deterministic system
• This is natural if the error term is considered to be insignificant or difficult
to guess
• This denotes the model structure M (linear, time invariant, for example),
and a particular model with a parameter value θ is M(θ).

This can be written as a linear regression structure:

ŷ(t|θ) = x^T(t) θ

where

Parameter vector: θ = [a1, …, an, b1, …, bm]^T
Input vector: x(t) = [y(t−1), …, y(t−n), u(t−1), …, u(t−m)]^T

The term regression comes from the statistics literature and provides a
powerful set of techniques for determining the parameters and
interpreting the models. Note we need access to previous outputs y(t−1), …
Slide 10
LTI DT ARMAX Model
A more general discrete time, linear time invariant model also includes
Moving Average terms on the error/residual signal
Here, we describe the equation error term, e(t), as a moving average of
white noise (non-iid measurement errors)
Simple example
y(t) = 0.5y(t-1) + 0.3y(t-2) + 1.2u(t-1) - 0.3u(t-2) + 0.5e(t) + 0.5e(t-1)
This can be written as a pseudolinear regression
A(q) y(t) = B(q) u(t) + C(q) e(t)

A(q) = 1 − a1 q^−1 − … − a_na q^−na
B(q) = b1 q^−1 + … + b_nb q^−nb
C(q) = 1 + c1 q^−1 + … + c_nc q^−nc

ŷ(t|θ) = (B(q)/C(q)) u(t) + (1 − A(q)/C(q)) y(t)
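The simple ARMAX example above can be simulated directly. A pure-Python sketch (the unit step input, noise level σ = 0.1 and seed are illustrative assumptions, not values from the notes):

```python
import random

random.seed(0)  # illustrative seed (not from the notes)

# Simple ARMAX example from the slide:
# y(t) = 0.5 y(t-1) + 0.3 y(t-2) + 1.2 u(t-1) - 0.3 u(t-2) + 0.5 e(t) + 0.5 e(t-1)
T = 20
u = [1.0] * T                                   # assumed unit step input
e = [random.gauss(0.0, 0.1) for _ in range(T)]  # white noise, sigma = 0.1 (assumed)

y = [0.0, 0.0]  # system initially at rest
for t in range(2, T):
    y.append(0.5 * y[t-1] + 0.3 * y[t-2]
             + 1.2 * u[t-1] - 0.3 * u[t-2]
             + 0.5 * e[t] + 0.5 * e[t-1])

# Noise-free steady-state gain: (1.2 - 0.3) / (1 - 0.5 - 0.3) = 4.5,
# so the response settles near 4.5, perturbed by the coloured error terms
print(round(y[-1], 3))
```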
Slide 11
Exemplar Training Data
To estimate the unknown parameters θ, we need to collect some
exemplar input-output data, and system identification is then a process
of estimating the parameter values that best fit the data.
The data is generated by a system of noisy ARX linear equations of the
form
where
y is a column vector of measured plant outputs (T × 1)
X is a matrix of input regressors (T × (n+m))
θ is the “true” parameter vector ((n+m) × 1)
e is the error vector (T × 1)
Each row of X represents a single input/output sample. Each column of
X represents a time delayed output or input.
Note that there is a “burn-in” period to measure the time-delayed outputs
y(1), y(2), … which are necessary to form the inputs to the time-delayed
vector
[y(1), …, y(t-n)]
y = Xθ + e
Slide 12
Example: Data for 1st Order ARX Model
1st Order model representation
First order plant model (exponential decay) with no external disturbances
and the measurement noise is additive (Slide 7)
Input vector, output signal and parameters
At time t, the 1st order DT model is represented as:
Output: y(t)
Input: x(t) = [y(t-1); u(t-1)]
Parameters: θ = [θ1; θ2]
Data
As there are two parameters, if the system is truly first order and there is
no measurement noise on any of the signals, we just need two (linearly
independent) samples to estimate θ.
If there is measurement noise in y(t), we need to collect more data to
reduce the effect of the random noise.
Store X=[y(1) u(1); y(2) u(2); y(3) u(3); …], y=[y(2); y(3); y(4); …]
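As a concrete sketch of this storage scheme (pure Python rather than the course's Matlab; the noise-free model y(t) = 0.5y(t−1) + 0.5u(t−1) and the step input switched on at t = 1 are taken from Slide 18):

```python
# Simulate the noise-free DT circuit: y(t) = 0.5*y(t-1) + 0.5*u(t-1)
# u is a step switched on at t = 1, u(0) = 0, and the system starts at rest.
u = [0.0] + [1.0] * 9          # u(0..9)
y = [0.0]                      # y(0)
for t in range(1, 10):
    y.append(0.5 * y[t-1] + 0.5 * u[t-1])

# Each row of X is one sample x(t) = [y(t-1), u(t-1)]; the target is y(t)
X = [[y[t-1], u[t-1]] for t in range(1, 10)]
y1 = [y[t] for t in range(1, 10)]

print(X[:3])   # [[0.0, 0.0], [0.0, 1.0], [0.5, 1.0]]
print(y1[:3])  # [0.0, 0.5, 0.75]
```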
Slide 13
Prediction Residual Signal
The residual signal (measured-predicted) is defined as:
and can be represented as:
A simple regression interpretation is
(each x represents an exemplar
sample from a single input,
single output system)
r(t) = y(t) − ŷ(t)

[Diagram: the plant output (measurement) y(t) minus the model output ŷ(t) gives the residual r(t); scatter plot of exemplar samples (x) about a fitted regression line, with r(t) the vertical distance from a sample to the line]
Slide 14
Measures of Model Goodness
The model’s response can be expressed as

ŷ(t) = x^T(t) θ̂

where θ̂ is the model’s estimated parameter vector and x(t) is
the input vector.
If ŷ(t) = y(t), the model’s response is correct for that single
time sample, and the residual r(t) = y(t) − ŷ(t) is zero. The
residual’s magnitude gives us an idea of the “goodness” of
the parameter vector estimate for that data point.
For a set of measured outputs and predictions {y(t), ŷ(t)}_t, the
“size” of the residual vector r = y − ŷ is an estimate of the
parameter goodness.
We can determine the size by looking at the norm of r.
Slide 15
Residual Norm Measures
A vector p-norm (of a vector r) is defined by:
The most common p-norm is the 2-norm:
The vector p-norm has the properties that:
• ||r|| ≥ 0
• ||r|| = 0 iff r = 0
• ||kr|| = |k| ||r||
• ||r1 + r2|| ≤ ||r1|| + ||r2||
For the residual vector, the norm is only zero if all the
residuals are zero. Otherwise, a small norm means that, on
average, the individual residuals are small in magnitude.
||r||_p = ( Σ_{i=1}^T |r_i|^p )^(1/p)

||r||_2^2 = Σ_{i=1}^T r_i^2
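These definitions and properties are easy to verify numerically; a small pure-Python sketch (the example vectors are illustrative):

```python
def p_norm(r, p):
    """Vector p-norm: ||r||_p = (sum_i |r_i|^p)^(1/p)."""
    return sum(abs(ri) ** p for ri in r) ** (1.0 / p)

r = [3.0, -4.0]
print(p_norm(r, 2))   # 2-norm: sqrt(9 + 16) = 5.0
print(p_norm(r, 1))   # 1-norm: |3| + |-4| = 7.0

# Properties: ||k r|| = |k| ||r|| and the triangle inequality
k = -2.0
kr = [k * ri for ri in r]
r2 = [1.0, 1.0]
s = [a + b for a, b in zip(r, r2)]
assert abs(p_norm(kr, 2) - abs(k) * p_norm(r, 2)) < 1e-12
assert p_norm(s, 2) <= p_norm(r, 2) + p_norm(r2, 2)
```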
Slide 16
Sum of Squared Residuals
The most common discrete time performance index is the
sum of squared residuals (2-norm squared):
For each data point, the model’s output is compared
against the plant’s; the error is squared and then summed
over all the data points.
Any non-zero residual means that the performance index is
positive.
The performance function f(θ) is a function of the
parameter values, because some parameter values will
cause large residuals and others will cause small residuals.
We want the parameter values that minimize f(θ) (≥ 0).
f(θ) = ||r||_2^2 = Σ_{i=1}^T ( y_i − ŷ_i )^2
Slide 17
Relationship between Noise & Residual
The aim of parameter estimation is to estimate the values of θ that
minimize this performance index (the sum of squared residuals or errors,
SSE).
When the model can predict the plant exactly:

r(t) = e(t)

i.e. the residual signal is equal to the additive noise signal.
Note that the SSE is often replaced by the mean squared error (MSE),
defined by

MSE = SSE/T ≈ σ^2 (the variance of the additive noise signal)

This is the variance of the residual signal. It simply represents the
average squared error and ensures that the performance function does
not depend on the amount of data.
Example: when we have 1000 repeated trials (step responses) of 9 data
points for the DT electric circuit, with additive noise N(0, 0.01):

MSE = ||r||_2^2 / T = 0.0103 ≈ σ^2
RMSE = 0.1015 ≈ σ
Slide 18
Example: DT RC Electrical Circuit
Consider the DT, first order, LTI representation of the RC circuit which is an
ARX model (Slide 7 & 12)
Assume that Δ/RC = 0.5; then:
y(t) = 0.5*y(t-1) + 0.5*u(t-1)
Here the system is initially at rest, y(0) = 0. Note that u here refers to a step
signal which is switched on at t = 1, with u(0) = 0, rather than the control
signal.
Assuming that 10 steps are taken, we collect 9 data points for system
identification:
>> X = [y(1:end-1)' u(1:end-1)'];
>> y1 = y(2:end)';
Gaussian random noise of standard deviation 0.05 was also added to y1:
>> y1e = y1 + 0.05*randn(size(y1));
X = [0 0; 0 1; 0.5 1; … ; 0.992 1],   y = [0; 0.5; 0.75; … ; 0.996]
Slide 19
Example: Noisy Electric Circuit
Note here, we’re cheating a bit by assuming the exact
measurement y(t-1) is available to the model’s input but only
the noisy measurement ye(t) is available to the model’s output.
NB, in these notes, y() generally denotes the noisy output
X = [0 0; 0 1; 0.5 1; … ; 0.992 1],   y = [0; 0.5; 0.75; … ; 0.996]

ye = y + N(0, 0.01) = [0.037; 0.510; 0.774; … ; 1.008]

θ̂ = [0.4756 0.5173]^T

ŷ = X θ̂ = [0; 0.511; 0.744; … ; 0.9744]

r = ye − ŷ = [0.0367; −0.001; 0.029; … ; 0.033]

σ̂^2 = 0.0081,   σ̂ = 0.090

NB randn('state', 123456)
Slide 20
Parameter Estimation
An important part of system identification is being able to
estimate the parameters of a linear model, when a
quadratic performance function is used to measure the
model’s goodness.
This produces the well-known normal equations for least
squares estimation
• This is a closed form solution
• Efficiently and robustly solved (in Matlab)
• Permits a statistical interpretation
• Can be solved recursively
Investigated over the next 3-4 lectures
θ̂ = ( X^T X )^−1 X^T y
Slide 21
Noise-free Parameter Determination
Parameter estimation works by assuming a plant/model
structure, which is taken to be exactly known.
If there are n+m parameters in the model, we can collect
n+m pieces of data (linearly independent – to ensure
that the input/data matrix, X, is invertible):

Xθ = y

and invert the matrix to find the exact parameter values:

θ = X^−1 y

In Matlab, both of the following forms are equivalent:
theta = inv(X)*y;
theta = X\y;
theta = [0.5; 0.5] % previous example
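For the two-parameter circuit model this exact inversion can be carried out in a few lines. A pure-Python sketch (the two samples come from the noise-free step response; which two rows to use is an illustrative choice – they must be linearly independent):

```python
# Two linearly independent samples from the noise-free circuit response:
# x(t) = [y(t-1), u(t-1)] -> y(t), with y(t) = 0.5*y(t-1) + 0.5*u(t-1)
X = [[0.0, 1.0],
     [0.5, 1.0]]
y = [0.5, 0.75]

# Invert the 2x2 system X theta = y directly
det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
Xinv = [[ X[1][1] / det, -X[0][1] / det],
        [-X[1][0] / det,  X[0][0] / det]]
theta = [Xinv[0][0] * y[0] + Xinv[0][1] * y[1],
         Xinv[1][0] * y[0] + Xinv[1][1] * y[1]]

print(theta)  # [0.5, 0.5] -- the true parameter values
```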
Slide 22
Linear Model and Quadratic Performance
When the model is linear and the data is noisy (missing inputs,
unmeasurable disturbances), the Sum Squared Error (SSE)
performance index can be expressed as:
This expression is quadratic in θ. Typically
size(X,1) >> size(X,2)
The equivalent system of linear equations Xθ = y is inconsistent
(no θ satisfies every equation exactly, because of the noise)

f(θ) = Σ_{i=1}^T ( y_i − x_i^T θ )^2
     = Σ_{i=1}^T y_i^2 − 2 ( Σ_{i=1}^T y_i x_i^T ) θ + θ^T ( Σ_{i=1}^T x_i x_i^T ) θ

It is of the form (for 2 inputs/parameters), for example:

f(θ) = 5 − 2θ1 − 3θ2 + 8θ1^2 + 6θ1θ2 + 0.5θ2^2
Slide 23
Quadratic Matrix Representation
This can also be expressed in matrix form:

f(θ) = (y − Xθ)^T (y − Xθ) = y^T y − 2 θ^T X^T y + θ^T X^T X θ

The general form for a quadratic is:

f(θ) = (1/2) θ^T H θ + f^T θ + c

where

H = 2 X^T X is the Hessian/covariance matrix, with h_ij = 2 Σ_k x_ki x_kj
f = −2 X^T y is the cross-correlation vector, with f_i = −2 Σ_k x_ki y_k
Slide 24
Normal Equations for a Linear Model
When the parameter vector is optimal:

∂f/∂θ = 0

For a quadratic MSE with a linear model:

∂f/∂θ = 2 X^T X θ − 2 X^T y

At optimality:

X^T X θ̂ = X^T y   ⇒   θ̂ = ( X^T X )^−1 X^T y

In Matlab, the normal equations are:
thetaHat = inv(X'*X)*X'*y;
thetaHat = pinv(X)*y;
thetaHat = X\y;
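A pure-Python sketch of the normal equations applied to the noise-free circuit data of Slide 18 (so the estimate recovers the true parameters exactly; the 2×2 inversion is written out by hand):

```python
# Noise-free data from the DT circuit y(t) = 0.5*y(t-1) + 0.5*u(t-1) (Slide 18)
u = [0.0] + [1.0] * 9
y = [0.0]
for t in range(1, 10):
    y.append(0.5 * y[t-1] + 0.5 * u[t-1])
X = [[y[t-1], u[t-1]] for t in range(1, 10)]   # regressors x(t) = [y(t-1), u(t-1)]
d = [y[t] for t in range(1, 10)]               # targets y(t)

# Form X^T X and X^T d; XtX ~= [[5.3489, 6.0078], [6.0078, 8]], cf. Slide 26
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
Xtd = [sum(r[i] * di for r, di in zip(X, d)) for i in range(2)]

# Solve the 2x2 normal equations (X^T X) theta = X^T d by direct inversion
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
theta = [( XtX[1][1] * Xtd[0] - XtX[0][1] * Xtd[1]) / det,
         (-XtX[1][0] * Xtd[0] + XtX[0][0] * Xtd[1]) / det]

print([round(t, 6) for t in theta])  # [0.5, 0.5] -- exact with noise-free data
```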
Slide 26
Example 2: Electrical Circuit ARX Model
9 exemplars and 2 parameters.
Additive measurement noise
Data (see Slides 7, 12, 18 & 19):

X = [0 0; 0 1; 0.5 1; … ; 0.992 1],   y = [0.037; 0.510; 0.774; … ; 1.007]

Hessian (variance/covariance) matrix and correlation vector:

H = X^T X = [5.3489 6.0078; 6.0078 8],   X^T y = [5.57; 6.89]

Inverse Hessian matrix:

H^−1 = ( X^T X )^−1 = [1.194 −0.897; −0.897 0.799]

Least squares solution:

θ̂ = ( X^T X )^−1 X^T y = [0.467 0.511]^T

NB randn('state', 123456)
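The quoted inverse Hessian can be checked directly from H = X^T X = [5.3489 6.0078; 6.0078 8]; a quick 2×2 inversion sketch in pure Python (the course itself uses Matlab):

```python
# Hessian quoted on this slide
H = [[5.3489, 6.0078],
     [6.0078, 8.0]]

det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
Hinv = [[ H[1][1] / det, -H[0][1] / det],
        [-H[1][0] / det,  H[0][0] / det]]

print([[round(v, 3) for v in row] for row in Hinv])
# [[1.194, -0.897], [-0.897, 0.799]] -- matches the quoted inverse
```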
Slide 27
Investigation into the Performance Function
We can “plot” the performance index against different
parameter values in a model.
As shown earlier, f(θ) is a quadratic function in θ.
It is “centred” at θ̂, i.e. f(θ̂) = min_θ f(θ).
The shape (contours) depends on the Hessian matrix X^T X; this
influences the ability to identify the plant. See the next lectures.

[Figure: contour plot of f over the parameters (θ1, θ2)]
Slide 28
L5&6 Summary
ARX and ARMAX discrete time linear models are widely used
System identification is being considered simply as parameter estimation
The residual vector is used to assess the quality of the model (parameter
vector)
The sum of squared errors/residuals (squared 2-norm) is commonly used to
measure the residual’s size, because it can be interpreted in terms of the
noise variance and because it is analytically convenient
For a linear model, the SSE is a quadratic function of the parameters,
which can be differentiated to estimate the optimal parameter via the
normal equations
Slide 29
L5&6 Lab
Theory
• Make sure you can derive the normal equations S22-24
Matlab
1. Implement the DT RC circuit simulation, S18, so you can perform a least
squares parameter estimation given noisy data about the electrical circuit
2. Set the Gaussian random seed, as per S26 and check your estimates
are the same
3. Set a different seed and note that the estimated optimal parameter
values are different
4. Perform the step experiment 10, 100, 1000, … times and note that the
estimated optimal parameter values tend towards the true values of [0.5
0.5].
5. Load the data into the identification toolbox GUI and create a first order
parametric model with model orders [1 1 1]. NB you do not need to
remove the means from the data (why not?). Calculate the model and
view the value of the parameters and the model fit, as well as checking
the step response and validating the model.
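Lab steps 1–4 can be sketched in pure Python (the noise level 0.05 is from S18; the fixed seed and the way repeated trials are pooled into one set of normal equations are illustrative implementation choices):

```python
import random

random.seed(123456)  # fixed seed so the run is repeatable, cf. randn('state', 123456)

def simulate_trial(noise_std=0.05):
    """One 10-step experiment: noise-free regressors, noisy targets (cf. Slide 19)."""
    u = [0.0] + [1.0] * 9
    y = [0.0]
    for t in range(1, 10):
        y.append(0.5 * y[t-1] + 0.5 * u[t-1])
    X = [[y[t-1], u[t-1]] for t in range(1, 10)]
    ye = [y[t] + random.gauss(0.0, noise_std) for t in range(1, 10)]
    return X, ye

def estimate(n_trials):
    """Pool X'X and X'y over n_trials, then solve the 2x2 normal equations."""
    XtX = [[0.0, 0.0], [0.0, 0.0]]
    Xty = [0.0, 0.0]
    for _ in range(n_trials):
        X, ye = simulate_trial()
        for row, target in zip(X, ye):
            for i in range(2):
                Xty[i] += row[i] * target
                for j in range(2):
                    XtX[i][j] += row[i] * row[j]
    det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
    return [( XtX[1][1] * Xty[0] - XtX[0][1] * Xty[1]) / det,
            (-XtX[1][0] * Xty[0] + XtX[0][0] * Xty[1]) / det]

results = {n: estimate(n) for n in (10, 100, 1000)}
for n, th in results.items():
    print(n, [round(t, 3) for t in th])  # estimates approach the true [0.5, 0.5]
```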