SlideShare a Scribd company logo
1 of 81
1
These notes contain comments on Orfanidis book, Wiener Filter, adaptive filter, Karhunen-
Loeve expansion, Kalman Filter, Blind Deconvolution, and others.
Introduction to Random Variables
In this section, we present a short review of probability concepts. Let x be a random variable that
lies in the range  x , and has probability density function (pdf) f(x). Its mean m,
variance 2
 , and nth moment are defined by the expectation values:
  


 dxxxfxEm )(  x
      


 dxxfxExxExE )(
222
  x
  


 dxxfxxE nn
)(  x
Notice that f(x), and  n
xE are all deterministic quantities.
For N realizations of the random variable x, 0x , 1x , …, 1Nx we have:
  





1
0
1
)(
N
i
ix
N
dxxxfxEm
   





1
0
1
)(
N
i
n
i
nn
x
N
dxxfxxE
The probability that the random variable x will assume a value within an interval of values [a, b]
is given by:
  
b
a
dxxfbxaob )(Pr
A commonly used model for the pdf f(x) of the random variable x is the Gaussian or normal
distribution which is given as:
2
  







 
 2
2
2 2
1
exp
2
1
)(

xEx
xf  x
In typical signal processing problems, such as designing filters to remove or separate noise from
signal, it is often assumed that the noise interference is Gaussian.
Joint and Conditional Densities, and Bayes’ Rule:
In many situations we deal with more than one random variable i.e. random vectors. A pair of
two different random variables  21, xxx  may be thought of as a vector-valued random
variable. Its statistical description requires the knowledge of the joint probability density
function (jpdf)    21, xxfxf  . The two random variables may or may not be independent of
each other. A quantity that provides a measure for the degree of dependence of the two variables
on each other is the conditional density )/( 21 xxf which is the pdf of 1x given 2x . The
conditional pdf, the jpdf, and the pdf are given by the Bayes’ rule:
         11222121 /)(/, xfxxfxfxxfxxfxf 
More generally, Bayes’ rule for two events A and B is given as:
)(Pr)/(Pr)(Pr)/(Pr),(Pr AobABobBobBAobbAob 
The two random variables 1x and 2x are independent of each other if:
       2121, xfxfxxfxf 
i.e.    121 / xfxxf 
The correlation between 1x and 2x is defined by the expectation value
    




 21212121 ,21
dxdxxxfxxxxER xx ,  1x ,  2x
For N realizations of the random variables 1x , 10x , … 1,1 Nx , and 2x , 20x , … 1,2 Nx , we have:
3
     







1
0
2121212121
1
,21
N
i
iixx xx
N
dxdxxxfxxxxER
When 1x and 2x are independent the correlation is the product of the expected values i.e.
      212121 ,21
xandxxExExxER xx  are independent
Example: Assume that vxx  12  where  is a deterministic quantity and v is a Gaussian
random variable with zero mean and variance
2
v . We need to find the conditional pdf  12 / xxf .
For a given value of 1x we treat the term “ 1x ” as if it is a deterministic quantity and the only
randomness occurs due to v. Since v is Gaussian, then the conditional pdf of 2x is also Gaussian
but with mean value 1x and variance
2
v . Thus we get:
    







 
 2
2
11
212
2
1
exp
2
1
/
vv
axx
xxf

,  1x ,  2x
The concept of a random vector generalizes to any dimension. A vector of N random variables













Nx
x
x
x
...
2
1
is completely described if we know the joint pdf    Nxxxfxf ,...,, 21 . The first order statistics
is the mean m . The second-order statistics of x are its correlation matrix R, and its covariance
matrix, defined by:
 xEm 
 T
xxER 
   T
mxmxE 
4
where the superscript T denotes transpose. The ijth element of the correlation matrix R is the
correlation between the ith random variable ix and the jth random variable jx , that is,
 jiij xxER  . It is easily shown that the covariance and correlation matrices are related by
T
mmR  . When the mean is zero, R and Σ coincide. Both R and Σ are symmetric positive
semi definite matrices.
Example: The probability density of a Gaussian random vector  T
Nxxxx ...21 is
completely specified by its mean m and covariance matrix Σ, that is,
   
   








 
mxmxxf
T
N
1
2/
2
1
exp
det2
1
)(

Example: Under a linear transformation, a Gaussian random vector remains Gaussian i.e. a linear
function of Gaussian is Gaussian. Let  T
Nxxxx ...21 be a Gaussian random vector with
mean xm , and covariance matrix x . The linearly transformed vector xB , B is a nonsingular
N×N matrix, is Gaussian-distributed with mean and covariance given by xmBm  , and
T
xBB . These relations (mean and covariance) are valid also for non-Gaussian random
vectors. They are easily derived as follows:    xEBE  ,     TTT
BxxEBE  .
5
Random Signal Models
A stochastic process is a collection of random variables; a random vector. The index could be
time, space, volume or others. In this work, we focus on time domain index and on stationary
processes. A stationary process is a process where, generally speaking, the statistical properties
are not function of time. Thus, the vector  T
Nxxxx ...21 is a sample in time of a
stochastic process X(n) observed at instants X(1), X(2), …, X(N). Notice that we use capital
letter when dealing with stochastic processes. One of the most useful ways to model a random
signal is to consider it as being the output of a causal and stable linear filter C(z) driven by a
stationary uncorrelated (white-noise) sequence )(n , sometimes we use the notations n ,
where 




0
)(
n
n
n zczC ,
  )()()()( 2
kkRknnE    ,


 

elsewhere,0
0k,1
)(k is the delta function. Thus,


n
i
in icnnCnX
0
)()(*)()( 
where “*” is the convolution operation. The above model is termed moving average (MA)
model. Another common model is the autoregressive moving average (ARMA) model where
C(z) is the ratio of two polynomials in z i.e.
)(
)(
)(
)( z
zA
zB
zX  i.e. )()()()( zzBzXzA 
and in the time domain it has the shape
 
j
j
i
i jnbinXa )()( 
6
Example: )()( 1
10
2
2
1
10
z
zaa
zbzbb
zX 



 , then in the time domain we have
)2()1()()1()( 21010  nbnbnbnXanXa 
When the order of the numerator B(z) is zero ( 0b is nonzero and the rest of parameters are zero),
we have what is termed autoregressive (AR) model i.e. )()1()( 010 nbnXanXa  .
Maximum Likelihood Estimation:
Once the model is chosen, we use the data to find the model parameters. One of the most
commonly used methods is the maximum likelihood method because it yields unbiased estimates
of the parameters. We start with a simple AR(1) model, only one lag in X, and then generalize
the results to more than one lag. Assume that we have a set of N+1 data points, X(0), X(1), X(2),
… X(N). The signal is modeled as an AR(1) process;
)()1()( 1 nnXanX 
For a given value of X(n-1), and assuming )(n is Gaussian with zero mean and variance
2
 , the
conditional pdf  )1(/)( nXnXf becomes:
   
2
2
2
)1()(
2
1
exp
2
1
)1(/)(




naXnX
nXnXf
For the maximum likelihood method, the basic idea is to find the jpdf,
))(),...2(),1(),0(( NXXXXf , of the observations and find the estimates for the
unknowns that maximize this likelihood function.
     ...)0(),1(),2(/)3()0(),1(/)2())0(/)1(()0())(),...2(),1(),0(( XXXXfXXXfXXfXfNXXXXf 
Define    T
NXXXXN )(),...2(),1(),0(X , using Baye’s theory, the joint pdf is:
7
      
    1)0(),...1(/)(
)0(),...1()0(),...1(/)())(),...2(),1(),0((


NfXNXNXf
XNXfXNXNXfNfNXXXXf
X
X
Similarly
        
    2)0(),...2(/)1(
)0(),...2()0(),...2(/)1(1)0(),...1(


NfXNXNXf
XNXfXNXNXfNfXNXf
X
X
And we continue till we get to the initial conditions at X(m), X(m-1),…,X(0). This will yield
       



1
/)1()(),...1(),0())(),...2(),1(),0((
N
mi
iiXfmXXXfNfNXXXXf XX
Where  )(),...1(),0( mXXXf is considered to be initial conditions and could be known or be
neglected for large amount of data i.e. N>>m.
For AR(1) model, X(n) is related only to X(n-1) and we use the Baye’s formula to find a
reasonable expression for the jpdf as follows:
     ...)2(/)3()1(/)2())0(/)1(()0())(),...2(),1(),0(( XXfXXfXXfXfNXXXXf 
   

N
n
nXnXfXf
1
)1(/)()0(
Substituting for the derived expression for  )1(/)( nXnXf , we get:
   



N
n
naXnX
XfNXXXXf
1
2
2
2
)1()(
2
1
exp
2
1
)0())(),...2(),1(),0((


    

 

N
n
N naXnX
Xf
1
2
2
2/2 )1()(
2
1
exp2)0(




Taking the log of both sides we get the log likelihood function  as:
         

N
n
naXnX
NN
XfNXXXXf
1
2
2
2
)1()(
2
1
log
2
2log
2
)0(log))(),...2(),1(),0((log




8
Maximizing this quantity, ignoring  )0(log Xf , with respect to the unknowns, a and
2
 , we get
the desired results. Thus,
 




 N
naa
nXnXanX
a 1
2
ˆ,ˆ
)1()1(ˆ)(
ˆ2
2
0
22
 


which is reduced to :   0)1()1(ˆ)(
1

N
n
nXnXanX
Rearrange we get:






 N
n
N
n
nX
nXnX
a
1
2
1
)1(
)1()(
ˆ
Similarly, for
2
 , we get:
 
 


 N
naa
nXanX
N
1
2
222
ˆ,ˆ
2
)1(ˆ)(
ˆ2
1
ˆ2
0
22
 


Rearrange we get:    

N
n
nXanX
N 1
22
)1(ˆ)(
1
ˆ
We use the same approach to find the parameters of MA(1), “1” means only one lag in  .
Assume that the signal is modeled as:
)1()()( 1  nbnnX 
where )(n are independent identically distributed random variables with zero mean and variance
2

.
We also assume that 0)0(  . Since X(n) is a sum of two independent zero mean
Gaussian random variables, then it is also Gaussian with zero mean and variance equals the sum
of the two variances. Thus, )(nX ~   22
1
2
,0   bN  , where ~ means the distribution, N stands
for normal distribution.
9
We need to find the joint pdf ))(),...2(),1(),0(( NXXXXf . In many situations, the pdf of X(0)
will be ignored. Using Baye’s theory, the joint pdf is:
     ...)0(),1(),2(/)3()0(),1(/)2())0(/)1(()0())(),...2(),1(),0(( XXXXfXXXfXXfXfNXXXXf 
We need to find each element of the joint pdf. Let us start with  )0(/)1( XXf
n=1: )1()0()1()1( 1   bX , since )0( is zero by assumption
Thus,   2
2
2 2
)1(
exp
2
1
))1(()0(/)1(


X
XfXXf 
n=2: )1()2()2( 1 bX  , )1()2()1()2()2( 11 XbXbX  
Rearrange we get: )2()1()2( 1  XbX
For a given value of X(1) and X(0),
     
2
2
1
22
2
1
2 2
)1()2(
exp
2
1
2
)1()2(
exp
2
1
)0(),1(/)2(




XbXbX
XXXf




n=3:  )1()2()3()2()3()3( 111 XbXbbX   ,
i.e.   )1()2()3()1()2()3()2()3()3( 2
11111 XbXbXXbXbXbX  
For a given value of X(2), X(1) and X(0), and we also know for certainty )2( because
)1()2()2( 1 XbX  , we get:
    
2
2
11
2 2
)1()2()3(
exp
2
1
)0(),1(),2(/)3(


XbXbX
XXXXf


n=4:  )1()2()3()4()3()4()4( 2
1111 XbXbXbbX   ,
i.e.    )1()2()3()4()1()2()3()4()3()4()4( 3
1
2
111111 XbXbXbXXbXbXbXbX  
For a given value of X(3), X(2), X(1), and X(0)
10
    
2
22
111
2 2
)1()2()3()4(
exp
2
1
)0(),1(),2(),3(/)4(


XbXbXbX
XXXXXf


and in general:
   
2
2
1
2 2
)1()(
exp
2
1
)0(),...2(),1(/)(






nbnX
XnXnXnXf
where the expression for )1( n will be more complicated. We can continue in this way and find
the joint pdf. Instead we use the independence of )(n to find the joint pdf of the observations as
function of the joint pdf of )(n as follows:





































)(
...
)2(
)1(
1...0
...01
0...01
)(
)2(
)1(
1
1
Nb
b
NX
X
X



i.e. BX 
The jpdf of  is given as:
   









 


 

1
2/
2
1
exp
det2
1
)(
T
N
f
where  T
N)(...)2()1(   ,  T
NXXXX )(...)2()1( and I2
  .
Since BX  i.e. a linear function of a Gaussian vector, then it is also Gaussian and
    0 BEXE ,
T
X BB 
Thus,
   









 
XXXf X
T
X
N
1
2/
2
1
exp
det2
1
)(

11
where
T
X BB  . Maximizing the jpdf )(Xf with respect to the unknowns 1b and
2
 we
get the desired maximum likelihood estimates. The process could be repeated for higher order
MA models.
We now develop the maximum likelihood method for ARMA(1,1). Assume that the data is
modeled as:
)()1()1()( 11 nnbnXanX  
Define )1()()( 1  nXanXny , and assume that X(0)=0,
then







































)(
...
)2(
)1(
1...0
...01
0...01
)(
)2(
)1(
1
1
NX
X
X
a
a
Ny
y
y
i.e. XAy  or yAX 1

If we are able to get the joint pdf of y(n) we could get the joint pdf of X(n) as we shall see.
As before, it is assumed that 0)0(  and X(0) = 0.
n=1: )1()1()0()0()1()1( 11   bXaXy
n=2: )2()1()1()2()2( 11   bXaXy
and in general we get:





































)(
...
)2(
)1(
1...0
...01
0...01
)(
)2(
)1(
1
1
Nb
b
Ny
y
y



i.e. By 
Thus, BAyAX 11 
 is a zero mean Gaussian vector.
and
   









 
XXXf X
T
X
N
1
2/
2
1
exp
det2
1
)(

12
where    T
X BABA 11 
  . Maximizing the jpdf )(Xf with respect to the unknowns 1a , 1b
and
2
 we get the desired maximum likelihood estimates. The process could be repeated for
higher order ARMA models.
Matrix inversion:
In general, matrix inversion is time consuming and not easy to find. In some special cases, as we
have, inversion of a matrix is straightforward.
Example: consider the upper triangular matrix
















1000
100
010
001
1
2
3
a
a
a
A , its inverse is













1000
100
10
1
1
122
123233
1
a
aaa
aaaaaa
A and  













1
01
001
0001
112123
223
31
aaaaaa
aaa
a
A
T
Since
















100
010
001
0001
1
2
3
a
a
a
AT is lower triangular matrix, and since we know that
   TT
AA 11 
 , and   IAA TT

1
i.e.  




























1000
0100
0010
0001
100
010
001
0001
1
2
31
a
a
a
AT
then  1T
A is also lower triangular matrix=












1
01
001
0001
434241
3231
21
bbb
bb
b
.
By inspection, second row times first column yields: 0321  ab . Then 321 ab 
Third row times second column yields: 0232  ab . Then 232 ab 
13
Fourth row times third column yields: 0143  ab . Then 143 ab 
Third row times first column yields: 0323133231  aababb . Then 3231 aab 
Fourth row times second column yields: 0214224342  aababb . Then 2142 aab 
Fourth row times first column yields: 03214134241  aaababb . Then 32141 aaab 
This gives the complete inverse.
ARCH and GARCH Models:
In some situations, the random quantity is not independent from the observations and depends on
the data itself. This happens when the variance of the error term is not constant. Consider the
AR(p) model with ARCH(m) disturbance given as:
)()()(...)2()1()( 21 nnhpnXanXanXanX p 
and )(...)2()1()( 22
2
2
10 MnXcnXcnXccnh M 
How to Choose a Model; the Akaike information criterion (AIC):
Assuming a stationary signal, one is usually confronted with question of which is the best model
for the data. The Akaike information criterion (AIC) was developed for this purpose. Let M be
the number of estimated parameters of the model. Let  be the maximum value of the log
likelihood function (log of joint pdf of observations after maximization), then the AIC is defined
as:
22  MAIC
Given a set of data, we try several models and we select the model that minimizes the AIC.
14
Hypothesis Testing:
In its simplest form, we have two sources of a signal and we receive only one noisy version. The
basic component of a simple decision theory problem is to choose between the two sources based
on the observations we receive. If we decide on the null hypothesis 0H this means that the source
is the first signal. If we decide on the alternative hypothesis 1H this means that the source is the
second signal. For example, we could receive an EKG signal and we need to decide whether the
patient is normal 0H or sick 1H . In EEG we need to identify the presence of Epileptic danger 1H
or the patient is normal 0H … etc.
Consider a set of N observations  T
NXXXX )(...)2()1( and we need, based on these
observations, to decide which hypothesis is true 0H or 1H . To do that we find the joint pdf of the
observations under both hypothesis i.e. we need )/( 0HXf and )/( 1HXf . We then
develop the likelihood function )(X defined as
)/(
)/(
)(
0
1
HXf
HXf
X  . Notice that )(X is a
random quantity but scalar. The likelihood ratio test is to decide 1H if  )(X where  is a
threshold value, and to decide 0H otherwise. If both hypothesis are equally likely,  is set to 1.
Example: We receive a set of N noisy measurements that are independent and identically
distributed Gaussian random variables with known mean m under the hypothesis 1H , and zero
mean under the hypothesis 0H . This is stated as follows:
15
)()(:1 inmiXH  , i=1, 2, …, N
)()(:0 iniXH  , i=1, 2, …, N
Where )(in is Gaussian white noise with zero mean and 2
 variance i.e.
  





 2
2
2 2
)(
exp
2
1
)(

in
inf
Thus, 





 2
2
20
2
)(
exp
2
1
)/)((

iX
HiXf
and
 







 
 2
2
21
2
)(
exp
2
1
)/)((

miX
HiXf
Since the observations are independent we get:








N
i
iX
HXf
1
2
2
20
2
)(
exp
2
1
)/(

and
 








 

N
i
miX
HXf
1
2
2
21
2
)(
exp
2
1
)/(

The likelihood ratio is:
 


















 

 N
i
N
i
iX
miX
HXf
HXf
X
1
2
2
2
1
2
2
2
0
1
2
)(
exp
2
1
2
)(
exp
2
1
)/(
)/(
)(


After some manipulations and taking the natural log we get:
16
2
2
1
2
2
)()(ln

Nm
iX
m
X
N
i
 
When a set of new data arrives, we substitute their values in the expression of )(ln X . If
)(ln X > ln we decide 1H else we decide 0H . End of example.
Composite Hypothesis or the Generalized Likelihood Test:
In some situations, some parameters are unknown and we still need to make a decision about the
source of the signal. Specifically assume that, under 0H , the vector of unknown parameters is 0
and under 1H , the vector of unknown parameters is 1 . In this case we have a set a training data
from which we estimate the unknown parameters under both hypothesis. When a new data
arrives and we need to decide which hypothesis is true, we simply substitute the values of the
new data in the generalized likelihood ratio )(Xg . In summary:
),/(
),/(
)(
00
11
max
max
0
1
HXf
HXf
Xg





The likelihood ratio test is to decide 1H if  )(Xg where  is a threshold value, and to
decide 0H otherwise. If both hypothesis are equally likely,  is set to 1.
17
Example: Here the mean, under 1H , is unknown. Under 0H the mean is zero. When the data
arrives, we estimate the mean, m, as: 

N
i
iX
N
m
1
)(
1
ˆ . The generalized likelihood ratio becomes



































 N
i
N
i
N
i
m
g
iX
iX
N
iX
HXf
HmXf
X
1
2
2
2
1
2
2
1
2
0
1
2
)(
exp
2
1
2
)(
1
)(
exp
2
1
)/(
),/(
)(
max


The likelihood ratio test is to decide 1H if  )(Xg where  is a threshold value, and to
decide 0H otherwise.
Hypothesis Testing for Stochastic Processes:
In some applications, as in brain computer interface (BCI), we receive a signal X(t) and we need
to decide whether the signal represents a forward or a backward command or others. Usually we
have a training set of data for each hypothesis. If the signal is stationary, one option is to expand
the training signals using Karhunen-Loeve expansion that converts the signals into a set of
random variables (see the section on Karhunen-Loeve expansion). When a new set of data
arrives and we need to decide to which hypothesis it belongs. We substitute their values in the
likelihood ratio formula and decide to which group the new set of data belongs.
In this analysis, we assume that we have zero mean stochastic process. A stochastic process,
under hypothesis jH , )(tX j
is expanded in terms of orthonormal basis )(tj
i as:
18
 TtttX
i
j
i
j
i
j
,0,)()(
1
 



where 
T
jj
i
j
i dttXt
0
)()(
ik
T
j
k
j
i dttt  0
)()(
The basis )(tj
i are chosen such that the random coefficients j
i are uncorrelated; Viz:
  ik
j
i
j
k
j
iE  
For the process )(tX j
, define the covariance function ),( utK j
as:
 )()(),( uXtXEutK jjj

The covariance function, for uncorrelated j
i , satisfies the Fredholm integral equation of the
second kind:
0),()(),(
0

j
i
j
i
j
i
T
j
i
j
tduuutK 
For a stationary process,   )()()(),( utKuXtXEutK jjjj
 , and it is much easier to find the
orthogonal basis.
Assuming that j
i is a normal random variable and we know that it has zero mean and variance
j
i , the pdf is given as:
19
 
 
 
  







 2
2
2
2
exp
2
1
j
i
j
i
j
i
j
if




Define  Tj
N
jj
 ,...,1 ,the joint pdf of the observations, under jH , is given as:
 
 
 









N
i
j
i
j
i
j
i
j
j
Hf
1
2
2
2
2
exp
2
1
)/(




When a new set of data, X(t), arrives and we need to know to which hypothesis it belongs, we
simply find the different values 
T
j
i
j
i dttXt
0
)()( for the different hypothesis jH . We then
calculate the maximum of the joint pdf
 
 
 









N
i
j
i
j
i
j
i
j
j
j
Hf
1
2
2
2
2
exp
2
1
)/(max 


 .
The maximum of the joint pdfs is where the hypothesis is true.
20
Least Square Estimates
Suppose that we have two random variables X and Y that are related to each other with joint pdf
     yfyxfyxfYXf YYXYX /,),( /,  . When Y=y the random variable X=x, where y and x are
deterministic values. Our estimate of X, xˆ , is some nonlinear function of y; h(y) i.e. )(ˆ yhx  ,
and the error in the estimate is )(ˆ yhxxx  . The mean square error (m.s.e) is thus given as:
           dxyhxyxfdyyfdxdyyhxyxfesm YXYYX
2
/
2
, )(/)(,..
By conditioning on Y, the term  2
)(yhx  becomes deterministic in y, but x is still random, and
we would be able to use the Riemann calculus to find the minimum w.r.t h(y). If we have
conditioned on X, we would not be able to minimize w.r.t. h(y) using Riemann calculus and we
need to use another calculus that deals with random quantities.
We need to find an estimate for h(y) that minimizes the m.s.e. This is done by looking for
    dxyhxyxf YX
yh
2
/
)(
)(/min w.r.t. h(y). The result is:
    0)(/
)(
..
)(
2
/ 





 dxyhxyxf
yh
esm
yh
YX
Which yields        0)(/2/2)(/2 ///   dxyhyxfdxyxxfdxyhxyxf YXYXYX
And since     )(/)()(/ // yhdxyxfyhdxyhyxf YXYX  
then      YXEdxyxxfyh YX //)( /
If we have two random variables 1Y and 2Y and we need to find an estimate of X based on
observations of these two random variables i.e. ),(ˆ 21 yyhx  . As before, by minimizing the m.s.e we
getan estimate of the function ),( 21 yyh as:
21
     2121,/21 ,/,/),( 21
YYXEdxyyxxfyyh YYX
For n observed values of the n random variables 1Y , 2Y , …, nY we have:
     nnYYYXn YYYXEdxyyyxxfyyyh n
,...,,/,...,,/),...,( 2121,...,,/21 21
For a stochastic process  Y and sampling at intervals, then on the limit:
    n
n
YYYXfYXf ,...,,/lim/ 21


 )(ˆ yhx  . As before, by minimizing the m.s.e we get an estimate of the function  )( yh as:
             YXEdxyxxfyh YX //)( /
Linear Least Squares Estimates [Kailath; 2000]
In many useful applications, obtaining the conditional expectation is very difficult. Thus, one has
to resort to some other suboptimal approaches for the estimation. A common approach is the
linear least square estimate where the estimated value Xˆ of a random variable X is linearly
related to another random variable Y. Specifically;
ghYX ˆ
Where h and g are unknown but deterministic values and chosen to minimize the mean square
error m.s.e given by:
             XYhEYhgEXgEgYEhXEghYXEesm 222.. 22222

Minimizing the m.s.e. w.r.t g and h yields:
    YX mhmYEhXEg ˆˆˆ 
        YmgXYEYEgXYEYEh ˆˆˆ 2

22
The above two equations have the matrix format:
   

















XYE
m
h
g
YEm
m X
Y
Y
ˆ
ˆ1
2
Since       YXXYXY mmXYEmXmYE  , then   YXXY mmXYE   .
Similarly since      2222
YYY mYEmYE  ,then   222
YY mYE  
Substitutingwe get:
  


















 YXXY
X
YYY
Y
mm
m
h
g
mm
m
 ˆ
ˆ1
22
Invertingthe matrix we get:
 





















YXXY
X
Y
YYY
Y mm
m
m
mm
h
g


 1
1
ˆ
ˆ 22
2
Solving we get:
   2
2
/
1ˆ
YXYYXXYXY
Y
mmmmh 








and
           22
2
22
2
/
11
ˆ YXYYXXYYXY
Y
YXXYYXYY
Y
mmmmmmmmmg 
















Thus,   YYXYX mYmX  2
/ˆ 
The corresponding m.s.e. is:
 222
/... YXesm  
Under general conditions, we could interchange expectations and derivations. Thus, we could
use the derivative operator inside the expectations as follows:
23
      ghYXEghYX
g
E
g
esm













20
.. 2
Which yields       YhEgghYEXE 
i.e.     YX mhmYEhXEg ˆˆˆ 
Similarly       ghYXYEghYX
h
E
h
esm













20
.. 2
Which yields         2
YhEYgEghYYEYXE 
i.e.         YmgXYEYEgXYEYEh ˆˆˆ 2

For a vector of observations  T
nYYYY ,..., 21 , and a vector of coefficients  T
nhhhh ,..., 21 , a
scalar X is estimated as:
gYhX
T
ˆ
The least m.s.e. is derived as before as:
 YYYYXX mYRRmX  1ˆ
Where    T
YXYX mYmXER  ,    T
YYYY mYmYER 
The corresponding m.s.e. is given as:
XYYYYXXX RRRResm 1
... 

Assume that we have an observation period  ba, where we measure a scalar stochastic process
Y(t). We need to find linear least square estimate Xˆ of the random variable X based on this
observation. In this analysis we shall assume zero mean value for all random variables involved.
As before,

b
a
dYhX  )()(ˆ
24
The filter h(t) is obtained as before as:

b
a
YYXY dtRhtR  ),()()(  bat ,
Where  )()(),(  YtYEtRYY  , and assuming zero mean for Y(t).
The m.s.e. is now given as:

b
a
b
a
YYX dvdvtRvhhesm  ),()()(... 2
Geometric Interpretation of Random Variables:
We shall assume that all the random variables we are dealing with are zero mean. This will only
facilitate the analysis. The results, with minor changes, will be valid for non zero mean random
variables.
We could think of a random variable X as a vector in some abstract space with inner product
defined as:
 XYEYX ,
For stochastic processes X(t) and Y(t), defined on the interval  ba, , the inner product is defined
as:
 
b
a
YXEYX )()(, 
Thus, for 
i
iiYhXˆ we need to find an estimate of ih such that the error  XX ˆ is the
orthogonal to the observation space made from the observations iY . i.e.
j
i
ii YYhX 





  , where  means orthogonal.
In terms of the inner product
25
j
i
ii YYhX 





  means 0, 


















  j
i
iij
i
ii YYhXEYYhX j
i.e. 
i
jiij YYhYX ,,
If the observations are a stochastic process Y(t), then the estimate of X is obtained as:

b
a
dYhX  )()(ˆ
Again, the error is orthogonal to the observations. Thus,
  )(ˆ tYXX 
i.e. 0)(,)()( 







  tYdYhX
b
a


b
a
dYhtYtYX  )()(),()(,
Which means     














b
a
b
a
b
a
dtYYEhdtYYhEdYhtYEtXYE  )()()()()()()()()()(
In terms of correlation, the above expression is:

b
a
YYXY dtRhtR  ),()()(
This is exactly the same results obtained before.
The Multivariate Case:
For the estimation of a vector of random variables  T
nXXXX ,...,, 21 based on the
observations of m stochastic processes         T
m tYtYtYtY ,...,, 21 , we follow the same route as
before but now )(H is a matrix that satisfies the equation:
26

b
a
YYYX dtRHtR  ),()()(
And the linear least square estimate Xˆ satisfies the equation:

b
a
dYHX  )()(ˆ
Gram-Schmidt Orthogonalization:
Assume that we have a set of random variables 1Y , … MY that could be correlated. We need to
find the orthogonal basis of the space spanned by these random variables. We call these basis 1 ,
… M , and they are found from the observations 1Y , … MY . The basic idea is to select 1 = 1Y .
We then take 2Y and decompose it into two components 2 plus an error term such that 2 is
orthogonal to 1 i.e.   0, 2121   E . We then move to 3Y and decompose it into 3 plus two
terms such that   0, 3131   E , and   0, 3232   E . We repeat this process M times
till we get the total new M basis that are orthogonal to each other. Specifically
21212  YhY , 1211/22
ˆˆ YhYY  i.e. 121212122  hYYhY 
Where 1/2
ˆY is the estimate of 2Y given the observation 1Y and 21h is unknown to be estimated in
what to follow. Since the error is orthogonal to the observations, then
  112122 YYhY  i.e.      0112122  YYhYEE 
Using linear least square estimation and 11 Y we get:
         12
112
2
11221 /

  EYEYEYYEh
Thus,      1
12
11222 

 EYEY
27
For 3Y we find an estimate 2/33
ˆˆ YY  as a linear combination of 1 and 2 Viz;
32321313   hhY , 2321313
ˆ  hhY  i.e. 23213133  hhY 
Since the error is orthogonal to the observations, then
  2123213133 ,  hhY i.e.    012321313   hhYE , and
   022321313   hhYE
Using linear least square estimation we get:
         12
113
2
11331 /

  EYEYEYYEh
and          12
223
2
22332 /

  EYEEYEh
Thus,           2
12
2231
12
11333 

 EYEEYEY
In general we get:
     MnEYEY
n
i
iiinnn  



2,
1
1
12

Notice that     



 
1
1
12
1/
ˆˆ
n
i
iiinnnn EYEYY 
i.e. nnnn YY  1/
ˆ
In matrix notations, one could represent the observations in terms of the orthogonal basis as:





































MMMM hh
hh
h
Y
Y
Y



...
0......
0...1
0...01
0......01
...
2
1
21
3231
212
1
Define the vectors  T
nn YYYY ,...,, 21 and  T
nn  ,...,, 21 , then
nnn HY  , and nnn YH 1

28
        1
1
111
1
1
12
1/
ˆˆ 





   n
T
nn
T
nn
n
i
iiinnnn YEEYEYY 
Substituting 1
1
11 

  nnn YH , we get:
      1
1
1
1
1
111
1
1
1
111/
ˆ 








  nn
T
n
T
nnn
T
n
T
nnnn YHHYYHHYYEY
          1
1
1
11
1
1
11
1
1
1
1
11 










 nnn
T
nn
T
n
T
n
T
nn YHHYYHHYYE
   1
1
111 

 n
T
nn
T
nn YYYYYE
And    1
1
1111/
ˆ 

  n
T
nn
T
nnnnnnn YYYYYEYYY
Thus, we were able to get the orthogonal basis, n , in terms of the observations
 T
nn YYYY ,...,, 21 .
Discrete Time Recursive Estimation:
We have a random variable X that we need to estimate based on observations 0Y , 1Y , … In this
approach we, recursively, estimate X and update the estimate as new data comes in. Assume that
we have the linear least square estimate 1/ˆ,..,/ˆ
110 

 kXYYYX k
based on observations 0Y , 1Y ,
… 1kY . We now have a new observation kY and we need to update our estimate and find kX /ˆ
in terms of 1/ˆ kX . In the updating of the estimate we only use new information in the new
data. This new information is called innovation and is obtained as follows:
Define the innovation 1/
ˆ
 kkkk YY , with 01/000
ˆ YYY   , where 1/
ˆ
kkY is the estimate of
kY given the previous observations 0Y , 1Y , … 1kY . Clearly k is uncorrelated with the
29
previous random variables 0Y , 1Y , … 1kY and k is regarded as the new information in the
random variable kY . This suggests that:
1/ˆ/ˆ  kXkX + linear least square estimate of X given only k .
As before, linear least square estimate of X given only      kkkkk EXE  1
 . Collecting
terms we get:
     kkkk EXEkXkX  1
1/ˆ/ˆ 

We could also derive the same results starting from the innovation sequence 0 , 1 , … k
which are uncorrelated with each other and form the basis for the space spanned by 0Y , 1Y , …
kY .
Thus, kX /ˆ linear least square estimate of X given 0 , 1 , … k .
    


k
i
iiii EXE
0
1

          kkkk
k
i
iiii EXEEXE  1
1
0
1 



 
     kkkk EXEkX  1
1/ˆ 

This result was obtained before.
Instead of estimating just a single random variable X, we need to estimate the values of the
stochastic process lX given the observations of the stochastic process 0Y , 1Y , … kY . We use the
above equation to get:
    


k
i
iiiill EXEkX
0
1
/ˆ 
30
          kkkkl
k
i
iiiil EXEEXE  1
1
0
1 



 
Thus we are interested in klX /
ˆ linear least square estimate of X given 0 , 1 , … k .
     kkkklkl EXEX  1
1/
ˆ 
 
Notice that l could be greater than, less than or equal to the value k. We need to find the
relation  klXE  and this could come from the observation equation or others. For example, it is
not uncommon to have the observation equation: kkkk vXHY 
Where kv is additive white Gaussian noise, and kH is a matrix with proper dimensions. In order
to find  klXE  , we use
  kkkkkkkkkkkkkkk vXXHvXHXHYY   1/1/1/
ˆˆˆ
and thus         T
kkkkkkkkkkkk vXXHvXXHEE   1/1/
ˆˆ
     kk
T
k
T
kkkkkkk vvEHXXXXEH   1/1/
ˆˆ
Also        T
kkkkkkkk
T
kkkk XXvXXHEXXE 1/1/1/
ˆˆˆ
 
  T
kkkkkkk XXvEPH 1/1/
ˆ
 
01/  kkk PH
where the error variance    T
kkkkkkkk XXXXEP 1/1/1/
ˆˆ
 
Notice that since 0ˆ
1/0 X , then        TT
XXEXXXXEP 001/001/001/0
ˆˆ  
For l =k, we have:
    


k
i
iiiikkk EXEX
0
1
/
ˆ  , and     



 
1
0
1
1/
ˆ
k
i
iiiikkk EXEX 
31
     T
kkkkkkkk vXXHXEXE  1/
ˆ
Substitute 1/1/
ˆˆ
  kkkkkk XXXX , we get:
       T
kkkkkkkkkkkk vXXHXXXEXE   1/1/1/
ˆˆˆ
        T
kkkkkkk
T
kkkkkkkk vXXHXvXXHXXE   1/1/1/1/
ˆˆˆˆ
   T
kkkkkkk
T
kkk vXXHXEHP   1/1/1/
ˆˆ
Notice that 1/
ˆ
kkX is, by assumption, independent of kv and, by derivation, is orthogonal to the
error  1/
ˆ
 kkk XX i.e.     000ˆˆ
1/1/  
T
kkkkkkk vXXHXE .
Thus,   k
T
kkkkk KHPXE  1/
We now need to find a recursive relation for the error covariance 1/ kkP .
We know that:
     kkkkkkkkk EXEXX  1
1//
ˆˆ 
     kkkkkk EKX  1
1/
ˆ 
  , 0ˆ
1/0 X
    1/
1
1/1/1/
ˆˆ


  kkkkkk
T
kkkk
T
kkkkk XHYvvEHPHHPX
The above is the updating equation of the estimate when new data, kY , arrives. We need an
updating equation for the error covariance, kkP / , when new data arrives.
We know that:
   T
kkkk
T
kkkk KPHXXE   1/1/
ˆ
    kk
T
kkkkkk vvEHPHE  1/
Since    T
kkkkkkkk XXXXEP ///
ˆˆ 
We substitute    kkkkkkkk EKXX  1
1//
ˆˆ 
  and we get:
32
           T
kkkkkkkkkkkkkkkk EKXXEKXXEP  1
1/
1
1//
ˆˆ 


 
               
     T
kkkkkkk
T
kkkkkkkk
T
kkkkkkkkk
XXEKE
EKEKEEKXXEP
1/
1
111
1/1/
ˆ
ˆ








         T
kkkk
T
kkkk
T
kkkkkk KEKKEKKEKP
111
1/

  
   T
kkkkkk KEKP
1
1/

  
Substitute     kk
T
kkkkkk vvEHPHE  1/ we get:
   T
kkk
T
kkkkkkkkk KvvEHPHKPP
1
1/1//

  ,
T
kkkk HPK 1/  ,    T
XXEP 001/0 
This is the desired result. Thus, we were able to find the updated estimate kkX /
ˆ and its updated
covariance kkP /
. We now summarize the estimation steps:
(1) kkkk vXHY  (linear observation model)
(2) 1/
ˆ
 kkkk YY
(3)    1
1
1111/
ˆ


  k
T
kk
T
kkkkkkk YYYYYEYYY ,  T
nn YYYY ,...,, 21
(4)     


k
i
iiiilkl EXEX
0
1
/
ˆ 
(5)     



 
1
0
1
1/
ˆ
k
i
iiiikkk EXEX 
(6)     

 
k
i
iiiikkk EXEX
0
1
1/1
ˆ  ?? we need a recursive equation.
(7)    T
kkkkkkkk XXXXEP 1/1/1/
ˆˆ
  ?? we need a recursive equation.
33
(8)    T
kkk
T
kkkkkkkkk KvvEHPHKPP
1
1/1//

  ,
T
kkkk HPK 1/  ,
   T
XXEP 001/0 
(9)     1/
1
1/1/1//
ˆˆˆ


  kkkkkk
T
kkkk
T
kkkkkkk XHYvvEHPHHPXX , 0ˆ
1/0 X
Thus, for every step, we first find 1/
ˆ
kkX and its covariance 1/ kkP . We then update the estimates
to get kkX /
ˆ and its covariance kkP /
. To have a recursive relation, we need to find a predictive
equation, kkX /1
ˆ
 , for kX and its covariance kkP /1
. This could be obtained if we know the system
dynamics.
Assume that the system dynamics is linear with the equation:
kkkkk wXX 1
where kw is zero mean white Gaussian noise independent of all other noises and has covariance
kQ . kX is zero mean with covariance  kkk XXE which has the recursive relation:
T
kkk
T
kkkk Q  1
The linear least square estimate given observations uptil time k kkX /1
ˆ
 , is thus,
kkkkkkkkkkk XwXX ////1
ˆˆˆˆ 
Substituting for the equation of kkX /
ˆ , we get:
    1/
1
1/1/1//1
ˆˆˆ


  kkkkkk
T
kkkk
T
kkkkkkkkk XHYvvEHPHHPXX
Remember that  1/
ˆ
 kkkkk XHY which has zero mean and covariance
  kk
T
kkkk vvEHPH 1/ . Since all the terms involved are Gaussian with zero mean and 1/
ˆ
kkX is
independent of k , then the covariance  kkkkkk XXE /1/1/1
ˆˆ
  of kkX /1
ˆ
 is obtained as:
34
   T
k
T
kkkkk
T
kkkk
T
kkkk
T
kkkkkk PHvvEHPHHP  

 1/
1
1/1/1//1 , 01/0  
Notice that the covariance of 1kX , 1k , is the sum of the covariance of kkX /1
ˆ
 , kk /1 , and the
covariance of the estimated error kkP /1 . This is because the error in the estimate is orthogonal to
the observations or the estimate. Thus, we have kkkkkP /11/1   .
Using the fact that kw is independent of kX and older values we get:
       T
kkkkkkkkkkkkkk
T
kkkkkkkk XwXXwXEXXXXEP ///11/11/1
ˆˆˆˆ  
      T
kkkkkkkkkkkk wXXwXXE  //
ˆˆ
            T
k
T
kkkkk
T
k
T
kkkkk
T
k
T
kkk
T
k
T
kkkkkkk XXwEwXXEwwEXXXXE  ////
ˆˆˆˆ
And since         0ˆˆ
// 
T
kkkk
T
kkkk XXEwEXXwE because kw is independent of kX , by
assumption, and independent of ,..., 1kk YY , by assumption, then we get:
T
kkk
T
kkkkkk QPP  //1 , 01/0 P
This is the covariance of the predicted estimate kkX /1
ˆ
 .
We now summarize our finding so far:
(1) kkkk vXHY  (linear observation model)
(2) kkkkk wXX 1
(linear system dynamics model)
(3) 1/
ˆ
 kkkk YY
(4)    1
1
1111/
ˆ


  k
T
kk
T
kkkkkkk YYYYYEYYY ,  T
nn YYYY ,...,, 21
(5)     


k
i
iiiilkl EXEX
0
1
/
ˆ 
35
(6)     



 
1
0
1
1/
ˆ
k
i
iiiikkk EXEX 
(7)     

 
k
i
iiiikkk EXEX
0
1
1/1
ˆ 
kkkkk XX //1
ˆˆ  or 1/11/
ˆˆ
  kkkkk XX , this is prediction step for states
(8)    T
kkkkkkkk XXXXEP 1/1/1/
ˆˆ
 
  T
k
T
kkk
T
kkkkkk wwEPP  //1 , or   T
k
T
kkk
T
kkkkkk wwEPP   1/11/
01/0 P , this is prediction step for the covariance of the state estimate
(9)    T
kkk
T
kkkkkkkkk KvvEHPHKPP
1
1/1//

  ,
T
kkkk HPK 1/  ,
   T
XXEP 001/0  , update of the covariance matrix of the states estimate
(10)     1/
1
1/1/1//
ˆˆˆ


  kkkkkk
T
kkkk
T
kkkkkkk XHYvvEHPHHPXX , update of states
estimate
So far we were able to get an estimate of the stochastic process X(k)= kX , kkX /
ˆ , and its
covariance matrix kkP /
based on a train of observations of Y(0), Y(1), …Y(k). The above
procedures were developed by R. Kalman. We now summarize the estimation steps or the
Kalman filter equations in a vector format:
(1) kkkkk wXX 1
(linear system dynamics model), kw ~  kQN ,0 , white Gaussian
noise
(2) kkkk vXHY  (linear observation model), kv ~  kRN ,0 , white Gaussian noise
(3) Prediction step: 1/11/
ˆˆ
  kkkkk XX ,
T
kk
T
kkkkkk QPP   k1/11/
(4) Updating step: 1/
ˆ
 kkkk YY
36
T
kkkkkk HPHRS 1/ 
1
1/

 k
T
kkkk SHPK
 1/
1
1//
ˆˆˆ


  kkkkkkkkk XHYKXX
    T
kkk
T
kkkkkkkk KRKHKIPHKIP  1//
37
Karhunen-Loeve Expansion
Karhunen-Loeve Expansion; the Scalar Case [Van Trees; 1968]:
A stochastic process )(tX is expanded in terms of orthonormal basis )(ti as:
 TtttX
i
ii ,0,)()(
1
 



where 
T
ii dttXt
0
)()(
ij
T
ji dttt  0
)()(
The basis )(ti are chosen such that the random coefficients i are uncorrelated; Viz:
If   ii mE 
then     ijijjii mmE  
For the process X(t) with mean m(t), define the covariance function ),( utK as:
   )()()()(),( umuXtmtXEutK 
The covariance function, for uncorrelated i , satisfies the Fredholm integral equation of the
second kind:
0),()(),(
0
 iii
T
i tduuutK 
and could be expanded in terms of the orthonormal basis as:
 TutututK
i
iii ,0,,)()(),(
1
 



38
It is this equation that we use to find the orthonormal basis.
Proof of the Fredholm integral equation of the second kind:
Assuming zero mean stochastic process and thus zero mean random variables i , we use the
equation:
    ijijjii mmE  
Now  






 
T
j
T
iijiji duuXudttXtEE
00
)()()()( 
Exchanging expectation and integration we get:
   
T
j
T
iijiji duuuXtXEdttE
00
)()()()( 

T
j
T
i duuutKdtt
00
)(),()( 
A necessary and sufficient condition is that in the right hand side of the equation:
)()(),(
0
tduuutK jj
T
j  
In this case we get:
ijj
T
ijj
T
j
T
i dtttduuutKdtt    000
)()()(),()(
which is the left hand side of the equation.
Example: Wiener process. The Wiener process is a zero mean process with covariance






utt
tuu
tuutK
,
,
),min(),( 2
2
2



Thus,  
T
t
i
t
i
T
iii duutduuuduuutKt )()()(),()( 2
0
2
0

39
Taking the derivative w.r.t. “t” of both sides we get:

T
t
i
i
i duu
dt
td
)(
)( 2



Taking the derivative one more time we get:
)(
)( 2
2
2
t
dt
td
i
i
i 

 
This yields the solution:
2
2
22
2
1











n
T
n












 t
T
n
T
tn


2
1
sin
2
)(
Example: Stationary process; Assume that the stochastic process X(t) is zero mean and stationary
with correlation     
 
 PetXtXER )()( and spectrum
  222
2
2)(
)(




w
P
wD
wN
wS . Thus,
)(),( utRutK  . The Fredholm integral equation becomes:
   






T
t
i
tu
t
T
i
ut
T
T
i
T
T
iii duuePduuePduuutRduuutKt )()()()()(),()(  
Differentiating w.r.t “t” we get:





T
t
i
ut
t
T
i
uti
i duueePduueeP
dt
td
)()(
)(


 
Differentiating again w.r.t “t” we get:
40
  )(2)(2)()(2)(
)( 222
2
2
tPtPttPduueP
dt
td
iiiiii
T
T
i
t-ui
i 

 
 

which has a solution:
tjbtjb
i
ii
ecect 
 21)( ,
 
i
i
i
P
b

 /22
2 
 , 22
2
i
i
b
P





After some manipulations we end up with the expressions:

























evenis,sin
2
2sin
1
1
oddis,cos
2
2sin
1
1
)(
2/1
2/1
2/1
2/1
itb
Tb
Tb
T
itb
Tb
Tb
T
t
i
i
i
i
i
i
i
Karhunen-Loeve expansion for Stationary process:
For the spectrum given by
 2
2
)(
)(
wD
wN
wS  where the numerator order is q and the denominator
order is p and q<p, and assuming that the data is available for long observation time   ,t ,
the Fredholm integral equation has the form:
  


,),()()()()( tttRduuutRt iiii 
where “ ” is a the convolution operator.
Using the Fourier transform, one could obtain a solution to the above equation as:
 2
2
)(
)()()()(
wD
wN
wwwSw iiii  
41
or    )()(0 22
wwNwD ii  
For each i , there are p homogeneous solutions corresponding to the roots of   )( 22
wNwDi  .
We denote these solutions as plt ihl ,...,1),,(  .
Thus, 

p
l
ihlli tct
1
),()( 
Substitute in the Fredholm equation, to find lc and i , we get:
     



 
,,),()(),()(),(
111
tduucutRduucutRtc
p
l
ihll
p
l
ihll
p
l
ihlli 
or   


,,,...,1,),()(),( tplduuutRt ihlihli 
Example: In the above we used
  222
2
2)(
)(




w
P
wD
wN
wS , we need to solve
      )(2)()(0 2222
wPwwwNwD iiii   . The roots are located at
  0222
 Pwi  i.e.
 
i
i
i
i
i
PPP
w






 /222 22
2 


 . Thus, the
two homogeneous solutions 2,1),,( lt ihl  are given by
jwt
ih
jwt
ih etet 
 ),(,),( 21  , and
we get
jwtjwt
i ecect 
 21)(
42
Karhunen-Loeve Expansion; the Vector Case:
A stochastic vector process  T
N tXtXtX )(),...,()( 1 is expanded in terms of orthonormal basis
vector  T
iNii
ttt )(),...,()( 1   as:
 TtttX
i
ii ,0,)()(
1
 



where 
T
T
ii dttXt
0
)()(
ij
T
j
T
i
dttt  0
)()(
The basis )(ti
 are chosen such that the random coefficients i are uncorrelated; Viz:
If   ii mE 
then     ijijjii mmE  
For the process X(t) with mean m(t), define the covariance matrix ),( utK as:
   T
umuXtmtXEutK )()()()(),( 
The covariance matrix, for uncorrelated i , satisfies the Fredholm integral equation:
0),()(),(
0
 iii
T
i
tduuutK 
and could be expanded in terms of the orthonormal basis as:
 TutututK
i
T
iii ,0,,)()(),(
1
 



It is this equation that we use to find the orthonormal basis.
43
The Wiener Filter
We shall develop the Wiener filter for stationary signals. Nonstationary signals could, in many
cases, be reduced to stationary signals by focusing on a small window of the signal. In this
window, the statistical properties do not change much i.e. we have a stationary signal.
Assume that we receive/observe a signal y(n) that is correlated with another signal X(n). We use
y(n) to find an estimate, )(ˆ nX , of X(n) according to the following equation:


I
i
inyihnX
0
)()()(ˆ
Define )(ˆ)()( nXnXne  ,     22
)(ˆ)()()( nXnXEneEn 
where h(0), h(1), …, h(I) are the unknown filter parameters. In order to find the filter parameters,
we need to minimize the expected value of the squared error w.r.t. the unknowns.
  















)()()()(2)()(ˆ)(20
)(
)(
0
lnyinyihnXElnynXnXE
lh
n I
i

This yields:
    IlinylnyEihinylnyihElnynXE
I
i
I
i
,...,1,0,)()()()()()()()(
00







  
i.e. IlliRihlR
I
i
yyXy ,...,1,0,)()()(
0
 
44
These are a set of (I+1) equations in the filter coefficients h(0), h(1),…, h(I).
We usually have the observations y(n). This gives us an estimate of )(iRyy . We need to find a
relation between y(n) and X(n) in order to get an estimate of )(lRXy .
In some situations, as in noise cancelling, we have the signal X(n). We could get a numerical
Estimate for )(lRXy as
 





lN
n
Xy lnynX
lN
lR
0
)()(
1
1
)(ˆ
Assume that y(n) is a linear observation of the process X(n); Viz:
)()()( nvncXny 
where )(nv is additive zero mean
2
v variance Gaussian noise which is independent of X(n).
In this case in order to get )(lRXy , we multiply by )( lny  and take the expectations as follows:
l=0:      )()()()()()()0( nvnyEnXnycEnynyERyy 
  )()()()0( nvnvncXEcRXy 
2
)0( vXycR 
Thus,  2
)0(
1
)0( vyyXy R
c
R 
where we used       0)()()()(  nvEnXEnvnXE because X(n) and v(n) are independent and
  0)( nvE .
45
l=1:      )()1()()1()1()()1( nvnyEnXnycEnynyERyy 
)1(XycR
i.e. )1(
1
)1( yyXy R
c
R 
and in general
)(
1
)( lR
c
lR yyXy 
The Observations are convolution of the state:
Assume that y(n) is a linear observation of the process X(n); Viz:
)()1()()( 10 nvnXcnXcny 
where )(nv is additive zero mean
2
v variance Gaussian noise which is independent of X(n).
In this case in order to get )(lRXy , we multiply by )( lny  and take the expectations as follows:
l=0:        )()()1()()()()()()0( 10 nvnyEnXnyEcnXnyEcnynyERyy 
  )()()1()()1()0( 1010 nvnvnXcnXcERcRc XyXy 
2
10 )1()0( vXyXy RcRc 
Similarly
46
l=1:
       )()1()1()1()()1()()1()1( 10 nvnyEnXnyEcnXnyEcnynyERyy 
  )()1()2()1()0()1( 1010 nvnvnXcnXcERcRc XyXy 
)1()0( 01 XyXy RcRc 
In matrix format we have 



















 
)1(
)0(
)1(
)0(
01
10
2
Xy
Xy
yy
vyy
R
R
cc
cc
R
R 
Thus, 






 













)1(
)0(
)1(
)0( 21
01
10
yy
vyy
Xy
Xy
R
R
cc
cc
R
R 
For
)()()(
2
0
nvknXcny
k
k  
         )()()2()()1()()()()()()0( 210 nvnyEnXnyEcnXnyEcnXnyEcnynyERyy 
  )()()1()()2()1()0( 10210 nvnvnXcnXcERcRcRc XyXyXy 
2
210 )2()1()0( vXyXyXy RcRcRc 
Similarly
         )()1()2()1()1()1()()1()()1()1( 210 nvnyEnXnyEcnXnyEcnXnyEcnynyERyy 
  )()1()3()2()1()1()0()1( 210210 nvnvnXcnXcnXcERcRcRc XyXyXy 
47
  )1()0( 201 XyXy RccRc 
         )()2()2()2()1()2()()2()()2()2( 210 nvnyEnXnyEcnXnyEcnXnyEcnynyERyy 
  )()1()4()3()2()0()1()2( 210210 nvnvnXcnXcnXcERcRcRc XyXyXy 
)0()1()2( 210 XyXyXy RcRcRc 
In matrix format we have
 






























 
)2(
)1(
)0(
0
)2(
)1(
)0(
012
201
210
2
Xy
Xy
Xy
yy
yy
vyy
R
R
R
ccc
ccc
ccc
R
R
R 
Thus,  









 






















)2(
)1(
)0(
0
)2(
)1(
)0( 21
012
201
210
yy
yy
vyy
Xy
Xy
Xy
R
R
R
ccc
ccc
ccc
R
R
R 
Example: Assume that we have a first order filter i.e. I=1, h(0) and h(1) are unknowns. We also
have the observations y(n) related to the signal X(n) by )()()( nvncXny  . Thus,
 2
)0(
1
)0( vyyXy R
c
R  and )1(
1
)1( yyXy R
c
R 
)1()1()()0()(ˆ  nyhnyhnX
  















)()()()(2)()(ˆ)(20
)0(
)( 1
0
nyinyihnXEnynXnXE
h
n
i

This yields:
48
       )()1()1()()()0()()1()1()()()0()()( nynyEhnynyEhnynyhnynyhEnynXE 
i.e. )1()1()0()0()0( yyyyXy RhRhR 
or   )1()1()0()0()0(
1 2
yyyyvyy RhRhR
c

Similarly   















)1()()()(2)1()(ˆ)(20
)1(
)( 1
0
nyinyihnXEnynXnXE
h
n
i

This yields:
       )1()1()1()1()()0()1()1()1()1()()0()1()(  nynyEhnynyEhnynyhnynyhEnynXE
i.e. )0()1()1()0()1( yyyyXy RhRhR 
or )0()1()1()0()(
1
yyyyyy RhRhlR
c

In matrix format we have:




















 
)1(
)0(
)0()1(
)1()0(
)1(
)0(1
2
h
h
RR
RR
R
R
c yyyy
yyyy
yy
vyy 
Inverting the matrix we get







 













)1(
)0(
)0()1(
)1()0(1
)1(
)0( 21
yy
vyy
yyyy
yyyy
R
R
RR
RR
ch
h 
This is the desired result.
49
Adaptive Filters
Adaptive Frequency Estimation:
Assume that the stochastic process X(t) is the output of a Linear filter H(z) driven by white
Gaussian noise. Assume further that the filter is an AR process with real coefficients ka i.e. it has
only poles. In this case the filter is given by:
 






 M
k
jkw
k
jMw
M
jw
jw
ea
eaea
eH
1
1
1
1
...1
1
For zero mean Gaussian process )(nv with variance
2
v , the output spectrum )(wSAR
becomes:
  2
1
2
22
1
)(




M
k
jkw
k
vjw
vAR
ea
eHwS


)()()(
1
nvknXanX
M
k
k  
If we assume adaptive parameters i.e. we have )(ˆ nak , the spectrum becomes also adaptive,
),( nwSAR
, and is given by:
2
1
2
)(ˆ1
),(




M
k
jkw
k
v
AR
ena
nwS

The parameters are updated using, e.g., the LMS algorithm as follows:
50
)()()(ˆ)1(ˆ neknXnana kk  


M
k
k knXnanXnXnXne
1
)(ˆ)(ˆ)()(ˆ)()(
Example: Single sinusoid with random frequency. Assume that the receive signal is made of
single sinusoid with unknown frequency that is behaving according to OU process. The sinusoid
X(n) is modeled as and AR(2). The initial conditions X(-2), X(-1), X(0), determine the phase,
and the values of the coefficients determine the frequency.
)()2()1()( 21 nvnXanXanX 
which has the transfer function:
         222
22
21
2
2
2
2
1
1 21
1
)(
)(
 






 
zz
z
jzjz
z
azaz
z
zazazv
zX
i.e. 2/1a ,
2
1
2
2







a
a
 











 2/arctan 


f
where is the sampling interval. Notice that the arctan operation yields the phase between “
 ” and “  ”.
The frequency )(tf is modeled as an Ornestein-Uhlenbeck (OU) process;
51
  )()()( tdWdttftdf fff  
where W(t) is the Wiener process.
We use the LMS algorithm to find an estimate of the frequency. We first estimate the changing
coefficients according to the equation:
)()()(ˆ)1(ˆ neknXnana kk  


M
k
k knXnanXnXnXne
1
)(ˆ)(ˆ)()(ˆ)()(
The frequency is estimated as  







 


2/
)(ˆ
)(ˆ
arctan)(ˆ
n
n
nf
where 2/)(ˆ)(ˆ 1 nan  ,
2
1
2
2
)(ˆ
)(ˆ)(ˆ 






na
nan
Adaptive Noise Cancelling to Remove Sinusoidal Interference:
Assume that the signal X(t) is modeled as:
 000 cos)()(  nwAnSnX
where  000 cos nwA is the sinusoidal interference and S(n) is the desired unknown signal.
We also have another reference signal, y(n), that has the same frequency as the interference but
different in amplitude and phase; Viz:
52
  000 ,cos)(   andAAnwAny
The estimated signal, )(ˆ nX , and the error, e(n), are modeled as:
  





1
0
0
1
0
cos)(ˆ)()(ˆ)(ˆ
M
i
i
M
i
i inwAnainynanX 
     





 


1
0
0000 cos)(ˆcos)()(ˆ)()(
M
i
i inwAnanwAnSnXnXne 
The Wiener filter weights, )(ˆ nai , are updated through the LMS algorithm as:
)()()(ˆ)1(ˆ inynenana ii  
Notice that the signal X(n) is made of two parts; (1) e(n) which is not correlated with y(n) and
(2) )(ˆ nX which is correlated with y(n). The adaptive filter output must converge to the correlated
part and thus the remainder is the desired signal.
Adaptive Line Enhancement (ALE):
In some situations there is only one signal X(n) that is available. This signal is made of two
parts; (1) Slowly varying signal )(1 nX with slowly decaying correlation, and (2) rapidly varying
signal )(2 nX independent of )(1 nX and that could be nonstationary or has short duration
correlation. Define another signal   nXny )( , where  is a delay. We choose the delay 
such that y(n) is highly correlated with the slowly varying part of X(n) but is weakly correlated
53
with the rapidly varying part of X(n). Thus, X(n) has two components (1) )()(ˆ
1 nXnX  that is
highly correlated with y(n), and (2) )()( 2 nXne  that is orthogonal or independent of y(n) .
Let )()()( 21 nXnXnX 
then      )()()()()()( 2121   nXnXnXnXEnXnXE
       )()()()()()()()( 22122111   nXnXEnXnXEnXnXEnXnXE
  )()()( 1111  RnXnXE 
Similarly )()()()( 21  nXnXnXny
       )()()()()()()()( 2121 nXnXnXnXEnXnXEnXnyE 
       )()()()()()()()( 22122111 nXnXEnXnXEnXnXEnXnXE 
  )()()( 1111  RnXnXE
Define )1()1()()0()1()1()()0()(ˆ  nXhnXhnyhnyhnX
   )1()1()1()()()0( 2121  nXnXhnXnXh
 )1()1()()0()()(ˆ)()(  nXhnXhnXnXnXne
i.e.  )1()1()()0()()(ˆ)()(  nXhnXhnenXnenX
The signal X(n) has two parts; (1) e(n) which is uncorrelated with y(n), and (2) )(ˆ nX which is
highly correlated with y(n).
54
     22
)1()1()()0()()()(  nXhnXhnXEneEn
To find the unknown coefficients, h(0) and h(1), we minimize )(n and equate the derivative to
zero:
 )()(20
)0(
)(



nXneE
h
n
   )()1()1()()0()(2  nXnXhnXhnXE
i.e.      )()1()1()()0()()( 2
 nXnXEhnXEhnXnXE
Similarly  )1()(20
)1(
)(



nXneE
h
n
   )1()1()1()()0()(2  nXnXhnXhnXE
i.e.      )1()1()1()()0()1()( 2
 nXEhnXnXEhnXnXE
These are two equations in the two unknowns h(0) and h(1).
Since
  )1()1()()0()()1()1()()0()()(ˆ)()(  nyhnyhnenXhnXhnenXnenX ,
the signal X(n) has two parts; (1) e(n) which is uncorrelated with y(n), and (2) )(ˆ nX which is
highly correlated with y(n). In our case, )(ˆ nX is an estimate of )(1 nX and e(n) is an estimate of
)(2 nX .
55
Example: The signal is a product of stationary and nonstationary components i.e.
)()()( 21 nZnZnZ  . Taking the log operation, assuming nonnegative quantities, we get
)(log)(log)(log 21 nZnZnZ  . Define )(log)( nZnX  , )(log)( 11 nZnX  , )(log)( 22 nZnX  ,
and we follow the same steps as before.
56
Blind Deconvolution
Assume that we have one source of signals e.g. the Aortic pressure )(nPA
. We measure the
pressure at the Femoral artery )(nPF
and the Illiac artery )(nPI
through the unknown filters )(nFF
and )(nFI
:
)(*)()( nPnFnP AFF 
)(*)()( nPnFnP AII 
where “*” is the convolution operation.
Convolve the first equation by )(nFI
and the second equation by )(nFF
, we get:
)(*)(*)()(*)( nPnFnFnPnF AFIFI 
)(*)(*)()(*)( nPnFnFnPnF AIFIF 
Notice that the right hand side of both equations are equal. Thus,
)(*)(*)()(*)()(*)( nPnFnFnPnFnPnF AFIIFFI 
The two pressures P_F(n) and P_I(n) are highly correlated. If we use a Wiener filter as a correlation canceller, we should get at its output an exact replica of the other signal, i.e. we should get an estimate of P_I(n) if we pass P_F(n) through the Wiener filter.
Instead, if we use two Wiener filters H_F(n) and H_I(n) such that the error between the two outputs is zero, then the output of each filter is P_A(n), i.e. H_F(n) is actually the inverse of F_F(n) and H_I(n) is the inverse of F_I(n). Thus,
\hat{P}_A(n) = H_F(n) * P_F(n) = \sum_{i=0}^{I_F} H_F(i) P_F(n-i)
\hat{P}_A(n) = H_I(n) * P_I(n) = \sum_{j=0}^{J_I} H_I(j) P_I(n-j)
The error is
e(n) = \hat{P}_{A,F}(n) - \hat{P}_{A,I}(n) = \sum_{i=0}^{I_F} H_F(i) P_F(n-i) - \sum_{j=0}^{J_I} H_I(j) P_I(n-j) ≈ 0
Define
ε(n) = E[e²(n)] = E[( \sum_{i=0}^{I_F} H_F(i) P_F(n-i) - \sum_{j=0}^{J_I} H_I(j) P_I(n-j) )²]
To find the coefficients of the two Wiener filters, we minimize ε(n) w.r.t. the unknowns as follows:
∂ε(n)/∂H_F(k) = 0 = E[( \sum_{i=0}^{I_F} H_F(i) P_F(n-i) - \sum_{j=0}^{J_I} H_I(j) P_I(n-j) ) P_F(n-k)],   k = 0, ..., I_F
∂ε(n)/∂H_I(l) = 0 = E[( \sum_{i=0}^{I_F} H_F(i) P_F(n-i) - \sum_{j=0}^{J_I} H_I(j) P_I(n-j) ) P_I(n-l)],   l = 0, ..., J_I
This yields one fewer independent equation than the number of unknown filter coefficients (the system is homogeneous, so the coefficients can only be determined up to a common scale). Thus, another equation is needed to find a unique solution.
Example: Assume that we have first-order filters. Thus,
\hat{P}_A(n) = H_F(n) * P_F(n) = \sum_{i=0}^{1} H_F(i) P_F(n-i) = H_F(0) P_F(n) + H_F(1) P_F(n-1)
\hat{P}_A(n) = H_I(n) * P_I(n) = \sum_{j=0}^{1} H_I(j) P_I(n-j) = H_I(0) P_I(n) + H_I(1) P_I(n-1)
and
e(n) = H_F(0) P_F(n) + H_F(1) P_F(n-1) - H_I(0) P_I(n) - H_I(1) P_I(n-1)
ε(n) = E[e²(n)] = E[( H_F(0) P_F(n) + H_F(1) P_F(n-1) - H_I(0) P_I(n) - H_I(1) P_I(n-1) )²]
To find the coefficients of the two Wiener filters, we minimize ε(n) w.r.t. the unknowns. Define the correlations R_FF(k) = E[P_F(n) P_F(n-k)], R_II(k) = E[P_I(n) P_I(n-k)], and R_FI(k) = E[P_F(n) P_I(n-k)], with R_IF(k) = E[P_I(n) P_F(n-k)] = R_FI(-k). Setting the four partial derivatives of ε(n) to zero gives the following:
for k = 0:
∂ε(n)/∂H_F(0) = 0 = E[( H_F(0) P_F(n) + H_F(1) P_F(n-1) - H_I(0) P_I(n) - H_I(1) P_I(n-1) ) P_F(n)]
which yields  H_F(0) R_FF(0) + H_F(1) R_FF(1) - H_I(0) R_FI(0) - H_I(1) R_FI(1) = 0
for k = 1:
∂ε(n)/∂H_F(1) = 0 = E[( H_F(0) P_F(n) + H_F(1) P_F(n-1) - H_I(0) P_I(n) - H_I(1) P_I(n-1) ) P_F(n-1)]
which yields  H_F(0) R_FF(1) + H_F(1) R_FF(0) - H_I(0) R_IF(1) - H_I(1) R_FI(0) = 0
Similarly for the iliac artery we get:
for l = 0:
∂ε(n)/∂H_I(0) = 0 = E[( H_F(0) P_F(n) + H_F(1) P_F(n-1) - H_I(0) P_I(n) - H_I(1) P_I(n-1) ) P_I(n)]
which yields  H_F(0) R_FI(0) + H_F(1) R_IF(1) - H_I(0) R_II(0) - H_I(1) R_II(1) = 0
for l = 1:
∂ε(n)/∂H_I(1) = 0 = E[( H_F(0) P_F(n) + H_F(1) P_F(n-1) - H_I(0) P_I(n) - H_I(1) P_I(n-1) ) P_I(n-1)]
which yields  H_F(0) R_FI(1) + H_F(1) R_FI(0) - H_I(0) R_II(1) - H_I(1) R_II(0) = 0
Collecting terms and putting the above equations into matrix format we get:
[ R_FF(0)   R_FF(1)   -R_FI(0)   -R_FI(1) ] [ H_F(0) ]   [ 0 ]
[ R_FF(1)   R_FF(0)   -R_IF(1)   -R_FI(0) ] [ H_F(1) ] = [ 0 ]
[ R_FI(0)   R_IF(1)   -R_II(0)   -R_II(1) ] [ H_I(0) ]   [ 0 ]
[ R_FI(1)   R_FI(0)   -R_II(1)   -R_II(0) ] [ H_I(1) ]   [ 0 ]
where R_IF(l) = R_FI(-l).
These are three independent equations in four unknowns. One should be able to get another independent equation to find a unique solution. Otherwise, we use one of the unknowns as a numeraire. Assume that the numeraire is H_F(0). In this case the above equation is reduced to:
[ R_FF(0)   -R_IF(1)   -R_FI(0) ] [ H_F(1) ]             [ R_FF(1) ]
[ R_IF(1)   -R_II(0)   -R_II(1) ] [ H_I(0) ] = -H_F(0) * [ R_FI(0) ]
[ R_FI(0)   -R_II(1)   -R_II(0) ] [ H_I(1) ]             [ R_FI(1) ]
This yields the estimates of the unknown parameters as a function of the value of H_F(0).
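A minimal numerical sketch of this first-order procedure from sample correlations (the simulated white source, the two measurement filters, and the choice of numeraire H_F(0) = 1 are illustrative assumptions):

import numpy as np

# First-order blind deconvolution sketch: estimate H_F(1), H_I(0), H_I(1) from
# sample correlations of P_F and P_I, with the numeraire H_F(0) = 1.
rng = np.random.default_rng(2)
N = 50000
P_A = rng.standard_normal(N)                          # unknown common source
P_F = np.convolve(P_A, [1.0, 0.5], mode="full")[:N]   # P_F = F_F * P_A
P_I = np.convolve(P_A, [1.0, -0.3], mode="full")[:N]  # P_I = F_I * P_A

def R(a, b, lag):                                     # sample estimate of E[a(n) b(n-lag)]
    if lag < 0:
        return R(b, a, -lag)
    return np.mean(a[lag:] * b[:len(b) - lag]) if lag > 0 else np.mean(a * b)

RFF = lambda k: R(P_F, P_F, k)
RII = lambda k: R(P_I, P_I, k)
RFI = lambda k: R(P_F, P_I, k)                        # R_FI(k) = E[P_F(n) P_I(n-k)]

HF0 = 1.0                                             # numeraire
A = np.array([[RFF(0), -RFI(-1), -RFI(0)],
              [RFI(-1), -RII(0), -RII(1)],
              [RFI(0), -RII(1), -RII(0)]])
b = -HF0 * np.array([RFF(1), RFI(0), RFI(1)])
HF1, HI0, HI1 = np.linalg.solve(A, b)
# The solve drives H_F * P_F and H_I * P_I to agree; for this illustrative setup the
# solution comes out as H_F proportional to F_I = [1, -0.3] and H_I proportional to
# F_F = [1, 0.5], so both filter outputs equal the same filtered version of P_A.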
Blind Deconvolution for more than One Source:
In many applications there is more than one source generating signals, e.g. the mother's EKG and the fetus's EKG. There are usually more receivers than sources. We shall focus on the case of two sources, u_1(n) and u_2(n), and three receivers, y_1(n), y_2(n), and y_3(n), and explain how to use the Wiener filter to find the desired sources. Later on we shall generalize the analysis. In all the analysis, it is assumed that the sources are independent, stationary signals and that the transmission media/modulation are linear time-invariant filters. Specifically,
[ y_1(z) ]   [ F_11(z)   F_12(z) ]
[ y_2(z) ] = [ F_21(z)   F_22(z) ] [ u_1(z) ]
[ y_3(z) ]   [ F_31(z)   F_32(z) ] [ u_2(z) ]
which could be split into three equations as:
[ y_1(z) ]   [ F_11(z)   F_12(z) ] [ u_1(z) ]
[ y_2(z) ] = [ F_21(z)   F_22(z) ] [ u_2(z) ]

[ y_1(z) ]   [ F_11(z)   F_12(z) ] [ u_1(z) ]
[ y_3(z) ] = [ F_31(z)   F_32(z) ] [ u_2(z) ]

and

[ y_2(z) ]   [ F_21(z)   F_22(z) ] [ u_1(z) ]
[ y_3(z) ] = [ F_31(z)   F_32(z) ] [ u_2(z) ]
Inverting the square matrices (assuming invertibility) we get:
[ u_1(z) ]                                         [  F_22(z)   -F_12(z) ] [ y_1(z) ]
[ u_2(z) ] = 1/(F_11(z)F_22(z) - F_12(z)F_21(z))   [ -F_21(z)    F_11(z) ] [ y_2(z) ]

[ u_1(z) ]                                         [  F_32(z)   -F_12(z) ] [ y_1(z) ]
[ u_2(z) ] = 1/(F_11(z)F_32(z) - F_12(z)F_31(z))   [ -F_31(z)    F_11(z) ] [ y_3(z) ]

and

[ u_1(z) ]                                         [  F_32(z)   -F_22(z) ] [ y_2(z) ]
[ u_2(z) ] = 1/(F_21(z)F_32(z) - F_22(z)F_31(z))   [ -F_31(z)    F_21(z) ] [ y_3(z) ]
For the source signal u_2(z), we have the equation:
u_2(z) = [F_11(z)y_2(z) - F_21(z)y_1(z)] / [F_11(z)F_22(z) - F_12(z)F_21(z)]
       = [F_11(z)y_3(z) - F_31(z)y_1(z)] / [F_11(z)F_32(z) - F_12(z)F_31(z)]
       = [F_21(z)y_3(z) - F_31(z)y_2(z)] / [F_21(z)F_32(z) - F_22(z)F_31(z)]
The above equation is actually a set of three equations in the unknown filter coefficients. To see this, we take the first two terms of the equation, i.e.:
[F_11(z)y_2(z) - F_21(z)y_1(z)] / [F_11(z)F_22(z) - F_12(z)F_21(z)] = [F_11(z)y_3(z) - F_31(z)y_1(z)] / [F_11(z)F_32(z) - F_12(z)F_31(z)] = u_2(z)
Rearranging, we get:
[F_11(z)F_32(z) - F_12(z)F_31(z)] [F_11(z)y_2(z) - F_21(z)y_1(z)] = [F_11(z)F_22(z) - F_12(z)F_21(z)] [F_11(z)y_3(z) - F_31(z)y_1(z)]
The above equation represents a linear relation between the observations, delayed observations, and the unknown coefficients of the mixing FIR filters. Solving this equation, using for example regression analysis, estimates for the filters could be found. Once the filter coefficients are found, we use them to find an estimate for the source signal u_2(z). The same approach could be applied for u_1(z).
Instead, we use the Wiener filter approach, where we assume that the signals are stationary. Each observed signal has two Wiener filters, and the parameters of each are estimated from the minimization of a squared-error criterion. Specifically, for y_i(n) we have the two Wiener filters H_i1(n) and H_i2(n). They are related to the data through the equations:
u_2(z) = [F_11(z)y_2(z) - F_21(z)y_1(z)] / [F_11(z)F_22(z) - F_12(z)F_21(z)]
       = [F_11(z)y_3(z) - F_31(z)y_1(z)] / [F_11(z)F_32(z) - F_12(z)F_31(z)]
       = [F_21(z)y_3(z) - F_31(z)y_2(z)] / [F_21(z)F_32(z) - F_22(z)F_31(z)]
In terms of the Wiener filters we get the three error equations (each bracket below is the estimate of u_2(z) obtained from one pair of observations):
e_1(z) = [H_21(z)y_2(z) - H_11(z)y_1(z)] - [H_31(z)y_3(z) - H_12(z)y_1(z)]
e_2(z) = [H_21(z)y_2(z) - H_11(z)y_1(z)] - [H_32(z)y_3(z) - H_22(z)y_2(z)]
and  e_3(z) = [H_31(z)y_3(z) - H_12(z)y_1(z)] - [H_32(z)y_3(z) - H_22(z)y_2(z)]
For the first equation we have, in the time domain,
e_1(n) = [H_21(n)*y_2(n) - H_11(n)*y_1(n)] - [H_31(n)*y_3(n) - H_12(n)*y_1(n)]
and  ε_1(n) = E[e_1²(n)]
Minimizing ε_1(n) with respect to the unknown coefficients of the filters, we get an estimate of these coefficients as a function of the correlations between the observed signals y_1(n), y_2(n), and y_3(n). In a similar manner we obtain the equations for e_2(n) and e_3(n).
In vector notation we have:
[ e_1(n) ]   [ H_12(n) - H_11(n)    H_21(n)              -H_31(n)           ]   [ y_1(n) ]
[ e_2(n) ] = [ -H_11(n)             H_21(n) + H_22(n)    -H_32(n)           ] * [ y_2(n) ]
[ e_3(n) ]   [ -H_12(n)             H_22(n)              H_31(n) - H_32(n)  ]   [ y_3(n) ]
or e(n) = H(n) * y(n), where each entry of H(n) is convolved with the corresponding observation.
Example: Assume all the Wiener filters are first order. Thus, H_ij(z) = H_ij(0) + H_ij(1) z^{-1} and
ε_1(n) = E[e_1²(n)] = E[( (H_12(0) - H_11(0)) y_1(n) + (H_12(1) - H_11(1)) y_1(n-1) + H_21(0) y_2(n) + H_21(1) y_2(n-1) - H_31(0) y_3(n) - H_31(1) y_3(n-1) )²]
Minimizing ε_1(n) w.r.t. the unknown coefficients we get:
∂ε_1(n)/∂H_11(0) = 0 = E[e_1(n) y_1(n)],    ∂ε_1(n)/∂H_11(1) = 0 = E[e_1(n) y_1(n-1)]
∂ε_1(n)/∂H_12(0) = 0 = E[e_1(n) y_1(n)],    ∂ε_1(n)/∂H_12(1) = 0 = E[e_1(n) y_1(n-1)]
Notice that the second pair of equations is identical to the first pair; only one of the two pairs is useful.
∂ε_1(n)/∂H_21(0) = 0 = E[e_1(n) y_2(n)],    ∂ε_1(n)/∂H_21(1) = 0 = E[e_1(n) y_2(n-1)]
∂ε_1(n)/∂H_31(0) = 0 = E[e_1(n) y_3(n)],    ∂ε_1(n)/∂H_31(1) = 0 = E[e_1(n) y_3(n-1)]
The above are a total of 6 equations.
In a similar way, minimizing ε_2(n) = E[e_2²(n)] with
e_2(z) = [H_21(z)y_2(z) - H_11(z)y_1(z)] - [H_32(z)y_3(z) - H_22(z)y_2(z)]
w.r.t. the new unknown coefficients we get:
∂ε_2(n)/∂H_22(0) = 0 = E[e_2(n) y_2(n)],    ∂ε_2(n)/∂H_22(1) = 0 = E[e_2(n) y_2(n-1)]
∂ε_2(n)/∂H_32(0) = 0 = E[e_2(n) y_3(n)],    ∂ε_2(n)/∂H_32(1) = 0 = E[e_2(n) y_3(n-1)]
The above are a total of 4 equations.
Minimizing ε_3(n) = E[e_3²(n)] with
e_3(z) = [H_31(z)y_3(z) - H_12(z)y_1(z)] - [H_32(z)y_3(z) - H_22(z)y_2(z)]
w.r.t. the remaining coefficients we get:
∂ε_3(n)/∂H_31(0) = 0 = E[e_3(n) y_3(n)],    ∂ε_3(n)/∂H_31(1) = 0 = E[e_3(n) y_3(n-1)]
The above are a total of 2 equations. Thus, we have a total of 12 equations in the 12 unknowns. Unfortunately, these are not all independent; we only have 11 independent equations. Thus, we need to assume a value for one of the unknowns and calculate the other values of the Wiener filter coefficients in terms of this quantity, the same as we did with the single source and two measurements.
In vector notation, stacking the 12 normal equations E[e_m(n) y_i(n-k)] = 0 (for the observation/lag pairs listed above) and expressing each expectation through the observation correlations R_{y_i y_j}(k) = E[y_i(n) y_j(n-k)] gives a 12x12 homogeneous system
R_y [ H_11(0)  H_11(1)  H_12(0)  H_12(1)  H_21(0)  H_21(1)  H_22(0)  H_22(1)  H_31(0)  H_31(1)  H_32(0)  H_32(1) ]^T = 0
whose coefficient matrix R_y is built entirely from the lag-0 and lag-1 auto- and cross-correlations of y_1(n), y_2(n), and y_3(n).
Once we have estimates for the Wiener filters, we go back and substitute to get an estimate for the unknown input source signal u_2(n). The same process could be repeated for the other signal u_1(n).
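A sketch of this estimation step for first-order Wiener filters; for compactness the three error criteria are stacked into a single least-squares problem with the numeraire H_11(0) = 1 (the simulated sources and mixing filters are illustrative assumptions, and with such short filters the resulting separation is only approximate):

import numpy as np

# Multi-receiver sketch: first-order filters H_ij(z) = H_ij(0) + H_ij(1) z^-1.
rng = np.random.default_rng(3)
N = 20000
u1, u2 = rng.standard_normal(N), rng.standard_normal(N)
F = {(1, 1): [1.0, 0.4], (1, 2): [0.7, -0.2],
     (2, 1): [0.9, -0.5], (2, 2): [1.0, 0.3],
     (3, 1): [0.6, 0.2], (3, 2): [0.8, 0.5]}
filt = lambda h, x: np.convolve(x, h, mode="full")[:N]
y = {i: filt(F[(i, 1)], u1) + filt(F[(i, 2)], u2) for i in (1, 2, 3)}

def taps(x):                              # columns [x(n), x(n-1)]
    return np.column_stack([x, np.concatenate(([0.0], x[:-1]))])

Y1, Y2, Y3 = taps(y[1]), taps(y[2]), taps(y[3])
Z = np.zeros((N, 2))
# Unknowns after the numeraire H_11(0) = 1:
# theta = [H11(1), H12(0), H12(1), H21(0), H21(1), H22(0), H22(1), H31(0), H31(1), H32(0), H32(1)]
# e1 = (H21*y2 - H11*y1) - (H31*y3 - H12*y1)
# e2 = (H21*y2 - H11*y1) - (H32*y3 - H22*y2)
# e3 = (H31*y3 - H12*y1) - (H32*y3 - H22*y2)
A1 = np.hstack([-Y1[:, 1:], Y1, Y2, Z, -Y3, Z])
A2 = np.hstack([-Y1[:, 1:], Z, Y2, Y2, Z, -Y3])
A3 = np.hstack([Z[:, :1], -Y1, Z, Y2, Y3, -Y3])
b1 = Y1[:, 0]                             # numeraire term H11(0)*y1(n) moved to the right-hand side
b2 = Y1[:, 0]
b3 = np.zeros(N)
A = np.vstack([A1, A2, A3])
b = np.concatenate([b1, b2, b3])
theta, *_ = np.linalg.lstsq(A, b, rcond=None)

H11 = np.array([1.0, theta[0]])
H21 = theta[3:5]
u2_hat = filt(H21, y[2]) - filt(H11, y[1])   # one of the three pairwise estimates of u2;
                                             # with first-order filters this is only approximate.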
Example: A simpler example would be to assume that all the filters involved are constant values:
[ y_1(z) ]   [ F_11   F_12 ]
[ y_2(z) ] = [ F_21   F_22 ] [ u_1(z) ]
[ y_3(z) ]   [ F_31   F_32 ] [ u_2(z) ]
which could be split into three equations as:
[ y_1(z) ]   [ F_11   F_12 ] [ u_1(z) ]
[ y_2(z) ] = [ F_21   F_22 ] [ u_2(z) ]

[ y_1(z) ]   [ F_11   F_12 ] [ u_1(z) ]
[ y_3(z) ] = [ F_31   F_32 ] [ u_2(z) ]

and

[ y_2(z) ]   [ F_21   F_22 ] [ u_1(z) ]
[ y_3(z) ] = [ F_31   F_32 ] [ u_2(z) ]
Inverting the square matrices (assuming invertibility) we get:
[ u_1(z) ]                              [  F_22   -F_12 ] [ y_1(z) ]
[ u_2(z) ] = 1/(F_11 F_22 - F_12 F_21)  [ -F_21    F_11 ] [ y_2(z) ]

[ u_1(z) ]                              [  F_32   -F_12 ] [ y_1(z) ]
[ u_2(z) ] = 1/(F_11 F_32 - F_12 F_31)  [ -F_31    F_11 ] [ y_3(z) ]

and

[ u_1(z) ]                              [  F_32   -F_22 ] [ y_2(z) ]
[ u_2(z) ] = 1/(F_21 F_32 - F_22 F_31)  [ -F_31    F_21 ] [ y_3(z) ]
For the source signal u_1(z), we have the equation:
u_1(z) = [F_22 y_1(z) - F_12 y_2(z)] / [F_11 F_22 - F_12 F_21]
       = [F_32 y_1(z) - F_12 y_3(z)] / [F_11 F_32 - F_12 F_31]
       = [F_32 y_2(z) - F_22 y_3(z)] / [F_21 F_32 - F_22 F_31]
For the source signal u_2(z), we have the equation:
u_2(z) = [F_11 y_2(z) - F_21 y_1(z)] / [F_11 F_22 - F_12 F_21]
       = [F_11 y_3(z) - F_31 y_1(z)] / [F_11 F_32 - F_12 F_31]
       = [F_21 y_3(z) - F_31 y_2(z)] / [F_21 F_32 - F_22 F_31]
The above equation is actually a set of three equations in the unknown mixing coefficients. To see this, we take the first two terms of the equation, i.e.:
[F_11 y_2(z) - F_21 y_1(z)] / [F_11 F_22 - F_12 F_21] = [F_11 y_3(z) - F_31 y_1(z)] / [F_11 F_32 - F_12 F_31] = u_2(z)
Rearranging, we get:
[F_11 F_32 - F_12 F_31] [F_11 y_2(z) - F_21 y_1(z)] = [F_11 F_22 - F_12 F_21] [F_11 y_3(z) - F_31 y_1(z)]
The above equation represents a linear relation between the observations and the unknown coefficients of the mixing filters. Solving this equation, using for example regression analysis, estimates for the filters could be found. Once the filter coefficients are found, we use them to find an estimate for the source signal u_2(z). The same approach could be applied for u_1(z).
Instead, we use the Wiener filter approach, where we assume that the signals are stationary. Each observed signal has two Wiener filters, and the parameters of each are estimated from the minimization of a squared-error criterion. Specifically, for y_i(n) we have the two Wiener filter coefficients H_i1 and H_i2, which are constant values. They are related to the data through the equations:
u_2(z) = [F_11 y_2(z) - F_21 y_1(z)] / [F_11 F_22 - F_12 F_21]
       = [F_11 y_3(z) - F_31 y_1(z)] / [F_11 F_32 - F_12 F_31]
       = [F_21 y_3(z) - F_31 y_2(z)] / [F_21 F_32 - F_22 F_31]
In terms of the Wiener filters we get the three equations:
e_1(z) = [H_21 y_2(z) - H_11 y_1(z)] - [H_31 y_3(z) - H_12 y_1(z)]
e_2(z) = [H_21 y_2(z) - H_11 y_1(z)] - [H_32 y_3(z) - H_22 y_2(z)]
and  e_3(z) = [H_31 y_3(z) - H_12 y_1(z)] - [H_32 y_3(z) - H_22 y_2(z)]
where
H_11 = F_21 / (F_11 F_22 - F_12 F_21),    H_12 = F_31 / (F_11 F_32 - F_12 F_31)
H_21 = F_11 / (F_11 F_22 - F_12 F_21),    H_22 = F_31 / (F_21 F_32 - F_22 F_31)
and
H_31 = F_11 / (F_11 F_32 - F_12 F_31),    H_32 = F_21 / (F_21 F_32 - F_22 F_31)
If one is able to get estimates for the Wiener filter coefficients H_ij, we could find the values of the mixing coefficients F_ij. This will yield an estimate for both inputs u_1(n) and u_2(n).
We now move ahead and find the Wiener filter coefficients. For the first error equation e_1(n) we have:
e_1(n) = [H_21 y_2(n) - H_11 y_1(n)] - [H_31 y_3(n) - H_12 y_1(n)]
and  ε_1(n) = E[e_1²(n)] = E[( [H_21 y_2(n) - H_11 y_1(n)] - [H_31 y_3(n) - H_12 y_1(n)] )²]
Minimizing ε_1(n) with respect to the unknown coefficients, we get an estimate of these coefficients as a function of the correlations between the observed signals y_1(n), y_2(n), and y_3(n). In a similar manner we obtain the equations for e_2(n) and e_3(n).
In vector notations we have:
[ e_1(n) ]   [ H_12 - H_11    H_21           -H_31        ]   [ y_1(n) ]
[ e_2(n) ] = [ -H_11          H_21 + H_22    -H_32        ]   [ y_2(n) ]
[ e_3(n) ]   [ -H_12          H_22           H_31 - H_32  ]   [ y_3(n) ]
or e(n) = H y(n).
Minimizing ε_1(n) w.r.t. the unknown coefficients we get:
∂ε_1(n)/∂H_11 = 0 = E[e_1(n) y_1(n)]
∂ε_1(n)/∂H_12 = 0 = E[e_1(n) y_1(n)]
Notice that the above two equations are identical. Thus, only one of them is useful.
∂ε_1(n)/∂H_21 = 0 = E[e_1(n) y_2(n)],    ∂ε_1(n)/∂H_31 = 0 = E[e_1(n) y_3(n)]
The above are a total of 3 equations.
In a similar way, minimizing ε_2(n) w.r.t. the unknown coefficients, with
e_2(z) = [H_21 y_2(z) - H_11 y_1(z)] - [H_32 y_3(z) - H_22 y_2(z)],
we get:
∂ε_2(n)/∂H_22 = 0 = E[e_2(n) y_2(n)],    ∂ε_2(n)/∂H_32 = 0 = E[e_2(n) y_3(n)]
The above are a total of 2 equations.
Minimizing ε_3(n) w.r.t. the unknown coefficients, with
e_3(z) = [H_31 y_3(z) - H_12 y_1(z)] - [H_32 y_3(z) - H_22 y_2(z)],
we get:
∂ε_3(n)/∂H_31 = 0 = E[e_3(n) y_3(n)]
This is 1 more equation. Thus, we have a total of 6 equations in the 6 unknowns. Unfortunately, these are not all independent; there are only 5 independent equations. Thus, we need to assume a value for one of the unknowns and calculate the other values of the Wiener filter coefficients in terms of this quantity, the same as we did with the single source and two measurements.
In vector notation we have:
[ -R_11   R_11    R_12   0      -R_13   0     ] [ H_11 ]   [ 0 ]
[ -R_12   R_12    R_22   0      -R_23   0     ] [ H_12 ]   [ 0 ]
[ -R_13   R_13    R_23   0      -R_33   0     ] [ H_21 ]   [ 0 ]
[ -R_12   0       R_22   R_22   0       -R_23 ] [ H_22 ] = [ 0 ]
[ -R_13   0       R_23   R_23   0       -R_33 ] [ H_31 ]   [ 0 ]
[ 0       -R_13   0      R_23   R_33    -R_33 ] [ H_32 ]   [ 0 ]
where R_ij = R_{y_i y_j}(0) = E[y_i(n) y_j(n)].
Once we have estimates for the Wiener filters, we go back and substitute to get an estimate for the unknown input source signal u_2(n). The same process could be repeated for the other signal u_1(n). Instead, we use the estimated values of H_ij to get estimates for F_ij and consequently get estimates for the inputs u_1(n) and u_2(n). Notice that the above matrix equation does not have a unique solution. We could always take one of the unknowns as a numeraire. For example, if we take H_11 as the numeraire, the above equation becomes:
[ R_12    R_22   0      -R_23   0     ] [ H_12 ]          [ R_12 ]
[ R_13    R_23   0      -R_33   0     ] [ H_21 ]          [ R_13 ]
[ 0       R_22   R_22   0       -R_23 ] [ H_22 ] = H_11 * [ R_12 ]
[ 0       R_23   R_23   0       -R_33 ] [ H_31 ]          [ R_13 ]
[ -R_13   0      R_23   R_33    -R_33 ] [ H_32 ]          [ 0   ]
Taking the inverse of the correlation matrix on the left-hand side we get
[ H_12  H_21  H_22  H_31  H_32 ]^T = H_11 * (that correlation matrix)^{-1} [ R_12  R_13  R_12  R_13  0 ]^T
These are the estimated coefficients H_ij as a function of the numeraire H_11.
If one of the source signals, u_1(n) or u_2(n), is much stronger than the other, the estimate of the stronger signal is usually very good. In this case, we use the techniques of noise cancelling to get the other signal. Specifically, assume that we have found a good estimate (high SNR) of u_1(n). We could use y_1(n) = F_11 u_1(n) + F_12 u_2(n) as the measured signal and use the Wiener filter, as correlation canceller, to find the component in y_1(n) that is correlated with u_1(n). Effectively we are getting a new and more accurate estimate for F_11. Similarly we could use y_2(n) = F_21 u_1(n) + F_22 u_2(n) and y_3(n) = F_31 u_1(n) + F_32 u_2(n) to get better estimates for F_21 and F_31. We then repeat the above process again to find estimates for H_ij, but now only F_12, F_22, and F_32 are unknowns.
Assume that we receive/observe a signal u_1(n) that is correlated with another signal y_1(n). We use u_1(n) to find an estimate, \hat{y}_1(n), of y_1(n) according to the following equation:
\hat{y}_1(n) = h_1(0) u_1(n)
Define e(n) = y_1(n) - \hat{y}_1(n) and ε(n) = E[e²(n)] = E[( y_1(n) - \hat{y}_1(n) )²],
where h_1(0) is the unknown filter parameter. In order to find the filter parameter, we minimize the expected value of the squared error w.r.t. the unknown:
∂ε(n)/∂h_1(0) = 0 = -2 E[( y_1(n) - \hat{y}_1(n) ) u_1(n)] = -2 E[( y_1(n) - h_1(0) u_1(n) ) u_1(n)]
This yields:
E[y_1(n) u_1(n)] = h_1(0) E[u_1²(n)]
i.e.  R_{u_1 y_1}(0) = h_1(0) R_{u_1 u_1}(0)   and   h_1(0) = R_{u_1 y_1}(0) / R_{u_1 u_1}(0)
h_1(0) is the revised, more accurate estimate of F_11. The same process could be repeated for the other unknowns, and we get:
h_2(0) = R_{u_1 y_2}(0) / R_{u_1 u_1}(0), the revised estimate of F_21,  and  h_3(0) = R_{u_1 y_3}(0) / R_{u_1 u_1}(0), the revised estimate of F_31.
We then use the equations for the estimated H_ij to get the revised values for the rest of the F_ij according to:
H_11 = F_21 / (F_11 F_22 - F_12 F_21),    H_12 = F_31 / (F_11 F_32 - F_12 F_31)
H_21 = F_11 / (F_11 F_22 - F_12 F_21),    H_22 = F_31 / (F_21 F_32 - F_22 F_31)
and
H_31 = F_11 / (F_11 F_32 - F_12 F_31),    H_32 = F_21 / (F_21 F_32 - F_22 F_31)
Finally, we use the improved estimates of F_ij to get improved estimates for u_2(n).
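A minimal sketch of this refinement step (the instantaneous mixing values and the white, independent sources below are illustrative assumptions):

import numpy as np

# Given a good estimate of u1(n), re-estimate F_11, F_21, F_31 by correlation cancelling.
rng = np.random.default_rng(5)
N = 100000
u1, u2 = rng.standard_normal(N), rng.standard_normal(N)
F11, F12, F21, F22, F31, F32 = 1.0, 0.7, 0.9, 1.0, 0.6, 0.8
y1 = F11 * u1 + F12 * u2
y2 = F21 * u1 + F22 * u2
y3 = F31 * u1 + F32 * u2

Ruu = np.mean(u1 * u1)                   # R_u1u1(0)
h1 = np.mean(u1 * y1) / Ruu              # revised estimate of F_11
h2 = np.mean(u1 * y2) / Ruu              # revised estimate of F_21
h3 = np.mean(u1 * y3) / Ruu              # revised estimate of F_31
# Because u1 and u2 are uncorrelated, E[u1 y_i] = F_i1 E[u1^2], so each h_i converges to F_i1.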
The Kalman Filter
In some situations one has a difference equation describing the evolution of the Mx1 state vector (signal) X(n), and an Nx1 vector y(n) of measurements of X(n). In vector form we have:
y(n) = c(n) X(n) + v(n)
X(n) = T(n) X(n-1) + η(n)
where v(n) is Gaussian with zero mean and covariance Σ_v, and η(n) is independent of v(n) and is Gaussian with zero mean and covariance Σ_η.
Once a model is put into state-space form, the Kalman filter can be used to estimate the state vector by filtering. The Kalman filter provides estimates of the unobserved variable X(n). The purpose of filtering is to update our knowledge of the state vector as soon as a new observation y(n) becomes available. Therefore, the Kalman filter can be described as an algorithm for estimating the unobserved components at time n based on the information available at that time. Estimates of any other desired parameters, including the so-called hyperparameters Σ_v and Σ_η, can be obtained by the Maximum Likelihood Estimation (MLE) algorithm as adapted by [Shumway and Stoffer; 1982]. Estimating the states through the Kalman filter is a three-step process: initialization, prediction, and update.
Initial state:  X(0/0), P(0/0)
Predict states:
X(n/n-1) = T(n) X(n-1/n-1)
P(n/n-1) = T(n) P(n-1/n-1) T^T(n) + Σ_η
where X(n/n-1) is the estimate of X(n) given the observations up to time n-1, and P(n/n-1) is the covariance of that estimate.
Update states:
X(n/n) = X(n/n-1) + K(n) [ y(n) - c(n) X(n/n-1) ]
K(n) = P(n/n-1) c^T(n) [ c(n) P(n/n-1) c^T(n) + Σ_v ]^{-1}
P(n/n) = [ I - K(n) c(n) ] P(n/n-1)
X(0/0) and P(0/0) are the initial state vector and its covariance matrix, respectively. The covariance matrix P(0/0) reflects the uncertainty in X(0/0). If X(0/0) and P(0/0) are not given a priori, X(0/0) is assumed to be zero and the diagonal elements of P(0/0) are set to large values.
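A minimal sketch of the predict/update recursion above for a scalar random-walk state (the model matrices and noise variances are illustrative assumptions):

import numpy as np

# Scalar Kalman filter following the predict/update equations above.
rng = np.random.default_rng(6)
N = 200
T = np.array([[1.0]])                     # state transition T(n)
c = np.array([[1.0]])                     # measurement matrix c(n)
Sig_eta = np.array([[0.01]])              # state-noise covariance
Sig_v = np.array([[0.5]])                 # measurement-noise covariance

X_true = np.cumsum(rng.normal(0.0, np.sqrt(Sig_eta[0, 0]), N))
y = X_true + rng.normal(0.0, np.sqrt(Sig_v[0, 0]), N)

X = np.zeros((1, 1))                      # X(0/0): assumed zero when no prior is given
P = np.array([[1000.0]])                  # P(0/0): large diagonal (vague prior)
X_hat = np.zeros(N)
for n in range(N):
    # Predict
    X_pred = T @ X
    P_pred = T @ P @ T.T + Sig_eta
    # Update
    K = P_pred @ c.T @ np.linalg.inv(c @ P_pred @ c.T + Sig_v)
    X = X_pred + K @ (np.array([[y[n]]]) - c @ X_pred)
    P = (np.eye(1) - K @ c) @ P_pred
    X_hat[n] = X[0, 0]                    # filtered state estimate X(n/n)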
Principal Component Analysis (PCA)
Assume that the data is given in matrix format, e.g. EEG channels, a two-dimensional image, etc. We represent this data as the m×n matrix X, where each row represents, for example, a single EEG channel. Before any calculations, we subtract from each row its mean value. PCA is a method to express this data as a linear combination of data basis vectors. Specifically, let X and Y be m×n matrices related by a linear transformation P; X is the original recorded data set and Y is a re-representation of that data set. Thus, we have:
X = [ x_1 ]        Y = [ y_1 ]
    [ ... ]            [ ... ]
    [ x_m ]            [ y_m ]
where Y = P X, i.e. each row of Y is y_i = p_i X.
Also let us define the m×m matrix
P = [ p_1 ]
    [ ... ]
    [ p_m ]
where the p_i are the 1×m rows of P. The equation PX = Y represents a change of basis, i.e. the rows of P, {p_1, ..., p_m}, are a set of new basis vectors.
Define the covariance matrix of X (after removing the mean from each row) as Σ_XX, and the covariance matrix of Y as Σ_YY:
Σ_XX = (1/(n-1)) X X^T
Σ_YY = (1/(n-1)) Y Y^T = (1/(n-1)) (PX)(PX)^T = (1/(n-1)) P X X^T P^T = P Σ_XX P^T
We know that the symmetric matrix X X^T can be diagonalized by an orthogonal matrix of its eigenvectors, i.e.
X X^T = E D E^T
where D is a diagonal matrix and E is a matrix whose columns are the eigenvectors of X X^T, with E^T = E^{-1}. The matrix X X^T has r orthonormal eigenvectors, each of dimension m×1, where r is the rank of the matrix; the rank r of X X^T is usually less than m. We select the matrix P to be a matrix whose rows p_i are eigenvectors of X X^T. By this selection P = E^T = E^{-1} and P P^T = E^T E = I. Thus,
X X^T = E D E^T = P^T D P
and
Σ_YY = (1/(n-1)) Y Y^T = (1/(n-1)) P (X X^T) P^T = (1/(n-1)) P P^T D P P^T = (1/(n-1)) D
It is obvious that the choice P = E^T = E^{-1} results in a diagonal Σ_YY. This was the goal of PCA.
We can summarize the results of PCA in the matrices P and Σ_YY as follows: (1) the principal components of X are the eigenvectors of X X^T, i.e. the rows of P; (2) the ith diagonal value of Σ_YY is the variance of X along p_i. One benefit of PCA is that we can examine the variances in Σ_YY associated with the principal components. Often one finds large variances associated with the first k < m principal components, followed by a precipitous drop-off; one can then conclude that the most interesting dynamics occur only in the first k dimensions.
Example: We are given the two 1×10 vectors x_1 and x_2 as follows:
x_1 = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1],  x_2 = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9].
The average of x_1 is estimated as E[x_1] = 1.81 and for x_2 we have E[x_2] = 1.91. We subtract the means and get new zero-mean data sets. The covariance matrix (using the 1/(n-1) normalization defined above) is calculated as
E[(x - E[x])(x - E[x])^T] = [ 0.6166   0.6154 ]
                            [ 0.6154   0.7166 ]
The eigenvalues are λ_2 = 0.049 and λ_1 = 1.284, with the corresponding eigenvectors p_1 = [0.6778, 0.735] and p_2 = [-0.735, 0.6778]. Notice that the eigenvectors are normalized to have unit length.
It is clear that one eigenvalue is large and the other is small. Thus, one could discard the direction with the small eigenvalue and retain only the direction with the large eigenvalue.
Using the equation Y = P X on the zero-mean data, with
P = [ 0.6778    0.735  ]
    [ -0.735    0.6778 ]
we get the transformed data
y_1 = [0.83, -1.78, 0.99, 0.27, 1.68, 0.91, -0.10, -1.14, -0.44, -1.22]
y_2 = [-0.18, 0.14, 0.38, 0.13, -0.21, 0.18, -0.35, 0.05, 0.02, -0.16]
Notice that the values of the vector y_1 are much larger than the values of the vector y_2. This agrees with the fact that the eigenvalue corresponding to p_1 is much larger than the eigenvalue corresponding to p_2. Thus, we could retain the vector y_1 as a representative of the data and ignore the vector y_2.
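A short sketch that reproduces the worked example above with standard numerical routines (results should match up to rounding and the usual eigenvector sign ambiguity):

import numpy as np

# PCA on the example data: covariance, eigen-decomposition, and the transformed data Y = P Xc.
x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
X = np.vstack([x1, x2])                    # 2 x 10 data matrix, rows = variables
Xc = X - X.mean(axis=1, keepdims=True)     # remove the mean of each row

Sigma_xx = Xc @ Xc.T / (Xc.shape[1] - 1)   # 2 x 2 covariance matrix
eigvals, E = np.linalg.eigh(Sigma_xx)      # eigenvalues ascending, eigenvectors as columns
P = E[:, ::-1].T                           # rows of P = eigenvectors, largest eigenvalue first
Y = P @ Xc                                 # transformed (decorrelated) data

print(eigvals[::-1])                       # approx. [1.284, 0.049]
print(P)                                   # rows approx. +/- [0.6778, 0.735] and [-0.735, 0.6778]
print(np.round(Y, 2))                      # first row has much larger variance than the second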
Linearization:
stochastic notes
stochastic notes
stochastic notes

More Related Content

What's hot

MATLAB ODE
MATLAB ODEMATLAB ODE
MATLAB ODEKris014
 
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...mathsjournal
 
2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filternozomuhamada
 
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...SSA KPI
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Estimation theory 1
Estimation theory 1Estimation theory 1
Estimation theory 1Gopi Saiteja
 
Systems Of Differential Equations
Systems Of Differential EquationsSystems Of Differential Equations
Systems Of Differential EquationsJDagenais
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”IOSRJM
 
Machine learning (10)
Machine learning (10)Machine learning (10)
Machine learning (10)NYversity
 
Senior Seminar: Systems of Differential Equations
Senior Seminar:  Systems of Differential EquationsSenior Seminar:  Systems of Differential Equations
Senior Seminar: Systems of Differential EquationsJDagenais
 

What's hot (20)

Section3 stochastic
Section3 stochasticSection3 stochastic
Section3 stochastic
 
Ch07 5
Ch07 5Ch07 5
Ch07 5
 
MATLAB ODE
MATLAB ODEMATLAB ODE
MATLAB ODE
 
R180304110115
R180304110115R180304110115
R180304110115
 
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin...
 
2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter
 
Am26242246
Am26242246Am26242246
Am26242246
 
Ch07 7
Ch07 7Ch07 7
Ch07 7
 
Ch07 8
Ch07 8Ch07 8
Ch07 8
 
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
Application of the Monte-Carlo Method to Nonlinear Stochastic Optimization wi...
 
Ch07 6
Ch07 6Ch07 6
Ch07 6
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Estimation theory 1
Estimation theory 1Estimation theory 1
Estimation theory 1
 
Pria 2007
Pria 2007Pria 2007
Pria 2007
 
patel
patelpatel
patel
 
Berans qm overview
Berans qm overviewBerans qm overview
Berans qm overview
 
Systems Of Differential Equations
Systems Of Differential EquationsSystems Of Differential Equations
Systems Of Differential Equations
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”
 
Machine learning (10)
Machine learning (10)Machine learning (10)
Machine learning (10)
 
Senior Seminar: Systems of Differential Equations
Senior Seminar:  Systems of Differential EquationsSenior Seminar:  Systems of Differential Equations
Senior Seminar: Systems of Differential Equations
 

Similar to stochastic notes

Chapter-4 combined.pptx
Chapter-4 combined.pptxChapter-4 combined.pptx
Chapter-4 combined.pptxHamzaHaji6
 
this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...BhojRajAdhikari5
 
Communication Theory - Random Process.pdf
Communication Theory - Random Process.pdfCommunication Theory - Random Process.pdf
Communication Theory - Random Process.pdfRajaSekaran923497
 
Appendix 2 Probability And Statistics
Appendix 2  Probability And StatisticsAppendix 2  Probability And Statistics
Appendix 2 Probability And StatisticsSarah Morrow
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingSSA KPI
 
Errors in the Discretized Solution of a Differential Equation
Errors in the Discretized Solution of a Differential EquationErrors in the Discretized Solution of a Differential Equation
Errors in the Discretized Solution of a Differential Equationijtsrd
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and StatisticsMalik Sb
 
The Wishart and inverse-wishart distribution
 The Wishart and inverse-wishart distribution The Wishart and inverse-wishart distribution
The Wishart and inverse-wishart distributionPankaj Das
 
Econometrics 2.pptx
Econometrics 2.pptxEconometrics 2.pptx
Econometrics 2.pptxfuad80
 
Quantum physics the bottom up approach
Quantum physics the bottom up approachQuantum physics the bottom up approach
Quantum physics the bottom up approachSpringer
 
Could you please do it in R program 1. Assume that we draw n data v.pdf
Could you please do it in R program 1. Assume that we draw n data v.pdfCould you please do it in R program 1. Assume that we draw n data v.pdf
Could you please do it in R program 1. Assume that we draw n data v.pdfellanorfelicityri239
 
Interpolation techniques - Background and implementation
Interpolation techniques - Background and implementationInterpolation techniques - Background and implementation
Interpolation techniques - Background and implementationQuasar Chunawala
 
Tensor 1
Tensor  1Tensor  1
Tensor 1BAIJU V
 

Similar to stochastic notes (20)

Chapter-4 combined.pptx
Chapter-4 combined.pptxChapter-4 combined.pptx
Chapter-4 combined.pptx
 
this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...
 
Prob review
Prob reviewProb review
Prob review
 
Communication Theory - Random Process.pdf
Communication Theory - Random Process.pdfCommunication Theory - Random Process.pdf
Communication Theory - Random Process.pdf
 
Distributions
DistributionsDistributions
Distributions
 
Paper06
Paper06Paper06
Paper06
 
Appendix 2 Probability And Statistics
Appendix 2  Probability And StatisticsAppendix 2  Probability And Statistics
Appendix 2 Probability And Statistics
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programming
 
Errors in the Discretized Solution of a Differential Equation
Errors in the Discretized Solution of a Differential EquationErrors in the Discretized Solution of a Differential Equation
Errors in the Discretized Solution of a Differential Equation
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and Statistics
 
The Wishart and inverse-wishart distribution
 The Wishart and inverse-wishart distribution The Wishart and inverse-wishart distribution
The Wishart and inverse-wishart distribution
 
2 vectors notes
2 vectors notes2 vectors notes
2 vectors notes
 
Econometrics 2.pptx
Econometrics 2.pptxEconometrics 2.pptx
Econometrics 2.pptx
 
Quantum physics the bottom up approach
Quantum physics the bottom up approachQuantum physics the bottom up approach
Quantum physics the bottom up approach
 
Es272 ch5b
Es272 ch5bEs272 ch5b
Es272 ch5b
 
CH6.pdf
CH6.pdfCH6.pdf
CH6.pdf
 
Ch6
Ch6Ch6
Ch6
 
Could you please do it in R program 1. Assume that we draw n data v.pdf
Could you please do it in R program 1. Assume that we draw n data v.pdfCould you please do it in R program 1. Assume that we draw n data v.pdf
Could you please do it in R program 1. Assume that we draw n data v.pdf
 
Interpolation techniques - Background and implementation
Interpolation techniques - Background and implementationInterpolation techniques - Background and implementation
Interpolation techniques - Background and implementation
 
Tensor 1
Tensor  1Tensor  1
Tensor 1
 

More from cairo university

Tocci chapter 13 applications of programmable logic devices extended
Tocci chapter 13 applications of programmable logic devices extendedTocci chapter 13 applications of programmable logic devices extended
Tocci chapter 13 applications of programmable logic devices extendedcairo university
 
Tocci chapter 12 memory devices
Tocci chapter 12 memory devicesTocci chapter 12 memory devices
Tocci chapter 12 memory devicescairo university
 
Tocci ch 9 msi logic circuits
Tocci ch 9 msi logic circuitsTocci ch 9 msi logic circuits
Tocci ch 9 msi logic circuitscairo university
 
Tocci ch 7 counters and registers modified x
Tocci ch 7 counters and registers modified xTocci ch 7 counters and registers modified x
Tocci ch 7 counters and registers modified xcairo university
 
Tocci ch 6 digital arithmetic operations and circuits
Tocci ch 6 digital arithmetic operations and circuitsTocci ch 6 digital arithmetic operations and circuits
Tocci ch 6 digital arithmetic operations and circuitscairo university
 
Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...
Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...
Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...cairo university
 
A15 sedra ch 15 memory circuits
A15  sedra ch 15 memory circuitsA15  sedra ch 15 memory circuits
A15 sedra ch 15 memory circuitscairo university
 
A14 sedra ch 14 advanced mos and bipolar logic circuits
A14  sedra ch 14 advanced mos and bipolar logic circuitsA14  sedra ch 14 advanced mos and bipolar logic circuits
A14 sedra ch 14 advanced mos and bipolar logic circuitscairo university
 
A13 sedra ch 13 cmos digital logic circuits
A13  sedra ch 13 cmos digital logic circuitsA13  sedra ch 13 cmos digital logic circuits
A13 sedra ch 13 cmos digital logic circuitscairo university
 
A09 sedra ch 9 frequency response
A09  sedra ch 9 frequency responseA09  sedra ch 9 frequency response
A09 sedra ch 9 frequency responsecairo university
 
5 sedra ch 05 mosfet revision
5  sedra ch 05  mosfet revision5  sedra ch 05  mosfet revision
5 sedra ch 05 mosfet revisioncairo university
 
Lecture 2 (system overview of c8051 f020) rv01
Lecture 2 (system overview of c8051 f020) rv01Lecture 2 (system overview of c8051 f020) rv01
Lecture 2 (system overview of c8051 f020) rv01cairo university
 
Lecture 1 (course overview and 8051 architecture) rv01
Lecture 1 (course overview and 8051 architecture) rv01Lecture 1 (course overview and 8051 architecture) rv01
Lecture 1 (course overview and 8051 architecture) rv01cairo university
 

More from cairo university (20)

Tocci chapter 13 applications of programmable logic devices extended
Tocci chapter 13 applications of programmable logic devices extendedTocci chapter 13 applications of programmable logic devices extended
Tocci chapter 13 applications of programmable logic devices extended
 
Tocci chapter 12 memory devices
Tocci chapter 12 memory devicesTocci chapter 12 memory devices
Tocci chapter 12 memory devices
 
Tocci ch 9 msi logic circuits
Tocci ch 9 msi logic circuitsTocci ch 9 msi logic circuits
Tocci ch 9 msi logic circuits
 
Tocci ch 7 counters and registers modified x
Tocci ch 7 counters and registers modified xTocci ch 7 counters and registers modified x
Tocci ch 7 counters and registers modified x
 
Tocci ch 6 digital arithmetic operations and circuits
Tocci ch 6 digital arithmetic operations and circuitsTocci ch 6 digital arithmetic operations and circuits
Tocci ch 6 digital arithmetic operations and circuits
 
Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...
Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...
Tocci ch 3 5 boolean algebra, logic gates, combinational circuits, f fs, - re...
 
A15 sedra ch 15 memory circuits
A15  sedra ch 15 memory circuitsA15  sedra ch 15 memory circuits
A15 sedra ch 15 memory circuits
 
A14 sedra ch 14 advanced mos and bipolar logic circuits
A14  sedra ch 14 advanced mos and bipolar logic circuitsA14  sedra ch 14 advanced mos and bipolar logic circuits
A14 sedra ch 14 advanced mos and bipolar logic circuits
 
A13 sedra ch 13 cmos digital logic circuits
A13  sedra ch 13 cmos digital logic circuitsA13  sedra ch 13 cmos digital logic circuits
A13 sedra ch 13 cmos digital logic circuits
 
A09 sedra ch 9 frequency response
A09  sedra ch 9 frequency responseA09  sedra ch 9 frequency response
A09 sedra ch 9 frequency response
 
5 sedra ch 05 mosfet.ppsx
5  sedra ch 05  mosfet.ppsx5  sedra ch 05  mosfet.ppsx
5 sedra ch 05 mosfet.ppsx
 
5 sedra ch 05 mosfet
5  sedra ch 05  mosfet5  sedra ch 05  mosfet
5 sedra ch 05 mosfet
 
5 sedra ch 05 mosfet revision
5  sedra ch 05  mosfet revision5  sedra ch 05  mosfet revision
5 sedra ch 05 mosfet revision
 
Fields Lec 2
Fields Lec 2Fields Lec 2
Fields Lec 2
 
Fields Lec 1
Fields Lec 1Fields Lec 1
Fields Lec 1
 
Fields Lec 5&amp;6
Fields Lec 5&amp;6Fields Lec 5&amp;6
Fields Lec 5&amp;6
 
Fields Lec 4
Fields Lec 4Fields Lec 4
Fields Lec 4
 
Fields Lec 3
Fields Lec 3Fields Lec 3
Fields Lec 3
 
Lecture 2 (system overview of c8051 f020) rv01
Lecture 2 (system overview of c8051 f020) rv01Lecture 2 (system overview of c8051 f020) rv01
Lecture 2 (system overview of c8051 f020) rv01
 
Lecture 1 (course overview and 8051 architecture) rv01
Lecture 1 (course overview and 8051 architecture) rv01Lecture 1 (course overview and 8051 architecture) rv01
Lecture 1 (course overview and 8051 architecture) rv01
 

Recently uploaded

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 

Recently uploaded (20)

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 

stochastic notes

  • 1. 1 These notes contain comments on Orfanidis book, Wiener Filter, adaptive filter, Karhunen- Loeve expansion, Kalman Filter, Blind Deconvolution, and others. Introduction to Random Variables In this section, we present a short review of probability concepts. Let x be a random variable that lies in the range  x , and has probability density function (pdf) f(x). Its mean m, variance 2  , and nth moment are defined by the expectation values:       dxxxfxEm )(  x           dxxfxExxExE )( 222   x       dxxfxxE nn )(  x Notice that f(x), and  n xE are all deterministic quantities. For N realizations of the random variable x, 0x , 1x , …, 1Nx we have:         1 0 1 )( N i ix N dxxxfxEm          1 0 1 )( N i n i nn x N dxxfxxE The probability that the random variable x will assume a value within an interval of values [a, b] is given by:    b a dxxfbxaob )(Pr A commonly used model for the pdf f(x) of the random variable x is the Gaussian or normal distribution which is given as:
  • 2. 2              2 2 2 2 1 exp 2 1 )(  xEx xf  x In typical signal processing problems, such as designing filters to remove or separate noise from signal, it is often assumed that the noise interference is Gaussian. Joint and Conditional Densities, and Bayes’ Rule: In many situations we deal with more than one random variable i.e. random vectors. A pair of two different random variables  21, xxx  may be thought of as a vector-valued random variable. Its statistical description requires the knowledge of the joint probability density function (jpdf)    21, xxfxf  . The two random variables may or may not be independent of each other. A quantity that provides a measure for the degree of dependence of the two variables on each other is the conditional density )/( 21 xxf which is the pdf of 1x given 2x . The conditional pdf, the jpdf, and the pdf are given by the Bayes’ rule:          11222121 /)(/, xfxxfxfxxfxxfxf  More generally, Bayes’ rule for two events A and B is given as: )(Pr)/(Pr)(Pr)/(Pr),(Pr AobABobBobBAobbAob  The two random variables 1x and 2x are independent of each other if:        2121, xfxfxxfxf  i.e.    121 / xfxxf  The correlation between 1x and 2x is defined by the expectation value           21212121 ,21 dxdxxxfxxxxER xx ,  1x ,  2x For N realizations of the random variables 1x , 10x , … 1,1 Nx , and 2x , 20x , … 1,2 Nx , we have:
  • 3. 3              1 0 2121212121 1 ,21 N i iixx xx N dxdxxxfxxxxER When 1x and 2x are independent the correlation is the product of the expected values i.e.       212121 ,21 xandxxExExxER xx  are independent Example: Assume that vxx  12  where  is a deterministic quantity and v is a Gaussian random variable with zero mean and variance 2 v . We need to find the conditional pdf  12 / xxf . For a given value of 1x we treat the term “ 1x ” as if it is a deterministic quantity and the only randomness occurs due to v. Since v is Gaussian, then the conditional pdf of 2x is also Gaussian but with mean value 1x and variance 2 v . Thus we get:                2 2 11 212 2 1 exp 2 1 / vv axx xxf  ,  1x ,  2x The concept of a random vector generalizes to any dimension. A vector of N random variables              Nx x x x ... 2 1 is completely described if we know the joint pdf    Nxxxfxf ,...,, 21 . The first order statistics is the mean m . The second-order statistics of x are its correlation matrix R, and its covariance matrix, defined by:  xEm   T xxER     T mxmxE 
  • 4. 4 where the superscript T denotes transpose. The ijth element of the correlation matrix R is the correlation between the ith random variable ix and the jth random variable jx , that is,  jiij xxER  . It is easily shown that the covariance and correlation matrices are related by T mmR  . When the mean is zero, R and Σ coincide. Both R and Σ are symmetric positive semi definite matrices. Example: The probability density of a Gaussian random vector  T Nxxxx ...21 is completely specified by its mean m and covariance matrix Σ, that is,                   mxmxxf T N 1 2/ 2 1 exp det2 1 )(  Example: Under a linear transformation, a Gaussian random vector remains Gaussian i.e. a linear function of Gaussian is Gaussian. Let  T Nxxxx ...21 be a Gaussian random vector with mean xm , and covariance matrix x . The linearly transformed vector xB , B is a nonsingular N×N matrix, is Gaussian-distributed with mean and covariance given by xmBm  , and T xBB . These relations (mean and covariance) are valid also for non-Gaussian random vectors. They are easily derived as follows:    xEBE  ,     TTT BxxEBE  .
Random Signal Models

A stochastic process is a collection of random variables, i.e. a random vector. The index could be time, space, volume, or others. In this work we focus on a time-domain index and on stationary processes. A stationary process is, generally speaking, a process whose statistical properties are not functions of time. Thus the vector $x=[x_1\,x_2\,\dots\,x_N]^T$ is a sample in time of a stochastic process $X(n)$ observed at the instants $X(1),X(2),\dots,X(N)$. Notice that we use capital letters when dealing with stochastic processes.

One of the most useful ways to model a random signal is to consider it as the output of a causal and stable linear filter $C(z)$ driven by a stationary uncorrelated (white-noise) sequence $\varepsilon(n)$, sometimes written $\varepsilon_n$, where
$$C(z)=\sum_{n=0}^{\infty}c_nz^{-n},\qquad E[\varepsilon(n)\varepsilon(n-k)]=R_{\varepsilon\varepsilon}(k)=\sigma_\varepsilon^2\,\delta(k),$$
and $\delta(k)=1$ for $k=0$ and $0$ elsewhere is the delta function. Thus
$$X(n)=C(n)*\varepsilon(n)=\sum_{i=0}^{n}c_i\,\varepsilon(n-i),$$
where “*” is the convolution operation. This model is termed a moving average (MA) model. Another common model is the autoregressive moving average (ARMA) model, in which $C(z)$ is the ratio of two polynomials in $z$:
$$X(z)=\frac{B(z)}{A(z)}\,\varepsilon(z),\qquad\text{i.e.}\qquad A(z)X(z)=B(z)\varepsilon(z),$$
which in the time domain has the shape
$$\sum_i a_iX(n-i)=\sum_j b_j\,\varepsilon(n-j).$$

Example: If
$$X(z)=\frac{b_0+b_1z^{-1}+b_2z^{-2}}{a_0+a_1z^{-1}}\,\varepsilon(z),$$
then in the time domain we have
$$a_0X(n)+a_1X(n-1)=b_0\varepsilon(n)+b_1\varepsilon(n-1)+b_2\varepsilon(n-2).$$
When the order of the numerator $B(z)$ is zero ($b_0$ is nonzero and the remaining numerator parameters are zero), we have what is termed an autoregressive (AR) model, i.e. $a_0X(n)+a_1X(n-1)=b_0\varepsilon(n)$.

Maximum Likelihood Estimation:
Once the model is chosen, we use the data to find the model parameters. One of the most commonly used methods is the maximum likelihood method because it yields unbiased estimates of the parameters. We start with a simple AR(1) model, with only one lag in X, and then generalize the results to more than one lag. Assume that we have a set of N+1 data points X(0), X(1), X(2), ..., X(N). The signal is modeled as an AR(1) process,
$$X(n)=aX(n-1)+\varepsilon(n).$$
For a given value of X(n-1), and assuming $\varepsilon(n)$ is Gaussian with zero mean and variance $\sigma_\varepsilon^2$, the conditional pdf $f(X(n)\mid X(n-1))$ becomes
$$f\bigl(X(n)\mid X(n-1)\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon}\exp\Bigl\{-\frac{1}{2\sigma_\varepsilon^2}\bigl(X(n)-aX(n-1)\bigr)^2\Bigr\}.$$
For the maximum likelihood method, the basic idea is to find the jpdf $f(X(0),X(1),\dots,X(N))$ of the observations and to find the estimates of the unknowns that maximize this likelihood function. Using Bayes' rule repeatedly,
$$f\bigl(X(0),X(1),\dots,X(N)\bigr)=f(X(0))\,f\bigl(X(1)\mid X(0)\bigr)\,f\bigl(X(2)\mid X(1),X(0)\bigr)\,f\bigl(X(3)\mid X(2),X(1),X(0)\bigr)\cdots$$
Define $\mathbf{X}(N)=[X(0),X(1),\dots,X(N)]^T$. Using Bayes' rule, the joint pdf is
$$f\bigl(\mathbf{X}(N)\bigr)=f\bigl(X(N)\mid X(N-1),\dots,X(0)\bigr)\,f\bigl(\mathbf{X}(N-1)\bigr),$$
and similarly
$$f\bigl(\mathbf{X}(N-1)\bigr)=f\bigl(X(N-1)\mid X(N-2),\dots,X(0)\bigr)\,f\bigl(\mathbf{X}(N-2)\bigr),$$
and we continue until we reach the initial conditions at $X(m),X(m-1),\dots,X(0)$. This yields
$$f\bigl(X(0),\dots,X(N)\bigr)=f\bigl(X(0),\dots,X(m)\bigr)\prod_{i=m+1}^{N}f\bigl(X(i)\mid X(i-1),\dots,X(0)\bigr),$$
where $f(X(0),\dots,X(m))$ is regarded as the initial condition and may be known or may be neglected for a large amount of data, i.e. $N\gg m$.

For the AR(1) model, $X(n)$ is related only to $X(n-1)$, and the Bayes factorization of the jpdf reduces to
$$f\bigl(X(0),\dots,X(N)\bigr)=f(X(0))\prod_{n=1}^{N}f\bigl(X(n)\mid X(n-1)\bigr).$$
Substituting the derived expression for $f(X(n)\mid X(n-1))$, we get
$$f\bigl(X(0),\dots,X(N)\bigr)=f(X(0))\,\bigl(2\pi\sigma_\varepsilon^2\bigr)^{-N/2}\exp\Bigl\{-\frac{1}{2\sigma_\varepsilon^2}\sum_{n=1}^{N}\bigl(X(n)-aX(n-1)\bigr)^2\Bigr\}.$$
Taking the log of both sides, the log-likelihood function $\Lambda$ is
$$\Lambda=\log f\bigl(X(0),\dots,X(N)\bigr)=\log f(X(0))-\frac{N}{2}\log 2\pi-\frac{N}{2}\log\sigma_\varepsilon^2-\frac{1}{2\sigma_\varepsilon^2}\sum_{n=1}^{N}\bigl(X(n)-aX(n-1)\bigr)^2.$$
Maximizing this quantity, ignoring $\log f(X(0))$, with respect to the unknowns $a$ and $\sigma_\varepsilon^2$, we get the desired results. Setting $\partial\Lambda/\partial a=0$ gives
$$\sum_{n=1}^{N}\bigl(X(n)-\hat aX(n-1)\bigr)X(n-1)=0,$$
and rearranging,
$$\hat a=\frac{\sum_{n=1}^{N}X(n)X(n-1)}{\sum_{n=1}^{N}X^2(n-1)}.$$
Similarly, setting $\partial\Lambda/\partial\sigma_\varepsilon^2=0$ and rearranging gives
$$\hat\sigma_\varepsilon^2=\frac{1}{N}\sum_{n=1}^{N}\bigl(X(n)-\hat aX(n-1)\bigr)^2.$$
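The following small sketch applies the AR(1) estimators just derived to simulated data; the true values of $a$, $\sigma_\varepsilon$ and the sample size are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) process X(n) = a*X(n-1) + eps(n)  (assumed true values)
a_true, sigma_true, N = 0.8, 0.5, 5000
X = np.zeros(N + 1)
for n in range(1, N + 1):
    X[n] = a_true * X[n - 1] + sigma_true * rng.standard_normal()

# Conditional maximum likelihood estimates (initial condition ignored)
a_hat = np.sum(X[1:] * X[:-1]) / np.sum(X[:-1] ** 2)
sigma2_hat = np.mean((X[1:] - a_hat * X[:-1]) ** 2)

print(a_hat, np.sqrt(sigma2_hat))   # should be close to 0.8 and 0.5
```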
We use the same approach to find the parameters of an MA(1) model, where “1” means only one lag in $\varepsilon$. Assume that the signal is modeled as
$$X(n)=\varepsilon(n)+b_1\varepsilon(n-1),$$
where the $\varepsilon(n)$ are independent, identically distributed random variables with zero mean and variance $\sigma_\varepsilon^2$. We also assume that $\varepsilon(0)=0$. Since $X(n)$ is a sum of two independent zero-mean Gaussian random variables, it is also Gaussian with zero mean and variance equal to the sum of the two variances. Thus $X(n)\sim N\bigl(0,\sigma_\varepsilon^2(1+b_1^2)\bigr)$, where $\sim$ means “is distributed as” and $N$ stands for the normal distribution.

We need the joint pdf $f(X(0),X(1),\dots,X(N))$; in many situations the pdf of $X(0)$ is ignored. Using Bayes' rule, the joint pdf factors into a product of conditional densities, which we build up term by term.

n=1: $X(1)=\varepsilon(1)+b_1\varepsilon(0)=\varepsilon(1)$, since $\varepsilon(0)$ is zero by assumption. Thus
$$f\bigl(X(1)\mid X(0)\bigr)=f(X(1))=\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon}\exp\Bigl\{-\frac{X^2(1)}{2\sigma_\varepsilon^2}\Bigr\}.$$
n=2: $X(2)=\varepsilon(2)+b_1\varepsilon(1)=\varepsilon(2)+b_1X(1)$, so $\varepsilon(2)=X(2)-b_1X(1)$. For given values of X(1) and X(0),
$$f\bigl(X(2)\mid X(1),X(0)\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon}\exp\Bigl\{-\frac{\bigl(X(2)-b_1X(1)\bigr)^2}{2\sigma_\varepsilon^2}\Bigr\}.$$
n=3: $X(3)=\varepsilon(3)+b_1\varepsilon(2)=\varepsilon(3)+b_1\bigl(X(2)-b_1X(1)\bigr)$. For given values of X(2), X(1) and X(0) (note that $\varepsilon(2)=X(2)-b_1X(1)$ is then known with certainty),
$$f\bigl(X(3)\mid X(2),X(1),X(0)\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon}\exp\Bigl\{-\frac{\bigl(X(3)-b_1X(2)+b_1^2X(1)\bigr)^2}{2\sigma_\varepsilon^2}\Bigr\}.$$
n=4: $X(4)=\varepsilon(4)+b_1\varepsilon(3)=\varepsilon(4)+b_1\bigl(X(3)-b_1X(2)+b_1^2X(1)\bigr)$, and for given X(3), X(2), X(1), X(0),
$$f\bigl(X(4)\mid X(3),\dots,X(0)\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon}\exp\Bigl\{-\frac{\bigl(X(4)-b_1X(3)+b_1^2X(2)-b_1^3X(1)\bigr)^2}{2\sigma_\varepsilon^2}\Bigr\}.$$
In general,
$$f\bigl(X(n)\mid X(n-1),\dots,X(0)\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon}\exp\Bigl\{-\frac{\bigl(X(n)-b_1\varepsilon(n-1)\bigr)^2}{2\sigma_\varepsilon^2}\Bigr\},$$
where the expression for $\varepsilon(n-1)$ in terms of the observations becomes progressively more complicated. We could continue in this way and find the joint pdf. Instead, we use the independence of the $\varepsilon(n)$ to find the joint pdf of the observations as a function of the joint pdf of the $\varepsilon(n)$:
$$\begin{bmatrix}X(1)\\X(2)\\\vdots\\X(N)\end{bmatrix}=\begin{bmatrix}1&0&\cdots&0\\b_1&1&\cdots&0\\\vdots&\ddots&\ddots&\\0&\cdots&b_1&1\end{bmatrix}\begin{bmatrix}\varepsilon(1)\\\varepsilon(2)\\\vdots\\\varepsilon(N)\end{bmatrix},\qquad\text{i.e.}\qquad X=B\varepsilon.$$
The jpdf of $\varepsilon=[\varepsilon(1)\,\varepsilon(2)\,\dots\,\varepsilon(N)]^T$ is
$$f(\varepsilon)=\frac{1}{(2\pi)^{N/2}\sqrt{\det\Sigma_\varepsilon}}\exp\Bigl\{-\tfrac{1}{2}\varepsilon^T\Sigma_\varepsilon^{-1}\varepsilon\Bigr\},\qquad \Sigma_\varepsilon=\sigma_\varepsilon^2 I.$$
Since $X=B\varepsilon$ is a linear function of a Gaussian vector, it is also Gaussian with
$$E[X]=BE[\varepsilon]=0,\qquad \Sigma_X=B\Sigma_\varepsilon B^T=\sigma_\varepsilon^2BB^T,$$
so that
$$f(X)=\frac{1}{(2\pi)^{N/2}\sqrt{\det\Sigma_X}}\exp\Bigl\{-\tfrac{1}{2}X^T\Sigma_X^{-1}X\Bigr\}.$$
Maximizing the jpdf $f(X)$ with respect to the unknowns $b_1$ and $\sigma_\varepsilon^2$ gives the desired maximum likelihood estimates. The process can be repeated for higher-order MA models.

We now develop the maximum likelihood method for ARMA(1,1). Assume that the data are modeled as
$$X(n)=a_1X(n-1)+b_1\varepsilon(n-1)+\varepsilon(n).$$
Define $y(n)=X(n)-a_1X(n-1)$ and assume that $X(0)=0$; then
$$\begin{bmatrix}y(1)\\y(2)\\\vdots\\y(N)\end{bmatrix}=\begin{bmatrix}1&0&\cdots&0\\-a_1&1&\cdots&0\\\vdots&\ddots&\ddots&\\0&\cdots&-a_1&1\end{bmatrix}\begin{bmatrix}X(1)\\X(2)\\\vdots\\X(N)\end{bmatrix},\qquad\text{i.e.}\qquad y=AX\ \ \text{or}\ \ X=A^{-1}y.$$
If we are able to get the joint pdf of y(n), we can get the joint pdf of X(n), as we shall see. As before, it is assumed that $\varepsilon(0)=0$ and X(0)=0.

n=1: $y(1)=X(1)-a_1X(0)=\varepsilon(1)+b_1\varepsilon(0)=\varepsilon(1)$
n=2: $y(2)=X(2)-a_1X(1)=\varepsilon(2)+b_1\varepsilon(1)$
and in general
$$\begin{bmatrix}y(1)\\y(2)\\\vdots\\y(N)\end{bmatrix}=\begin{bmatrix}1&0&\cdots&0\\b_1&1&\cdots&0\\\vdots&\ddots&\ddots&\\0&\cdots&b_1&1\end{bmatrix}\begin{bmatrix}\varepsilon(1)\\\varepsilon(2)\\\vdots\\\varepsilon(N)\end{bmatrix},\qquad\text{i.e.}\qquad y=B\varepsilon.$$
Thus $X=A^{-1}y=A^{-1}B\varepsilon$ is a zero-mean Gaussian vector, and
$$f(X)=\frac{1}{(2\pi)^{N/2}\sqrt{\det\Sigma_X}}\exp\Bigl\{-\tfrac{1}{2}X^T\Sigma_X^{-1}X\Bigr\},\qquad \Sigma_X=\sigma_\varepsilon^2\,(A^{-1}B)(A^{-1}B)^T.$$
Maximizing the jpdf $f(X)$ with respect to the unknowns $a_1$, $b_1$ and $\sigma_\varepsilon^2$ gives the desired maximum likelihood estimates. The process can be repeated for higher-order ARMA models.

Matrix inversion: In general, matrix inversion is time consuming and not easy to carry out. In some special cases, as we have here, the inverse of a matrix is straightforward.

Example: Consider the upper triangular matrix
$$A=\begin{bmatrix}1&a_3&0&0\\0&1&a_2&0\\0&0&1&a_1\\0&0&0&1\end{bmatrix};\qquad\text{its inverse is}\qquad A^{-1}=\begin{bmatrix}1&-a_3&a_3a_2&-a_3a_2a_1\\0&1&-a_2&a_2a_1\\0&0&1&-a_1\\0&0&0&1\end{bmatrix},\qquad (A^{-1})^T=\begin{bmatrix}1&0&0&0\\-a_3&1&0&0\\a_3a_2&-a_2&1&0\\-a_3a_2a_1&a_2a_1&-a_1&1\end{bmatrix}.$$
Since
$$A^T=\begin{bmatrix}1&0&0&0\\a_3&1&0&0\\0&a_2&1&0\\0&0&a_1&1\end{bmatrix}$$
is a lower triangular matrix, and since we know that $(A^T)^{-1}=(A^{-1})^T$ and $(A^T)^{-1}A^T=I$, then $(A^T)^{-1}$ is also a lower triangular matrix with unit diagonal,
$$(A^T)^{-1}=\begin{bmatrix}1&0&0&0\\b_{21}&1&0&0\\b_{31}&b_{32}&1&0\\b_{41}&b_{42}&b_{43}&1\end{bmatrix}.$$
By inspection of $(A^T)^{-1}A^T=I$:
second row times first column yields $b_{21}+a_3=0$, so $b_{21}=-a_3$;
third row times second column yields $b_{32}+a_2=0$, so $b_{32}=-a_2$;
fourth row times third column yields $b_{43}+a_1=0$, so $b_{43}=-a_1$;
third row times first column yields $b_{31}+b_{32}a_3=b_{31}-a_2a_3=0$, so $b_{31}=a_2a_3$;
fourth row times second column yields $b_{42}+b_{43}a_2=b_{42}-a_1a_2=0$, so $b_{42}=a_1a_2$;
fourth row times first column yields $b_{41}+b_{42}a_3=b_{41}+a_1a_2a_3=0$, so $b_{41}=-a_1a_2a_3$.
This gives the complete inverse.

ARCH and GARCH Models:
In some situations the random driving term is not independent of the observations and depends on the data itself. This happens when the variance of the error term is not constant. Consider the AR(p) model with ARCH(M) disturbance given by
$$X(n)=a_1X(n-1)+a_2X(n-2)+\dots+a_pX(n-p)+\sqrt{h(n)}\,\varepsilon(n),$$
$$h(n)=c_0+c_1X^2(n-1)+c_2X^2(n-2)+\dots+c_MX^2(n-M).$$

How to Choose a Model; the Akaike information criterion (AIC):
Assuming a stationary signal, one is usually confronted with the question of which model is best for the data. The Akaike information criterion (AIC) was developed for this purpose. Let M be the number of estimated parameters of the model, and let $\Lambda$ be the maximum value of the log-likelihood function (the log of the joint pdf of the observations after maximization). Then the AIC is defined as
$$AIC=2M-2\Lambda.$$
Given a set of data, we try several models and select the model that minimizes the AIC.
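A minimal sketch of this selection rule: fit AR models of several orders by conditional maximum likelihood and compare their AIC values; the simulated data and the candidate orders are assumed for illustration.

```python
import numpy as np

def fit_ar_cml(X, p):
    """Conditional ML fit of an AR(p) model; returns (coeffs, sigma2, loglik)."""
    X = np.asarray(X, dtype=float)
    N = len(X) - p
    # Regression matrix of lagged values: row n is [X(n-1), ..., X(n-p)]
    Phi = np.column_stack([X[p - k:len(X) - k] for k in range(1, p + 1)])
    y = X[p:]
    a = np.linalg.lstsq(Phi, y, rcond=None)[0]
    sigma2 = np.mean((y - Phi @ a) ** 2)
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1.0)
    return a, sigma2, loglik

# Assumed example data: an AR(2) process
rng = np.random.default_rng(2)
X = np.zeros(3000)
for n in range(2, len(X)):
    X[n] = 1.2 * X[n - 1] - 0.5 * X[n - 2] + rng.standard_normal()

for p in (1, 2, 3):
    _, _, L = fit_ar_cml(X, p)
    M = p + 1                      # AR coefficients plus the noise variance
    print(p, 2 * M - 2 * L)        # AIC = 2M - 2*loglik; pick the minimum
```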
Hypothesis Testing:
In its simplest form, we have two possible sources of a signal and we receive only one noisy version. The basic task in a simple decision theory problem is to choose between the two sources based on the observations we receive. If we decide on the null hypothesis $H_0$, this means that the source is the first signal. If we decide on the alternative hypothesis $H_1$, this means that the source is the second signal. For example, we could receive an EKG signal and need to decide whether the patient is normal ($H_0$) or sick ($H_1$); in EEG we may need to identify the presence of epileptic danger ($H_1$) versus a normal patient ($H_0$), etc.

Consider a set of N observations $X=[X(1)\,X(2)\,\dots\,X(N)]^T$ from which we need to decide which hypothesis, $H_0$ or $H_1$, is true. To do that, we find the joint pdf of the observations under both hypotheses, i.e. we need $f(X\mid H_0)$ and $f(X\mid H_1)$. We then form the likelihood ratio $\Lambda(X)$ defined as
$$\Lambda(X)=\frac{f(X\mid H_1)}{f(X\mid H_0)}.$$
Notice that $\Lambda(X)$ is a random but scalar quantity. The likelihood ratio test is to decide $H_1$ if $\Lambda(X)\ge\eta$, where $\eta$ is a threshold value, and to decide $H_0$ otherwise. If both hypotheses are equally likely, $\eta$ is set to 1.

Example: We receive a set of N noisy measurements that are independent and identically distributed Gaussian random variables with known mean m under hypothesis $H_1$ and zero mean under hypothesis $H_0$. This is stated as follows:
$$H_1:\ X(i)=m+n(i),\qquad i=1,2,\dots,N,$$
$$H_0:\ X(i)=n(i),\qquad i=1,2,\dots,N,$$
where n(i) is Gaussian white noise with zero mean and variance $\sigma^2$, i.e.
$$f\bigl(n(i)\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl\{-\frac{n^2(i)}{2\sigma^2}\Bigr\}.$$
Thus
$$f\bigl(X(i)\mid H_0\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl\{-\frac{X^2(i)}{2\sigma^2}\Bigr\},\qquad f\bigl(X(i)\mid H_1\bigr)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl\{-\frac{\bigl(X(i)-m\bigr)^2}{2\sigma^2}\Bigr\}.$$
Since the observations are independent, we get
$$f(X\mid H_0)=\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl\{-\frac{X^2(i)}{2\sigma^2}\Bigr\},\qquad f(X\mid H_1)=\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl\{-\frac{\bigl(X(i)-m\bigr)^2}{2\sigma^2}\Bigr\}.$$
The likelihood ratio is
$$\Lambda(X)=\frac{f(X\mid H_1)}{f(X\mid H_0)}=\frac{\prod_{i=1}^{N}\exp\bigl\{-(X(i)-m)^2/2\sigma^2\bigr\}}{\prod_{i=1}^{N}\exp\bigl\{-X^2(i)/2\sigma^2\bigr\}}.$$
After some manipulations and taking the natural log we get
$$\ln\Lambda(X)=\frac{m}{\sigma^2}\sum_{i=1}^{N}X(i)-\frac{Nm^2}{2\sigma^2}.$$
When a set of new data arrives, we substitute its values into the expression for $\ln\Lambda(X)$. If $\ln\Lambda(X)>\ln\eta$ we decide $H_1$; else we decide $H_0$. End of example.
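A minimal sketch of the test in the example above; the values of m, $\sigma$, N and the equal-prior threshold are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

m, sigma, N = 0.5, 1.0, 100
eta = 1.0                                   # equal priors -> threshold 1

def decide(X, m, sigma, eta=1.0):
    """Return 1 if H1 (mean m) is decided, 0 if H0 (zero mean)."""
    log_lr = (m / sigma**2) * np.sum(X) - len(X) * m**2 / (2 * sigma**2)
    return int(log_lr > np.log(eta))

# Data generated under H1 and under H0
X_h1 = m + sigma * rng.standard_normal(N)
X_h0 = sigma * rng.standard_normal(N)
print(decide(X_h1, m, sigma), decide(X_h0, m, sigma))   # typically 1 and 0
```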
Composite Hypothesis or the Generalized Likelihood Test:
In some situations some parameters are unknown and we still need to make a decision about the source of the signal. Specifically, assume that under $H_0$ the vector of unknown parameters is $\theta_0$ and under $H_1$ the vector of unknown parameters is $\theta_1$. In this case we have a set of training data from which we estimate the unknown parameters under both hypotheses. When new data arrive and we need to decide which hypothesis is true, we simply substitute the values of the new data into the generalized likelihood ratio $\Lambda_g(X)$. In summary,
$$\Lambda_g(X)=\frac{\max_{\theta_1}f(X\mid H_1,\theta_1)}{\max_{\theta_0}f(X\mid H_0,\theta_0)}.$$
The likelihood ratio test is to decide $H_1$ if $\Lambda_g(X)\ge\eta$, where $\eta$ is a threshold value, and to decide $H_0$ otherwise. If both hypotheses are equally likely, $\eta$ is set to 1.

Example: Here the mean under $H_1$ is unknown; under $H_0$ the mean is zero. When the data arrive, we estimate the mean as $\hat m=\frac{1}{N}\sum_{i=1}^{N}X(i)$. The generalized likelihood ratio becomes
$$\Lambda_g(X)=\frac{\max_m f(X\mid H_1,m)}{f(X\mid H_0)}=\frac{\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\sigma}\exp\Bigl\{-\bigl(X(i)-\frac{1}{N}\sum_{k}X(k)\bigr)^2/2\sigma^2\Bigr\}}{\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\sigma}\exp\bigl\{-X^2(i)/2\sigma^2\bigr\}}.$$
The likelihood ratio test is to decide $H_1$ if $\Lambda_g(X)\ge\eta$, where $\eta$ is a threshold value, and to decide $H_0$ otherwise.

Hypothesis Testing for Stochastic Processes:
In some applications, as in brain–computer interfaces (BCI), we receive a signal X(t) and need to decide whether the signal represents a forward command, a backward command, or others. Usually we have a training set of data for each hypothesis. If the signal is stationary, one option is to expand the training signals using the Karhunen-Loeve expansion, which converts the signals into a set of random variables (see the section on the Karhunen-Loeve expansion). When a new set of data arrives and we need to decide to which hypothesis it belongs, we substitute its values into the likelihood ratio formula and decide to which group the new data belong. In this analysis we assume a zero-mean stochastic process. Under hypothesis $H_j$, the stochastic process $X^j(t)$ is expanded in terms of the orthonormal basis $\phi_i^j(t)$ as
$$X^j(t)=\sum_{i=1}^{\infty}\beta_i^j\,\phi_i^j(t),\qquad 0\le t\le T,$$
where
$$\beta_i^j=\int_0^T\phi_i^j(t)X^j(t)\,dt,\qquad \int_0^T\phi_i^j(t)\phi_k^j(t)\,dt=\delta_{ik}.$$
The basis functions $\phi_i^j(t)$ are chosen such that the random coefficients $\beta_i^j$ are uncorrelated, viz.
$$E\bigl[\beta_k^j\beta_i^j\bigr]=\lambda_i^j\,\delta_{ik}.$$
For the process $X^j(t)$, define the covariance function $K^j(t,u)=E[X^j(t)X^j(u)]$. The covariance function, for uncorrelated $\beta_i^j$, satisfies the Fredholm integral equation of the second kind,
$$\int_0^TK^j(t,u)\,\phi_i^j(u)\,du-\lambda_i^j\,\phi_i^j(t)=0,\qquad 0\le t\le T.$$
For a stationary process, $K^j(t,u)=E[X^j(t)X^j(u)]=K^j(t-u)$, and it is much easier to find the orthogonal basis. Assuming that $\beta_i^j$ is a normal random variable with zero mean and variance $\lambda_i^j$, its pdf is given as
$$f\bigl(\beta_i^j\bigr)=\frac{1}{\sqrt{2\pi\lambda_i^j}}\exp\Bigl\{-\frac{(\beta_i^j)^2}{2\lambda_i^j}\Bigr\}.$$
Define $\beta^j=[\beta_1^j,\dots,\beta_N^j]^T$; the joint pdf of the observations under $H_j$ is
$$f\bigl(\beta^j\mid H_j\bigr)=\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\lambda_i^j}}\exp\Bigl\{-\frac{(\beta_i^j)^2}{2\lambda_i^j}\Bigr\}.$$
When a new set of data X(t) arrives and we need to know to which hypothesis it belongs, we simply compute the values $\beta_i^j=\int_0^T\phi_i^j(t)X(t)\,dt$ for the different hypotheses $H_j$ and then evaluate $\max_jf(\beta^j\mid H_j)$. The hypothesis achieving the maximum of the joint pdfs is declared true.

Least Square Estimates
Suppose that we have two random variables X and Y that are related to each other with joint pdf $f_{X,Y}(x,y)=f_{X\mid Y}(x\mid y)f_Y(y)$. When Y=y, the random variable X takes some value x, where y and x are deterministic values. Our estimate of X, $\hat x$, is some (possibly nonlinear) function of y, h(y), i.e. $\hat x=h(y)$, and the error in the estimate is $\tilde x=x-\hat x=x-h(y)$. The mean square error (m.s.e.) is thus
$$\text{m.s.e.}=\int\!\!\int\bigl(x-h(y)\bigr)^2f_{X,Y}(x,y)\,dx\,dy=\int f_Y(y)\,dy\int\bigl(x-h(y)\bigr)^2f_{X\mid Y}(x\mid y)\,dx.$$
By conditioning on Y, the term h(y) becomes deterministic in y while x is still random, and we are able to use ordinary (Riemann) calculus to find the minimum with respect to h(y). Had we conditioned on X instead, we would not be able to minimize with respect to h(y) using Riemann calculus and would need a calculus that deals with random quantities. We need the estimate h(y) that minimizes the m.s.e.; this is done by minimizing $\int\bigl(x-h(y)\bigr)^2f_{X\mid Y}(x\mid y)\,dx$ with respect to h(y):
$$\frac{\partial\,\text{m.s.e.}}{\partial h(y)}=\frac{\partial}{\partial h(y)}\int\bigl(x-h(y)\bigr)^2f_{X\mid Y}(x\mid y)\,dx=0,$$
which yields
$$-2\int xf_{X\mid Y}(x\mid y)\,dx+2h(y)\int f_{X\mid Y}(x\mid y)\,dx=0,$$
and since $\int f_{X\mid Y}(x\mid y)\,dx=1$,
$$h(y)=\int xf_{X\mid Y}(x\mid y)\,dx=E[X\mid Y].$$
If we have two random variables $Y_1$ and $Y_2$ and we need an estimate of X based on observations of both, i.e. $\hat x=h(y_1,y_2)$, then minimizing the m.s.e. as before gives
$$h(y_1,y_2)=\int xf_{X\mid Y_1,Y_2}(x\mid y_1,y_2)\,dx=E[X\mid Y_1,Y_2].$$
For n observed values of the n random variables $Y_1,Y_2,\dots,Y_n$ we have $h(y_1,\dots,y_n)=E[X\mid Y_1,\dots,Y_n]$, and for a stochastic process $Y(\beta)$ sampled at intervals, in the limit
$$h\bigl(y(\beta)\bigr)=E\bigl[X\mid Y(\beta)\bigr].$$

Linear Least Squares Estimates [Kailath; 2000]
In many useful applications, obtaining the conditional expectation is very difficult. Thus one has to resort to other, suboptimal approaches for the estimation. A common approach is the linear least squares estimate, where the estimated value $\hat X$ of a random variable X is linearly related to another random variable Y. Specifically,
$$\hat X=hY+g,$$
where h and g are unknown but deterministic values chosen to minimize the mean square error
$$\text{m.s.e.}=E\bigl[(X-hY-g)^2\bigr]=E[X^2]+h^2E[Y^2]+g^2-2gE[X]+2ghE[Y]-2hE[XY].$$
Minimizing the m.s.e. with respect to g and h yields
$$\hat g=E[X]-\hat hE[Y]=m_X-\hat hm_Y,\qquad \hat hE[Y^2]=E[XY]-\hat gE[Y]=E[XY]-\hat gm_Y.$$
The above two equations have the matrix form
$$\begin{bmatrix}1&m_Y\\m_Y&E[Y^2]\end{bmatrix}\begin{bmatrix}\hat g\\\hat h\end{bmatrix}=\begin{bmatrix}m_X\\E[XY]\end{bmatrix}.$$
Since $\sigma_{XY}=E[(X-m_X)(Y-m_Y)]=E[XY]-m_Xm_Y$, then $E[XY]=\sigma_{XY}+m_Xm_Y$; similarly, since $\sigma_Y^2=E[Y^2]-m_Y^2$, then $E[Y^2]=\sigma_Y^2+m_Y^2$. Substituting, inverting the matrix, and solving, we get
$$\hat h=\frac{\sigma_{XY}}{\sigma_Y^2},\qquad \hat g=m_X-\frac{\sigma_{XY}}{\sigma_Y^2}\,m_Y.$$
Thus
$$\hat X=m_X+\frac{\sigma_{XY}}{\sigma_Y^2}\,(Y-m_Y),$$
and the corresponding m.s.e. is
$$\text{m.s.e.}=\sigma_X^2-\frac{\sigma_{XY}^2}{\sigma_Y^2}.$$
Under general conditions we may interchange expectation and differentiation, i.e. use the derivative operator inside the expectation:
$$\frac{\partial\,\text{m.s.e.}}{\partial g}=E\Bigl[\frac{\partial}{\partial g}(X-hY-g)^2\Bigr]=-2E[X-hY-g]=0,$$
which yields $g=E[X]-hE[Y]=m_X-hm_Y$, and similarly
$$\frac{\partial\,\text{m.s.e.}}{\partial h}=E\Bigl[\frac{\partial}{\partial h}(X-hY-g)^2\Bigr]=-2E\bigl[(X-hY-g)Y\bigr]=0,$$
which yields $hE[Y^2]=E[XY]-gE[Y]$, i.e. the same equations as before.

For a vector of observations $Y=[Y_1,Y_2,\dots,Y_n]^T$ and a vector of coefficients $h=[h_1,h_2,\dots,h_n]^T$, a scalar X is estimated as $\hat X=h^TY+g$. The least m.s.e. estimate is derived as before and is
$$\hat X=m_X+R_{XY}R_{YY}^{-1}\,(Y-m_Y),$$
where $R_{XY}=E\bigl[(X-m_X)(Y-m_Y)^T\bigr]$ and $R_{YY}=E\bigl[(Y-m_Y)(Y-m_Y)^T\bigr]$. The corresponding m.s.e. is
$$\text{m.s.e.}=R_{XX}-R_{XY}R_{YY}^{-1}R_{YX}.$$
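A small numerical sketch of the vector formula above, estimating a scalar X from a few correlated observations; the joint distribution and mixing weights are assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed model: X = w^T Y + noise + offset, with Y correlated Gaussian
n, N = 3, 100_000
w_true = np.array([0.5, -1.0, 2.0])
mix = np.array([[1.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.0]])
Y = rng.standard_normal((N, n)) @ mix
X = Y @ w_true + 0.1 * rng.standard_normal(N) + 5.0

m_X, m_Y = X.mean(), Y.mean(axis=0)
R_YY = np.cov(Y, rowvar=False)                 # approx E[(Y-mY)(Y-mY)^T]
R_XY = (X - m_X) @ (Y - m_Y) / N               # approx E[(X-mX)(Y-mY)^T]

X_hat = m_X + (Y - m_Y) @ np.linalg.solve(R_YY, R_XY)
mse = np.mean((X - X_hat) ** 2)
print(mse, X.var() - R_XY @ np.linalg.solve(R_YY, R_XY))   # R_XX - R_XY R_YY^-1 R_YX
```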
Assume now that we have an observation period $[a,b]$ over which we measure a scalar stochastic process Y(t), and we need the linear least squares estimate $\hat X$ of the random variable X based on this observation. In this analysis we assume zero mean for all random variables involved. As before,
$$\hat X=\int_a^bh(\beta)Y(\beta)\,d\beta.$$
The filter h(t) is obtained, as before, from
$$R_{XY}(t)=\int_a^bh(\beta)R_{YY}(t,\beta)\,d\beta,\qquad a\le t\le b,$$
where $R_{YY}(t,\beta)=E[Y(t)Y(\beta)]$, and zero mean is assumed for Y(t). The m.s.e. is now
$$\text{m.s.e.}=\sigma_X^2-\int_a^b\!\!\int_a^bh(t)h(v)R_{YY}(t,v)\,dt\,dv.$$

Geometric Interpretation of Random Variables:
We shall assume that all the random variables we are dealing with are zero mean. This only facilitates the analysis; the results, with minor changes, remain valid for nonzero-mean random variables. We may think of a random variable X as a vector in some abstract space with inner product defined as
$$\langle X,Y\rangle=E[XY].$$
For stochastic processes X(t) and Y(t), defined on the interval $[a,b]$, the inner product is defined as
$$\langle X,Y\rangle=\int_a^bE[X(\beta)Y(\beta)]\,d\beta.$$
Thus, for $\hat X=\sum_ih_iY_i$ we need to find the $h_i$ such that the error $X-\hat X$ is orthogonal to the observation space made from the observations $Y_i$, i.e.
$$\Bigl(X-\sum_ih_iY_i\Bigr)\perp Y_j\qquad\text{for all }j,$$
where $\perp$ means orthogonal. In terms of the inner product,
$$\Bigl\langle X-\sum_ih_iY_i,\;Y_j\Bigr\rangle=E\Bigl[\Bigl(X-\sum_ih_iY_i\Bigr)Y_j\Bigr]=0\qquad\text{for all }j,$$
i.e.
$$\langle X,Y_j\rangle=\sum_ih_i\langle Y_i,Y_j\rangle.$$
If the observations are a stochastic process Y(t), then the estimate of X is obtained as
$$\hat X=\int_a^bh(\beta)Y(\beta)\,d\beta.$$
Again, the error is orthogonal to the observations: $(X-\hat X)\perp Y(t)$, i.e.
$$\Bigl\langle X-\int_a^bh(\beta)Y(\beta)\,d\beta,\;Y(t)\Bigr\rangle=0,\qquad \langle X,Y(t)\rangle=\int_a^bh(\beta)\,\langle Y(\beta),Y(t)\rangle\,d\beta,$$
which means
$$E[XY(t)]=E\Bigl[\int_a^bh(\beta)Y(\beta)Y(t)\,d\beta\Bigr]=\int_a^bh(\beta)E[Y(\beta)Y(t)]\,d\beta.$$
In terms of correlations, the above expression is
$$R_{XY}(t)=\int_a^bh(\beta)R_{YY}(t,\beta)\,d\beta,$$
which is exactly the result obtained before.

The Multivariate Case:
For the estimation of a vector of random variables $X=[X_1,X_2,\dots,X_n]^T$ based on the observations of m stochastic processes $Y(t)=[Y_1(t),Y_2(t),\dots,Y_m(t)]^T$, we follow the same route as before, but now $H(\beta)$ is a matrix that satisfies the equation
$$R_{XY}(t)=\int_a^bH(\beta)R_{YY}(t,\beta)\,d\beta,$$
and the linear least squares estimate $\hat X$ satisfies
$$\hat X=\int_a^bH(\beta)Y(\beta)\,d\beta.$$

Gram-Schmidt Orthogonalization:
Assume that we have a set of random variables $Y_1,\dots,Y_M$ that may be correlated. We need to find an orthogonal basis of the space spanned by these random variables. We call this basis $\varepsilon_1,\dots,\varepsilon_M$, and its elements are built from the observations $Y_1,\dots,Y_M$. The basic idea is to select $\varepsilon_1=Y_1$. We then take $Y_2$ and decompose it into a component along $\varepsilon_1$ plus an error term $\varepsilon_2$ such that $\varepsilon_2$ is orthogonal to $\varepsilon_1$, i.e. $\langle\varepsilon_2,\varepsilon_1\rangle=E[\varepsilon_2\varepsilon_1]=0$. We then move to $Y_3$ and decompose it into $\varepsilon_3$ plus two terms such that $E[\varepsilon_3\varepsilon_1]=0$ and $E[\varepsilon_3\varepsilon_2]=0$. We repeat this process M times until we obtain the M new basis elements that are orthogonal to each other. Specifically,
$$Y_2=h_{21}\varepsilon_1+\varepsilon_2,\qquad \hat Y_{2\mid1}=h_{21}\varepsilon_1=h_{21}Y_1,\qquad \varepsilon_2=Y_2-h_{21}\varepsilon_1,$$
where $\hat Y_{2\mid1}$ is the estimate of $Y_2$ given the observation $Y_1$, and $h_{21}$ is an unknown to be estimated in what follows. Since the error is orthogonal to the observations, $\varepsilon_2=(Y_2-h_{21}Y_1)\perp Y_1$, i.e. $E[\varepsilon_2\varepsilon_1]=E[(Y_2-h_{21}Y_1)Y_1]=0$. Using linear least squares estimation and $\varepsilon_1=Y_1$ we get
$$h_{21}=\frac{E[Y_2Y_1]}{E[Y_1^2]}=\frac{E[Y_2\varepsilon_1]}{E[\varepsilon_1^2]},\qquad\text{thus}\qquad \varepsilon_2=Y_2-\frac{E[Y_2\varepsilon_1]}{E[\varepsilon_1^2]}\,\varepsilon_1.$$
For $Y_3$ we find an estimate $\hat Y_3$ as a linear combination of $\varepsilon_1$ and $\varepsilon_2$, viz.
$$Y_3=h_{31}\varepsilon_1+h_{32}\varepsilon_2+\varepsilon_3,\qquad \hat Y_3=h_{31}\varepsilon_1+h_{32}\varepsilon_2,\qquad \varepsilon_3=Y_3-h_{31}\varepsilon_1-h_{32}\varepsilon_2.$$
Since the error is orthogonal to the observations, $\varepsilon_3\perp\{\varepsilon_1,\varepsilon_2\}$, i.e. $E[(Y_3-h_{31}\varepsilon_1-h_{32}\varepsilon_2)\varepsilon_1]=0$ and $E[(Y_3-h_{31}\varepsilon_1-h_{32}\varepsilon_2)\varepsilon_2]=0$. Using linear least squares estimation we get
$$h_{31}=\frac{E[Y_3\varepsilon_1]}{E[\varepsilon_1^2]},\qquad h_{32}=\frac{E[Y_3\varepsilon_2]}{E[\varepsilon_2^2]},\qquad\text{thus}\qquad \varepsilon_3=Y_3-\frac{E[Y_3\varepsilon_1]}{E[\varepsilon_1^2]}\,\varepsilon_1-\frac{E[Y_3\varepsilon_2]}{E[\varepsilon_2^2]}\,\varepsilon_2.$$
In general,
$$\varepsilon_n=Y_n-\sum_{i=1}^{n-1}\frac{E[Y_n\varepsilon_i]}{E[\varepsilon_i^2]}\,\varepsilon_i,\qquad 2\le n\le M.$$
Notice that
$$\hat Y_{n\mid n-1}=\sum_{i=1}^{n-1}\frac{E[Y_n\varepsilon_i]}{E[\varepsilon_i^2]}\,\varepsilon_i,\qquad\text{i.e.}\qquad \varepsilon_n=Y_n-\hat Y_{n\mid n-1}.$$
In matrix notation, one can represent the observations in terms of the orthogonal basis as
$$\begin{bmatrix}Y_1\\Y_2\\Y_3\\\vdots\\Y_M\end{bmatrix}=\begin{bmatrix}1&0&\cdots&&0\\h_{21}&1&0&\cdots&0\\h_{31}&h_{32}&1&\cdots&0\\\vdots&&&\ddots&\\h_{M1}&h_{M2}&\cdots&&1\end{bmatrix}\begin{bmatrix}\varepsilon_1\\\varepsilon_2\\\varepsilon_3\\\vdots\\\varepsilon_M\end{bmatrix}.$$
Define the vectors $Y_n=[Y_1,Y_2,\dots,Y_n]^T$ and $\boldsymbol\varepsilon_n=[\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n]^T$; then $Y_n=H_n\boldsymbol\varepsilon_n$ and $\boldsymbol\varepsilon_n=H_n^{-1}Y_n$.
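A numerical sketch of this construction: starting from an assumed covariance matrix of the $Y_i$, the unit-lower-triangular factor H and the uncorrelated $\varepsilon_i$ can be obtained from a Cholesky-type factorization, which is the matrix form of the Gram-Schmidt recursion above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed covariance of the correlated observations Y1..Y4
R = np.array([[4.0, 1.0, 0.5, 0.2],
              [1.0, 3.0, 0.8, 0.1],
              [0.5, 0.8, 2.0, 0.3],
              [0.2, 0.1, 0.3, 1.0]])

# R = H D H^T with H unit lower triangular: H holds the Gram-Schmidt
# coefficients h_ij and D the variances E[eps_i^2]
L = np.linalg.cholesky(R)
d = np.diag(L)
H = L / d                      # unit lower triangular
D = np.diag(d ** 2)

# Draw samples of Y and form eps = H^{-1} Y; the eps_i come out uncorrelated
Y = rng.multivariate_normal(np.zeros(4), R, size=200_000)
eps = np.linalg.solve(H, Y.T).T
print(np.round(np.cov(eps, rowvar=False), 2))   # approximately diag(D)
```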
Thus
$$\hat Y_{n\mid n-1}=\sum_{i=1}^{n-1}\frac{E[Y_n\varepsilon_i]}{E[\varepsilon_i^2]}\,\varepsilon_i=E\bigl[Y_n\boldsymbol\varepsilon_{n-1}^T\bigr]\bigl(E[\boldsymbol\varepsilon_{n-1}\boldsymbol\varepsilon_{n-1}^T]\bigr)^{-1}\boldsymbol\varepsilon_{n-1}.$$
Substituting $\boldsymbol\varepsilon_{n-1}=H_{n-1}^{-1}Y_{n-1}$, we get
$$\hat Y_{n\mid n-1}=E\bigl[Y_nY_{n-1}^T\bigr]\bigl(E[Y_{n-1}Y_{n-1}^T]\bigr)^{-1}Y_{n-1},\qquad \varepsilon_n=Y_n-\hat Y_{n\mid n-1}=Y_n-E\bigl[Y_nY_{n-1}^T\bigr]\bigl(E[Y_{n-1}Y_{n-1}^T]\bigr)^{-1}Y_{n-1}.$$
Thus we are able to obtain the orthogonal basis $\varepsilon_n$ directly in terms of the observations $Y_n=[Y_1,Y_2,\dots,Y_n]^T$.

Discrete Time Recursive Estimation:
We have a random variable X that we need to estimate based on observations $Y_0,Y_1,\dots$ In this approach we recursively estimate X and update the estimate as new data come in. Assume that we have the linear least squares estimate $\hat X_{\mid k-1}=\hat X\mid Y_0,\dots,Y_{k-1}$ based on the observations $Y_0,Y_1,\dots,Y_{k-1}$. We now have a new observation $Y_k$ and need to update our estimate and find $\hat X_{\mid k}$ in terms of $\hat X_{\mid k-1}$. In the update we use only the new information in the new data. This new information is called the innovation and is obtained as follows. Define the innovation
$$\varepsilon_k=Y_k-\hat Y_{k\mid k-1},\qquad \varepsilon_0=Y_0-\hat Y_{0\mid-1}=Y_0,$$
where $\hat Y_{k\mid k-1}$ is the estimate of $Y_k$ given the previous observations $Y_0,\dots,Y_{k-1}$. Clearly $\varepsilon_k$ is uncorrelated with the previous random variables $Y_0,\dots,Y_{k-1}$, and $\varepsilon_k$ is regarded as the new information in the random variable $Y_k$. This suggests that
$$\hat X_{\mid k}=\hat X_{\mid k-1}+\text{(linear least squares estimate of X given only }\varepsilon_k).$$
As before, the linear least squares estimate of X given only $\varepsilon_k$ is $E[X\varepsilon_k^T]\bigl(E[\varepsilon_k\varepsilon_k^T]\bigr)^{-1}\varepsilon_k$. Collecting terms we get
$$\hat X_{\mid k}=\hat X_{\mid k-1}+E[X\varepsilon_k^T]\bigl(E[\varepsilon_k\varepsilon_k^T]\bigr)^{-1}\varepsilon_k.$$
We could also derive the same result starting from the innovation sequence $\varepsilon_0,\varepsilon_1,\dots,\varepsilon_k$, which are uncorrelated with each other and form a basis for the space spanned by $Y_0,Y_1,\dots,Y_k$:
$$\hat X_{\mid k}=\sum_{i=0}^{k}E[X\varepsilon_i^T]\bigl(E[\varepsilon_i\varepsilon_i^T]\bigr)^{-1}\varepsilon_i=\hat X_{\mid k-1}+E[X\varepsilon_k^T]\bigl(E[\varepsilon_k\varepsilon_k^T]\bigr)^{-1}\varepsilon_k,$$
which is the result obtained before.

Instead of estimating just a single random variable X, we need to estimate the values of a stochastic process $X_l$ given the observations $Y_0,Y_1,\dots,Y_k$ of another stochastic process. Using the equation above,
$$\hat X_{l\mid k}=\sum_{i=0}^{k}E[X_l\varepsilon_i^T]\bigl(E[\varepsilon_i\varepsilon_i^T]\bigr)^{-1}\varepsilon_i=\hat X_{l\mid k-1}+E[X_l\varepsilon_k^T]\bigl(E[\varepsilon_k\varepsilon_k^T]\bigr)^{-1}\varepsilon_k.$$
Notice that l could be greater than, less than, or equal to k. We need the quantity $E[X_l\varepsilon_k^T]$, and this must come from the observation equation or other model information. For example, it is not uncommon to have the observation equation
$$Y_k=H_kX_k+v_k,$$
where $v_k$ is additive white Gaussian noise and $H_k$ is a matrix with proper dimensions. In order to find $E[X_l\varepsilon_k^T]$, we use
$$\varepsilon_k=Y_k-\hat Y_{k\mid k-1}=H_kX_k+v_k-H_k\hat X_{k\mid k-1}=H_k\bigl(X_k-\hat X_{k\mid k-1}\bigr)+v_k,$$
and thus
$$E[\varepsilon_k\varepsilon_k^T]=H_kP_{k\mid k-1}H_k^T+R_k,\qquad P_{k\mid k-1}=E\bigl[(X_k-\hat X_{k\mid k-1})(X_k-\hat X_{k\mid k-1})^T\bigr],\qquad R_k=E[v_kv_k^T].$$
Also, since the estimation error $X_k-\hat X_{k\mid k-1}$ is orthogonal to the previous observations (and hence to $\hat X_{k\mid k-1}$) and is independent of $v_k$,
$$E\bigl[(X_k-\hat X_{k\mid k-1})\varepsilon_k^T\bigr]=P_{k\mid k-1}H_k^T,$$
and substituting $X_k=\hat X_{k\mid k-1}+(X_k-\hat X_{k\mid k-1})$ we get
$$E[X_k\varepsilon_k^T]=E\bigl[\hat X_{k\mid k-1}\varepsilon_k^T\bigr]+E\bigl[(X_k-\hat X_{k\mid k-1})\varepsilon_k^T\bigr]=P_{k\mid k-1}H_k^T.$$
Notice that, since $\hat X_{0\mid-1}=0$, we have $P_{0\mid-1}=E[X_0X_0^T]$.

For l = k, the update of the estimate when the new data $Y_k$ arrives is therefore
$$\hat X_{k\mid k}=\hat X_{k\mid k-1}+P_{k\mid k-1}H_k^T\bigl(H_kP_{k\mid k-1}H_k^T+R_k\bigr)^{-1}\bigl(Y_k-H_k\hat X_{k\mid k-1}\bigr),\qquad \hat X_{0\mid-1}=0.$$
We also need an updating equation for the error covariance $P_{k\mid k}=E\bigl[(X_k-\hat X_{k\mid k})(X_k-\hat X_{k\mid k})^T\bigr]$ when new data arrive. Define the gain
$$K_k=P_{k\mid k-1}H_k^T\bigl(H_kP_{k\mid k-1}H_k^T+R_k\bigr)^{-1},$$
so that $\hat X_{k\mid k}=\hat X_{k\mid k-1}+K_k\varepsilon_k$. Substituting into the definition of $P_{k\mid k}$ and using $E[X_k\varepsilon_k^T]=P_{k\mid k-1}H_k^T$ and $E[\varepsilon_k\varepsilon_k^T]=H_kP_{k\mid k-1}H_k^T+R_k$, we get
$$P_{k\mid k}=P_{k\mid k-1}-K_k\bigl(H_kP_{k\mid k-1}H_k^T+R_k\bigr)K_k^T,\qquad P_{0\mid-1}=E[X_0X_0^T].$$
This is the desired result: for every step we have the updated estimate $\hat X_{k\mid k}$ and its updated covariance $P_{k\mid k}$.

To have a fully recursive procedure, we also need a predictive equation $\hat X_{k+1\mid k}$ for $X_{k+1}$ and its covariance $P_{k+1\mid k}$. These can be obtained if we know the system dynamics. Assume that the system dynamics are linear, with the equation
$$X_{k+1}=\Phi_kX_k+w_k,$$
where $w_k$ is zero-mean white Gaussian noise, independent of all other noises, with covariance $Q_k$. $X_k$ is zero mean with covariance $\Pi_k=E[X_kX_k^T]$, which has the recursive relation $\Pi_{k+1}=\Phi_k\Pi_k\Phi_k^T+Q_k$. The linear least squares estimate of $X_{k+1}$ given observations up to time k is
$$\hat X_{k+1\mid k}=\Phi_k\hat X_{k\mid k}+\hat w_{k\mid k}=\Phi_k\hat X_{k\mid k},$$
since $w_k$ is independent of the observations up to time k. For the covariance of the prediction error, using
$$X_{k+1}-\hat X_{k+1\mid k}=\Phi_k\bigl(X_k-\hat X_{k\mid k}\bigr)+w_k$$
and the fact that $w_k$ is independent of $X_k$ and of the older observations (so that the cross terms vanish), we get
$$P_{k+1\mid k}=E\bigl[(X_{k+1}-\hat X_{k+1\mid k})(X_{k+1}-\hat X_{k+1\mid k})^T\bigr]=\Phi_kP_{k\mid k}\Phi_k^T+Q_k,\qquad P_{0\mid-1}=\Pi_0.$$
Notice also that the covariance of $X_{k+1}$ decomposes as $\Pi_{k+1}=\Sigma_{k+1\mid k}+P_{k+1\mid k}$, where $\Sigma_{k+1\mid k}$ is the covariance of $\hat X_{k+1\mid k}$; this is because the error in the estimate is orthogonal to the estimate itself.

So far we have obtained an estimate $\hat X_{k\mid k}$ of the stochastic process $X(k)=X_k$, and its covariance matrix $P_{k\mid k}$, based on a train of observations Y(0), Y(1), ..., Y(k). The above procedure was developed by R. Kalman. We now summarize the estimation steps, i.e. the Kalman filter equations, in vector format:

(1) $X_{k+1}=\Phi_kX_k+w_k$ (linear system dynamics model), $w_k\sim N(0,Q_k)$, white Gaussian noise
(2) $Y_k=H_kX_k+v_k$ (linear observation model), $v_k\sim N(0,R_k)$, white Gaussian noise
(3) Prediction step:
$$\hat X_{k\mid k-1}=\Phi_{k-1}\hat X_{k-1\mid k-1},\qquad P_{k\mid k-1}=\Phi_{k-1}P_{k-1\mid k-1}\Phi_{k-1}^T+Q_{k-1}$$
(4) Updating step:
$$\varepsilon_k=Y_k-\hat Y_{k\mid k-1}=Y_k-H_k\hat X_{k\mid k-1}$$
$$S_k=R_k+H_kP_{k\mid k-1}H_k^T$$
$$K_k=P_{k\mid k-1}H_k^TS_k^{-1}$$
$$\hat X_{k\mid k}=\hat X_{k\mid k-1}+K_k\bigl(Y_k-H_k\hat X_{k\mid k-1}\bigr)$$
$$P_{k\mid k}=(I-K_kH_k)P_{k\mid k-1}(I-K_kH_k)^T+K_kR_kK_k^T$$
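A compact sketch of these prediction/update equations for a scalar random-walk state observed in noise; the model parameters $\Phi$, H, Q, R and the initial covariance are assumed example values.

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed scalar model: X_{k+1} = X_k + w_k,  Y_k = X_k + v_k
Phi, H, Q, R = 1.0, 1.0, 0.01, 1.0
N = 200

X = np.zeros(N)
for k in range(1, N):
    X[k] = Phi * X[k - 1] + np.sqrt(Q) * rng.standard_normal()
Y = H * X + np.sqrt(R) * rng.standard_normal(N)

x_hat, P = 0.0, 1.0          # X_hat_{0|-1} = 0, P_{0|-1} (assumed)
for k in range(N):
    # Update step
    S = R + H * P * H
    K = P * H / S
    x_hat = x_hat + K * (Y[k] - H * x_hat)
    P = (1 - K * H) * P * (1 - K * H) + K * R * K
    # Prediction step for the next time instant
    x_hat = Phi * x_hat
    P = Phi * P * Phi + Q

# One-step prediction after the last update (equals the filtered value since Phi = 1)
print(x_hat, X[-1])
```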
Karhunen-Loeve Expansion

Karhunen-Loeve Expansion; the Scalar Case [Van Trees; 1968]:
A stochastic process X(t) is expanded in terms of orthonormal basis functions $\phi_i(t)$ as
$$X(t)=\sum_{i=1}^{\infty}\beta_i\,\phi_i(t),\qquad 0\le t\le T,$$
where
$$\beta_i=\int_0^T\phi_i(t)X(t)\,dt,\qquad \int_0^T\phi_i(t)\phi_j(t)\,dt=\delta_{ij}.$$
The basis functions $\phi_i(t)$ are chosen such that the random coefficients $\beta_i$ are uncorrelated, viz. if $E[\beta_i]=m_i$ then
$$E\bigl[(\beta_i-m_i)(\beta_j-m_j)\bigr]=\lambda_i\,\delta_{ij}.$$
For the process X(t) with mean m(t), define the covariance function
$$K(t,u)=E\bigl[(X(t)-m(t))(X(u)-m(u))\bigr].$$
The covariance function, for uncorrelated $\beta_i$, satisfies the Fredholm integral equation of the second kind,
$$\int_0^TK(t,u)\,\phi_i(u)\,du-\lambda_i\,\phi_i(t)=0,\qquad 0\le t\le T,$$
and can be expanded in terms of the orthonormal basis as
$$K(t,u)=\sum_{i=1}^{\infty}\lambda_i\,\phi_i(t)\phi_i(u),\qquad 0\le t,u\le T.$$
It is this equation that we use to find the orthonormal basis.

Proof of the Fredholm integral equation of the second kind: Assuming a zero-mean stochastic process and thus zero-mean random variables $\beta_i$, we use the relation $E[\beta_i\beta_j]=\lambda_i\delta_{ij}$. Now
$$E[\beta_i\beta_j]=E\Bigl[\int_0^T\phi_i(t)X(t)\,dt\int_0^T\phi_j(u)X(u)\,du\Bigr].$$
Exchanging expectation and integration we get
$$E[\beta_i\beta_j]=\int_0^T\phi_i(t)\,dt\int_0^TE[X(t)X(u)]\,\phi_j(u)\,du=\int_0^T\phi_i(t)\,dt\int_0^TK(t,u)\,\phi_j(u)\,du.$$
A necessary and sufficient condition is that on the right-hand side
$$\int_0^TK(t,u)\,\phi_j(u)\,du=\lambda_j\,\phi_j(t),$$
for in that case we get
$$\int_0^T\phi_i(t)\,dt\int_0^TK(t,u)\phi_j(u)\,du=\lambda_j\int_0^T\phi_i(t)\phi_j(t)\,dt=\lambda_j\,\delta_{ij},$$
which is the left-hand side of the relation.

Example: Wiener process. The Wiener process is a zero-mean process with covariance
$$K(t,u)=\sigma^2\min(t,u)=\begin{cases}\sigma^2u,&u\le t,\\ \sigma^2t,&t\le u.\end{cases}$$
Thus
$$\lambda_i\phi_i(t)=\int_0^TK(t,u)\phi_i(u)\,du=\sigma^2\int_0^tu\,\phi_i(u)\,du+\sigma^2t\int_t^T\phi_i(u)\,du.$$
Taking the derivative with respect to t of both sides we get
$$\lambda_i\frac{d\phi_i(t)}{dt}=\sigma^2\int_t^T\phi_i(u)\,du,$$
and taking the derivative one more time,
$$\lambda_i\frac{d^2\phi_i(t)}{dt^2}=-\sigma^2\phi_i(t).$$
This yields the solution
$$\lambda_n=\frac{\sigma^2T^2}{\bigl(n-\tfrac12\bigr)^2\pi^2},\qquad \phi_n(t)=\sqrt{\frac{2}{T}}\,\sin\Bigl(\bigl(n-\tfrac12\bigr)\frac{\pi t}{T}\Bigr).$$

Example: Stationary process. Assume that the stochastic process X(t) is zero mean and stationary with correlation
$$R(\tau)=E[X(t)X(t-\tau)]=Pe^{-\alpha|\tau|}$$
and spectrum
$$S(w)=\frac{|N(w)|^2}{|D(w)|^2}=\frac{2\alpha P}{w^2+\alpha^2}.$$
Thus $K(t,u)=R(t-u)$, and the Fredholm integral equation becomes
$$\lambda_i\phi_i(t)=\int_{-T}^{T}R(t-u)\phi_i(u)\,du=P\int_{-T}^{t}e^{-\alpha(t-u)}\phi_i(u)\,du+P\int_{t}^{T}e^{-\alpha(u-t)}\phi_i(u)\,du.$$
Differentiating with respect to t we get
$$\lambda_i\frac{d\phi_i(t)}{dt}=-\alpha P\int_{-T}^{t}e^{-\alpha(t-u)}\phi_i(u)\,du+\alpha P\int_{t}^{T}e^{-\alpha(u-t)}\phi_i(u)\,du,$$
and differentiating again with respect to t,
$$\lambda_i\frac{d^2\phi_i(t)}{dt^2}=\alpha^2\lambda_i\phi_i(t)-2\alpha P\phi_i(t),$$
which has the solution
$$\phi_i(t)=c_1e^{jb_it}+c_2e^{-jb_it},\qquad b_i^2=\frac{2\alpha P}{\lambda_i}-\alpha^2,\qquad\text{i.e.}\qquad \lambda_i=\frac{2\alpha P}{b_i^2+\alpha^2}.$$
After some manipulations we end up with the expressions
$$\phi_i(t)=\begin{cases}\dfrac{\cos b_it}{\sqrt{T\Bigl(1+\dfrac{\sin 2b_iT}{2b_iT}\Bigr)}},&i\ \text{odd},\\[3ex] \dfrac{\sin b_it}{\sqrt{T\Bigl(1-\dfrac{\sin 2b_iT}{2b_iT}\Bigr)}},&i\ \text{even}.\end{cases}$$

Karhunen-Loeve expansion for a stationary process:
For a spectrum of the form $S(w)=\dfrac{|N(w)|^2}{|D(w)|^2}$, where the numerator order is q and the denominator order is p with q<p, and assuming that the data are available over a long observation time $t\in(-\infty,\infty)$, the Fredholm integral equation has the form
$$\lambda_i\phi_i(t)=\int R(t-u)\,\phi_i(u)\,du=R(t)*\phi_i(t),\qquad -\infty<t<\infty,$$
where “*” is the convolution operator. Using the Fourier transform, one obtains
$$\lambda_i\Phi_i(w)=S(w)\Phi_i(w)=\frac{|N(w)|^2}{|D(w)|^2}\,\Phi_i(w),\qquad\text{or}\qquad 0=\bigl[\lambda_i|D(w)|^2-|N(w)|^2\bigr]\Phi_i(w).$$
For each $\lambda_i$ there are p homogeneous solutions corresponding to the roots of $\lambda_i|D(w)|^2-|N(w)|^2=0$. We denote these solutions by $\phi_{hl}(t,\lambda_i)$, $l=1,\dots,p$. Thus
$$\phi_i(t)=\sum_{l=1}^{p}c_l\,\phi_{hl}(t,\lambda_i).$$
Substituting into the Fredholm equation to find the $c_l$ and $\lambda_i$, we get
$$\lambda_i\sum_{l=1}^{p}c_l\,\phi_{hl}(t,\lambda_i)=\int R(t-u)\sum_{l=1}^{p}c_l\,\phi_{hl}(u,\lambda_i)\,du,\qquad\text{or}\qquad \lambda_i\,\phi_{hl}(t,\lambda_i)=\int R(t-u)\,\phi_{hl}(u,\lambda_i)\,du,\quad l=1,\dots,p.$$

Example: In the above we used $S(w)=\dfrac{2\alpha P}{w^2+\alpha^2}$, so we need to solve $0=\lambda_i(w^2+\alpha^2)-2\alpha P$. The roots are located where $\lambda_i(w^2+\alpha^2)-2\alpha P=0$, i.e.
$$w^2=\frac{2\alpha P-\lambda_i\alpha^2}{\lambda_i}=\frac{2\alpha P}{\lambda_i}-\alpha^2.$$
Thus the two homogeneous solutions $\phi_{hl}(t,\lambda_i)$, $l=1,2$, are given by $\phi_{h1}(t,\lambda_i)=e^{jwt}$ and $\phi_{h2}(t,\lambda_i)=e^{-jwt}$, and we get $\phi_i(t)=c_1e^{jwt}+c_2e^{-jwt}$.

Karhunen-Loeve Expansion; the Vector Case:
A stochastic vector process $X(t)=[X_1(t),\dots,X_N(t)]^T$ is expanded in terms of orthonormal basis vectors $\phi_i(t)=[\phi_{i1}(t),\dots,\phi_{iN}(t)]^T$ as
$$X(t)=\sum_{i=1}^{\infty}\beta_i\,\phi_i(t),\qquad 0\le t\le T,$$
where
$$\beta_i=\int_0^T\phi_i^T(t)X(t)\,dt,\qquad \int_0^T\phi_i^T(t)\phi_j(t)\,dt=\delta_{ij}.$$
The basis vectors $\phi_i(t)$ are chosen such that the random coefficients $\beta_i$ are uncorrelated, viz. if $E[\beta_i]=m_i$ then $E[(\beta_i-m_i)(\beta_j-m_j)]=\lambda_i\delta_{ij}$. For the process X(t) with mean m(t), define the covariance matrix
$$K(t,u)=E\bigl[(X(t)-m(t))(X(u)-m(u))^T\bigr].$$
The covariance matrix, for uncorrelated $\beta_i$, satisfies the Fredholm integral equation
$$\int_0^TK(t,u)\,\phi_i(u)\,du-\lambda_i\,\phi_i(t)=0,$$
and can be expanded in terms of the orthonormal basis as
$$K(t,u)=\sum_{i=1}^{\infty}\lambda_i\,\phi_i(t)\phi_i^T(u),\qquad 0\le t,u\le T.$$
It is this equation that we use to find the orthonormal basis.
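A numerical sketch of the scalar expansion: discretizing the Fredholm equation for the Wiener-process covariance min(t, u) turns it into a matrix eigenvalue problem whose eigenvalues and eigenvectors approximate the analytic results above. The grid size and parameters are assumed.

```python
import numpy as np

# Discretize the interval [0, T]
T, n = 1.0, 500
dt = T / n
t = (np.arange(n) + 0.5) * dt

# Covariance of the Wiener process: K(t, u) = sigma^2 * min(t, u)
sigma2 = 1.0
K = sigma2 * np.minimum.outer(t, t)

# Discretized Fredholm equation: (K * dt) phi = lambda * phi
lam, phi = np.linalg.eigh(K * dt)
lam, phi = lam[::-1], phi[:, ::-1]          # sort eigenvalues descending

# Compare the largest eigenvalues with sigma^2 T^2 / ((i + 1/2)^2 pi^2)
for i in range(3):
    analytic = sigma2 * T**2 / ((i + 0.5) ** 2 * np.pi ** 2)
    print(lam[i], analytic)
# phi[:, i] / sqrt(dt) approximates sqrt(2/T) * sin((i + 1/2) * pi * t / T)
```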
The Wiener Filter
We shall develop the Wiener filter for stationary signals. Nonstationary signals can, in many cases, be reduced to stationary signals by focusing on a small window of the signal; within the window, the statistical properties do not change much, i.e. we have a stationary signal.

Assume that we receive/observe a signal y(n) that is correlated with another signal X(n). We use y(n) to find an estimate $\hat X(n)$ of X(n) according to
$$\hat X(n)=\sum_{i=0}^{I}h(i)\,y(n-i).$$
Define $e(n)=X(n)-\hat X(n)$ and $\zeta(n)=E[e^2(n)]=E\bigl[(X(n)-\hat X(n))^2\bigr]$, where h(0), h(1), ..., h(I) are the unknown filter parameters. In order to find the filter parameters, we minimize the expected value of the squared error with respect to the unknowns:
$$\frac{\partial\zeta(n)}{\partial h(l)}=-2E\bigl[(X(n)-\hat X(n))\,y(n-l)\bigr]=-2E\Bigl[\Bigl(X(n)-\sum_{i=0}^{I}h(i)y(n-i)\Bigr)y(n-l)\Bigr]=0.$$
This yields
$$E[X(n)y(n-l)]=\sum_{i=0}^{I}h(i)\,E[y(n-i)y(n-l)],\qquad l=0,1,\dots,I,$$
i.e.
$$R_{Xy}(l)=\sum_{i=0}^{I}h(i)\,R_{yy}(l-i),\qquad l=0,1,\dots,I.$$
These are a set of (I+1) equations in the filter coefficients h(0), h(1), ..., h(I). We usually have the observations y(n), which give us an estimate of $R_{yy}(i)$. We need a relation between y(n) and X(n) in order to get an estimate of $R_{Xy}(l)$. In some situations, as in noise cancelling, we also have the signal X(n), and we can form the numerical estimate
$$\hat R_{Xy}(l)=\frac{1}{N-l+1}\sum_{n=l}^{N}X(n)\,y(n-l).$$

Assume instead that y(n) is a linear observation of the process X(n), viz.
$$y(n)=cX(n)+v(n),$$
where v(n) is additive zero-mean Gaussian noise of variance $\sigma_v^2$, independent of X(n). In this case, in order to get $R_{Xy}(l)$, we multiply by $y(n-l)$ and take expectations as follows.

l=0:
$$R_{yy}(0)=E[y(n)y(n)]=cE[y(n)X(n)]+E[y(n)v(n)]=cR_{Xy}(0)+E\bigl[(cX(n)+v(n))v(n)\bigr]=cR_{Xy}(0)+\sigma_v^2.$$
Thus
$$R_{Xy}(0)=\frac{1}{c}\bigl(R_{yy}(0)-\sigma_v^2\bigr),$$
where we used $E[X(n)v(n)]=E[X(n)]E[v(n)]=0$ because X(n) and v(n) are independent and $E[v(n)]=0$.

l=1:
$$R_{yy}(1)=E[y(n)y(n-1)]=cE[X(n)y(n-1)]+E[v(n)y(n-1)]=cR_{Xy}(1),$$
i.e. $R_{Xy}(1)=\frac{1}{c}R_{yy}(1)$, and in general $R_{Xy}(l)=\frac{1}{c}R_{yy}(l)$ for $l\ge1$.

The observations are a convolution of the state: Assume now that y(n) is a linear observation of the process X(n) of the form
$$y(n)=c_0X(n)+c_1X(n-1)+v(n),$$
where v(n) is additive zero-mean Gaussian noise of variance $\sigma_v^2$, independent of X(n). In this case, in order to get $R_{Xy}(l)$, we again multiply by $y(n-l)$ and take expectations.

l=0:
$$R_{yy}(0)=c_0R_{Xy}(0)+c_1R_{Xy}(1)+\sigma_v^2.$$
Similarly, l=1:
$$R_{yy}(1)=c_1R_{Xy}(0)+c_0R_{Xy}(1).$$
In matrix format we have
$$\begin{bmatrix}R_{yy}(0)-\sigma_v^2\\R_{yy}(1)\end{bmatrix}=\begin{bmatrix}c_0&c_1\\c_1&c_0\end{bmatrix}\begin{bmatrix}R_{Xy}(0)\\R_{Xy}(1)\end{bmatrix},\qquad\text{thus}\qquad \begin{bmatrix}R_{Xy}(0)\\R_{Xy}(1)\end{bmatrix}=\begin{bmatrix}c_0&c_1\\c_1&c_0\end{bmatrix}^{-1}\begin{bmatrix}R_{yy}(0)-\sigma_v^2\\R_{yy}(1)\end{bmatrix}.$$
For $y(n)=\sum_{k=0}^{2}c_kX(n-k)+v(n)$ the same procedure gives
$$R_{yy}(0)=c_0R_{Xy}(0)+c_1R_{Xy}(1)+c_2R_{Xy}(2)+\sigma_v^2,$$
$$R_{yy}(1)=c_1R_{Xy}(0)+(c_0+c_2)R_{Xy}(1),$$
$$R_{yy}(2)=c_2R_{Xy}(0)+c_1R_{Xy}(1)+c_0R_{Xy}(2),$$
or in matrix format
$$\begin{bmatrix}R_{yy}(0)-\sigma_v^2\\R_{yy}(1)\\R_{yy}(2)\end{bmatrix}=\begin{bmatrix}c_0&c_1&c_2\\c_1&c_0+c_2&0\\c_2&c_1&c_0\end{bmatrix}\begin{bmatrix}R_{Xy}(0)\\R_{Xy}(1)\\R_{Xy}(2)\end{bmatrix},\qquad \begin{bmatrix}R_{Xy}(0)\\R_{Xy}(1)\\R_{Xy}(2)\end{bmatrix}=\begin{bmatrix}c_0&c_1&c_2\\c_1&c_0+c_2&0\\c_2&c_1&c_0\end{bmatrix}^{-1}\begin{bmatrix}R_{yy}(0)-\sigma_v^2\\R_{yy}(1)\\R_{yy}(2)\end{bmatrix}.$$

Example: Assume that we have a first-order filter, i.e. I=1, with h(0) and h(1) unknown. We also have the observations y(n) related to the signal X(n) by $y(n)=cX(n)+v(n)$. Thus
$$R_{Xy}(0)=\frac{1}{c}\bigl(R_{yy}(0)-\sigma_v^2\bigr),\qquad R_{Xy}(1)=\frac{1}{c}R_{yy}(1),\qquad \hat X(n)=h(0)y(n)+h(1)y(n-1).$$
Setting $\partial\zeta(n)/\partial h(0)=0$ yields
$$E[X(n)y(n)]=h(0)E[y(n)y(n)]+h(1)E[y(n-1)y(n)],$$
i.e.
$$R_{Xy}(0)=h(0)R_{yy}(0)+h(1)R_{yy}(1)\qquad\text{or}\qquad \frac{1}{c}\bigl(R_{yy}(0)-\sigma_v^2\bigr)=h(0)R_{yy}(0)+h(1)R_{yy}(1).$$
Similarly, setting $\partial\zeta(n)/\partial h(1)=0$ yields
$$R_{Xy}(1)=h(0)R_{yy}(1)+h(1)R_{yy}(0)\qquad\text{or}\qquad \frac{1}{c}R_{yy}(1)=h(0)R_{yy}(1)+h(1)R_{yy}(0).$$
In matrix format we have
$$\frac{1}{c}\begin{bmatrix}R_{yy}(0)-\sigma_v^2\\R_{yy}(1)\end{bmatrix}=\begin{bmatrix}R_{yy}(0)&R_{yy}(1)\\R_{yy}(1)&R_{yy}(0)\end{bmatrix}\begin{bmatrix}h(0)\\h(1)\end{bmatrix},$$
and inverting the matrix we get
$$\begin{bmatrix}h(0)\\h(1)\end{bmatrix}=\frac{1}{c}\begin{bmatrix}R_{yy}(0)&R_{yy}(1)\\R_{yy}(1)&R_{yy}(0)\end{bmatrix}^{-1}\begin{bmatrix}R_{yy}(0)-\sigma_v^2\\R_{yy}(1)\end{bmatrix}.$$
This is the desired result.
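A numerical sketch of this first-order example: estimate $R_{yy}$ from data, recover $R_{Xy}$ through the observation model y = cX + v, and solve the 2×2 system; c, the noise variance and the AR(1) signal model are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed signal model: AR(1) state observed as y = c*X + v
N, a, c, sigma_v = 20000, 0.95, 2.0, 1.0
X = np.zeros(N)
for n in range(1, N):
    X[n] = a * X[n - 1] + rng.standard_normal()
y = c * X + sigma_v * rng.standard_normal(N)

# Sample autocorrelations of the observations
R_yy0 = np.mean(y * y)
R_yy1 = np.mean(y[1:] * y[:-1])

# Cross-correlations recovered from the observation model
R_Xy0 = (R_yy0 - sigma_v**2) / c
R_Xy1 = R_yy1 / c

# Solve the 2x2 normal equations for h(0), h(1)
A = np.array([[R_yy0, R_yy1],
              [R_yy1, R_yy0]])
h = np.linalg.solve(A, np.array([R_Xy0, R_Xy1]))

X_hat = h[0] * y + h[1] * np.concatenate(([0.0], y[:-1]))
print(h, np.mean((X - X_hat) ** 2), np.mean((X - y / c) ** 2))  # filter vs. raw rescaling
```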
Adaptive Filters

Adaptive Frequency Estimation:
Assume that the stochastic process X(t) is the output of a linear filter H(z) driven by white Gaussian noise. Assume further that the filter is an AR process with real coefficients $a_k$, i.e. it has only poles. In this case the filter is given by
$$H(e^{jw})=\frac{1}{1+a_1e^{-jw}+\dots+a_Me^{-jMw}}=\frac{1}{1+\sum_{k=1}^{M}a_ke^{-jkw}}.$$
For a zero-mean Gaussian process v(n) with variance $\sigma_v^2$, the output spectrum $S_{AR}(w)$ becomes
$$S_{AR}(w)=\sigma_v^2\bigl|H(e^{jw})\bigr|^2=\frac{\sigma_v^2}{\Bigl|1+\sum_{k=1}^{M}a_ke^{-jkw}\Bigr|^2},\qquad X(n)=-\sum_{k=1}^{M}a_kX(n-k)+v(n).$$
If we assume adaptive parameters $\hat a_k(n)$, the spectrum also becomes adaptive, $S_{AR}(w,n)$, and is given by
$$S_{AR}(w,n)=\frac{\sigma_v^2}{\Bigl|1+\sum_{k=1}^{M}\hat a_k(n)e^{-jkw}\Bigr|^2}.$$
The parameters are updated using, e.g., the LMS algorithm as follows:
$$\hat a_k(n+1)=\hat a_k(n)-\mu\,e(n)\,X(n-k),\qquad e(n)=X(n)-\hat X(n)=X(n)+\sum_{k=1}^{M}\hat a_k(n)X(n-k).$$

Example: Single sinusoid with random frequency. Assume that the received signal is made of a single sinusoid with unknown frequency that behaves according to an OU process. The sinusoid X(n) is modeled as an AR(2) process; the initial conditions X(-2), X(-1), X(0) determine the phase, and the values of the coefficients determine the frequency:
$$X(n)+a_1X(n-1)+a_2X(n-2)=v(n),$$
which has the transfer function
$$\frac{X(z)}{v(z)}=\frac{1}{1+a_1z^{-1}+a_2z^{-2}}=\frac{z^2}{z^2+a_1z+a_2}=\frac{z^2}{\bigl(z-(\alpha+j\beta)\bigr)\bigl(z-(\alpha-j\beta)\bigr)},$$
i.e.
$$\alpha=-\frac{a_1}{2},\qquad \beta=\sqrt{a_2-\Bigl(\frac{a_1}{2}\Bigr)^2},\qquad f=\frac{\arctan(\beta/\alpha)}{2\pi\Delta},$$
where $\Delta$ is the sampling interval. Notice that the arctan operation yields a phase between $-\pi$ and $\pi$. The frequency f(t) is modeled as an Ornstein-Uhlenbeck (OU) process,
$$df(t)=\lambda_f\bigl(\mu_f-f(t)\bigr)dt+\sigma_f\,dW(t),$$
where W(t) is the Wiener process. We use the LMS algorithm to find an estimate of the frequency. We first estimate the changing coefficients according to
$$\hat a_k(n+1)=\hat a_k(n)-\mu\,e(n)\,X(n-k),\qquad e(n)=X(n)-\hat X(n)=X(n)+\sum_{k=1}^{2}\hat a_k(n)X(n-k).$$
The frequency is estimated as
$$\hat f(n)=\frac{\arctan\bigl(\hat\beta(n)/\hat\alpha(n)\bigr)}{2\pi\Delta},\qquad \hat\alpha(n)=-\frac{\hat a_1(n)}{2},\qquad \hat\beta(n)=\sqrt{\hat a_2(n)-\Bigl(\frac{\hat a_1(n)}{2}\Bigr)^2}.$$
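A minimal sketch of this LMS frequency tracker applied to a noisy sinusoid with a fixed (rather than OU-varying) frequency; the step size, sampling interval and initial coefficient guesses are assumed.

```python
import numpy as np

rng = np.random.default_rng(8)

# Assumed test signal: sinusoid at f0 Hz, sampling interval Delta, small noise
f0, Delta, N = 50.0, 1e-3, 5000
n = np.arange(N)
X = np.cos(2 * np.pi * f0 * n * Delta) + 0.05 * rng.standard_normal(N)

mu = 0.01
a = np.array([-1.0, 0.9])        # initial guesses for a1, a2 (assumed)
f_hat = np.zeros(N)
for k in range(2, N):
    e = X[k] + a[0] * X[k - 1] + a[1] * X[k - 2]   # e(n) = X(n) + sum a_k X(n-k)
    a[0] -= mu * e * X[k - 1]                      # LMS coefficient updates
    a[1] -= mu * e * X[k - 2]
    alpha = -a[0] / 2.0
    beta = np.sqrt(max(a[1] - (a[0] / 2.0) ** 2, 1e-12))
    f_hat[k] = np.arctan2(beta, alpha) / (2 * np.pi * Delta)

print(f_hat[-1])      # should settle near 50 Hz
```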
Adaptive Noise Cancelling to Remove Sinusoidal Interference:
Assume that the signal X(n) is modeled as
$$X(n)=S(n)+A_0\cos(w_0n+\phi_0),$$
where $A_0\cos(w_0n+\phi_0)$ is the sinusoidal interference and S(n) is the desired unknown signal. We also have another reference signal, y(n), that has the same frequency as the interference but differs in amplitude and phase, viz.
$$y(n)=A\cos(w_0n+\phi),\qquad A\ne A_0,\ \phi\ne\phi_0.$$
The estimated signal $\hat X(n)$ and the error e(n) are modeled as
$$\hat X(n)=\sum_{i=0}^{M-1}\hat a_i(n)\,y(n-i)=\sum_{i=0}^{M-1}\hat a_i(n)\,A\cos\bigl(w_0(n-i)+\phi\bigr),$$
$$e(n)=X(n)-\hat X(n)=S(n)+A_0\cos(w_0n+\phi_0)-\sum_{i=0}^{M-1}\hat a_i(n)\,A\cos\bigl(w_0(n-i)+\phi\bigr).$$
The Wiener filter weights $\hat a_i(n)$ are updated through the LMS algorithm as
$$\hat a_i(n+1)=\hat a_i(n)+\mu\,e(n)\,y(n-i).$$
Notice that the signal X(n) is made of two parts: (1) e(n), which is not correlated with y(n), and (2) $\hat X(n)$, which is correlated with y(n). The adaptive filter output must converge to the correlated part (the interference), and thus the remainder is the desired signal.

Adaptive Line Enhancement (ALE):
In some situations there is only one signal X(n) available. This signal is made of two parts: (1) a slowly varying signal $X_1(n)$ with slowly decaying correlation, and (2) a rapidly varying signal $X_2(n)$, independent of $X_1(n)$, that could be nonstationary or have only short-duration correlation. Define another signal $y(n)=X(n-\Delta)$, where $\Delta$ is a delay. We choose the delay $\Delta$ such that y(n) is highly correlated with the slowly varying part of X(n) but only weakly correlated with the rapidly varying part of X(n). Thus X(n) has two components: (1) $\hat X(n)\approx X_1(n)$, which is highly correlated with y(n), and (2) $e(n)\approx X_2(n)$, which is orthogonal to, or independent of, y(n).

Let $X(n)=X_1(n)+X_2(n)$; then
$$E[X(n)X(n-\Delta)]=E[X_1(n)X_1(n-\Delta)]+E[X_1(n)X_2(n-\Delta)]+E[X_2(n)X_1(n-\Delta)]+E[X_2(n)X_2(n-\Delta)]\approx E[X_1(n)X_1(n-\Delta)]=R_{11}(\Delta).$$
Similarly, with $y(n)=X(n-\Delta)=X_1(n-\Delta)+X_2(n-\Delta)$,
$$E[y(n)X(n)]\approx E[X_1(n)X_1(n-\Delta)]=R_{11}(\Delta).$$
Define
$$\hat X(n)=h(0)y(n)+h(1)y(n-1)=h(0)X(n-\Delta)+h(1)X(n-\Delta-1),$$
$$e(n)=X(n)-\hat X(n)=X(n)-h(0)X(n-\Delta)-h(1)X(n-\Delta-1),$$
i.e. $X(n)=e(n)+\hat X(n)$. The signal X(n) has two parts: (1) e(n), which is uncorrelated with y(n), and (2) $\hat X(n)$, which is highly correlated with y(n). With
$$\zeta(n)=E[e^2(n)]=E\bigl[\bigl(X(n)-h(0)X(n-\Delta)-h(1)X(n-\Delta-1)\bigr)^2\bigr],$$
we find the unknown coefficients h(0) and h(1) by minimizing $\zeta(n)$ and equating the derivatives to zero:
$$\frac{\partial\zeta(n)}{\partial h(0)}=-2E[e(n)X(n-\Delta)]=0\ \Rightarrow\ E[X(n)X(n-\Delta)]=h(0)E[X^2(n-\Delta)]+h(1)E[X(n-\Delta-1)X(n-\Delta)],$$
$$\frac{\partial\zeta(n)}{\partial h(1)}=-2E[e(n)X(n-\Delta-1)]=0\ \Rightarrow\ E[X(n)X(n-\Delta-1)]=h(0)E[X(n-\Delta)X(n-\Delta-1)]+h(1)E[X^2(n-\Delta-1)].$$
These are two equations in the two unknowns h(0) and h(1). Since
$$X(n)=e(n)+\hat X(n)=e(n)+h(0)y(n)+h(1)y(n-1),$$
the signal X(n) has two parts: e(n), which is uncorrelated with y(n), and $\hat X(n)$, which is highly correlated with y(n). In our case, $\hat X(n)$ is an estimate of $X_1(n)$ and e(n) is an estimate of $X_2(n)$.
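A small sketch of the adaptive line enhancer idea, run with the LMS update rather than the closed-form normal equations, and with a longer filter than the two-tap illustration so that the broadband part averages out better; all signal parameters, the delay and the step size are assumed.

```python
import numpy as np

rng = np.random.default_rng(9)

# Assumed composite signal: narrowband (slow) sinusoid X1 plus broadband X2
N, Delta, M, mu = 40000, 10, 16, 0.002
n = np.arange(N)
X1 = np.sin(2 * np.pi * 0.05 * n)          # long-correlation component
X2 = 0.5 * rng.standard_normal(N)          # short-correlation component
X = X1 + X2

h = np.zeros(M)
X1_hat = np.zeros(N)
for k in range(Delta + M, N):
    y = X[k - Delta - np.arange(M)]        # delayed reference y(n-i) = X(n-Delta-i)
    X1_hat[k] = h @ y                      # part correlated with the reference
    e = X[k] - X1_hat[k]                   # estimate of the broadband part X2
    h += mu * e * y                        # LMS update of the enhancer taps

# X1_hat should follow the slow component (imperfectly, since the delayed
# reference is itself noisy); compare its error with the raw broadband power.
print(np.mean((X1_hat[N // 2:] - X1[N // 2:]) ** 2), np.var(X2))
```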
Example: The signal is a product of stationary and nonstationary components, i.e. $Z(n)=Z_1(n)Z_2(n)$. Taking the log, and assuming nonnegative quantities, we get $\log Z(n)=\log Z_1(n)+\log Z_2(n)$. Define $X(n)=\log Z(n)$, $X_1(n)=\log Z_1(n)$, $X_2(n)=\log Z_2(n)$, and we follow the same steps as before.
Blind Deconvolution
Assume that we have one source of signals, e.g. the aortic pressure $P_A(n)$. We measure the pressure at the femoral artery, $P_F(n)$, and at the iliac artery, $P_I(n)$, through the unknown filters $F_F(n)$ and $F_I(n)$:
$$P_F(n)=F_F(n)*P_A(n),\qquad P_I(n)=F_I(n)*P_A(n),$$
where “*” is the convolution operation. Convolving the first equation with $F_I(n)$ and the second with $F_F(n)$, we get
$$F_I(n)*P_F(n)=F_I(n)*F_F(n)*P_A(n),\qquad F_F(n)*P_I(n)=F_F(n)*F_I(n)*P_A(n).$$
Notice that the right-hand sides of both equations are equal. Thus
$$F_I(n)*P_F(n)=F_F(n)*P_I(n)=F_I(n)*F_F(n)*P_A(n).$$
The two pressures $P_F(n)$ and $P_I(n)$ are highly correlated. If we used a single Wiener filter as a correlation canceller, the output would be an exact replica of the other signal, i.e. we would get an estimate of $P_I(n)$ by passing $P_F(n)$ through the Wiener filter. Instead, if we use two Wiener filters $H_F(n)$ and $H_I(n)$ such that the error between the two outputs is zero, then the output of each filter is $P_A(n)$, i.e. $H_F(n)$ is actually the inverse of $F_F(n)$ and $H_I(n)$ is the inverse of $F_I(n)$. Thus
$$P_A(n)=H_F(n)*P_F(n)=\sum_{i=0}^{I_F}H_F(i)P_F(n-i),\qquad P_A(n)=H_I(n)*P_I(n)=\sum_{j=0}^{J_I}H_I(j)P_I(n-j).$$
The error is
$$e(n)=P_A(n)-P_A(n)=0=\sum_{i=0}^{I_F}H_F(i)P_F(n-i)-\sum_{j=0}^{J_I}H_I(j)P_I(n-j).$$
Define
$$\zeta(n)=E[e^2(n)]=E\Bigl[\Bigl(\sum_{i=0}^{I_F}H_F(i)P_F(n-i)-\sum_{j=0}^{J_I}H_I(j)P_I(n-j)\Bigr)^2\Bigr].$$
To find the coefficients of the two Wiener filters, we minimize $\zeta(n)$ with respect to the unknowns:
$$\frac{\partial\zeta(n)}{\partial H_F(k)}=2E\Bigl[\Bigl(\sum_{i=0}^{I_F}H_F(i)P_F(n-i)-\sum_{j=0}^{J_I}H_I(j)P_I(n-j)\Bigr)P_F(n-k)\Bigr]=0,\qquad k=0,\dots,I_F,$$
$$\frac{\partial\zeta(n)}{\partial H_I(l)}=2E\Bigl[\Bigl(\sum_{i=0}^{I_F}H_F(i)P_F(n-i)-\sum_{j=0}^{J_I}H_I(j)P_I(n-j)\Bigr)P_I(n-l)\Bigr]=0,\qquad l=0,\dots,J_I.$$
These equations are homogeneous: they yield one fewer independent equation than the number of unknown filter coefficients. Thus another equation (a normalization) is needed to find a unique solution.

Example: Assume that we have first-order filters. Thus
$$P_A(n)=H_F(n)*P_F(n)=\sum_{i=0}^{1}H_F(i)P_F(n-i)=H_F(0)P_F(n)+H_F(1)P_F(n-1),$$
$$P_A(n)=H_I(n)*P_I(n)=\sum_{j=0}^{1}H_I(j)P_I(n-j)=H_I(0)P_I(n)+H_I(1)P_I(n-1),$$
and
$$e(n)=\sum_{i=0}^{1}H_F(i)P_F(n-i)-\sum_{j=0}^{1}H_I(j)P_I(n-j),\qquad \zeta(n)=E[e^2(n)].$$
To find the coefficients of the two Wiener filters, we minimize $\zeta(n)$ with respect to the unknowns as follows.

For k=0, $0=E[e(n)P_F(n)]$, which yields
$$H_F(0)R_{FF}(0)+H_F(1)R_{FF}(1)-H_I(0)R_{FI}(0)-H_I(1)R_{FI}(1)=0.$$
For k=1, $0=E[e(n)P_F(n-1)]$, which yields
$$H_F(0)R_{FF}(1)+H_F(1)R_{FF}(0)-H_I(0)R_{FI}(1)-H_I(1)R_{FI}(0)=0.$$
Similarly, for the iliac artery, for l=0, $0=E[e(n)P_I(n)]$, which yields
$$H_F(0)R_{FI}(0)+H_F(1)R_{FI}(1)-H_I(0)R_{II}(0)-H_I(1)R_{II}(1)=0,$$
and for l=1, $0=E[e(n)P_I(n-1)]$, which yields
$$H_F(0)R_{FI}(1)+H_F(1)R_{FI}(0)-H_I(0)R_{II}(1)-H_I(1)R_{II}(0)=0.$$
Collecting terms and putting the above equations into matrix format, we get
$$\begin{bmatrix}R_{FF}(0)&R_{FF}(1)&-R_{FI}(0)&-R_{FI}(1)\\R_{FF}(1)&R_{FF}(0)&-R_{FI}(1)&-R_{FI}(0)\\R_{FI}(0)&R_{FI}(1)&-R_{II}(0)&-R_{II}(1)\\R_{FI}(1)&R_{FI}(0)&-R_{II}(1)&-R_{II}(0)\end{bmatrix}\begin{bmatrix}H_F(0)\\H_F(1)\\H_I(0)\\H_I(1)\end{bmatrix}=0,$$
where $R_{FI}(l)=R_{IF}(l)$. These are three independent equations in four unknowns. One should be able to find another independent equation to obtain a unique solution; otherwise, we use one of the unknowns as a numeraire. Assume that the numeraire is $H_F(0)$. In this case the above system reduces to
$$\begin{bmatrix}R_{FF}(0)&-R_{FI}(1)&-R_{FI}(0)\\R_{FI}(1)&-R_{II}(0)&-R_{II}(1)\\R_{FI}(0)&-R_{II}(1)&-R_{II}(0)\end{bmatrix}\begin{bmatrix}H_F(1)\\H_I(0)\\H_I(1)\end{bmatrix}=-H_F(0)\begin{bmatrix}R_{FF}(1)\\R_{FI}(0)\\R_{FI}(1)\end{bmatrix}.$$
This yields the estimates of the unknown parameters as functions of the chosen value of $H_F(0)$.
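A numerical sketch of this idea with first-order inverse filters, mirroring the example above: the two filters are fitted so that their outputs agree, with $H_F(0)=1$ as the numeraire, by ordinary least squares on the data. The source, the two unknown channels and the filter order are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)

# Assumed source and two unknown channels (short moving-average filters)
N = 20000
P_A = rng.standard_normal(N)                       # surrogate "aortic" source
F_F, F_I = np.array([1.0, 0.6]), np.array([1.0, -0.4])
P_F = np.convolve(P_A, F_F)[:N]
P_I = np.convolve(P_A, F_I)[:N]

# e(n) = sum_i H_F(i) P_F(n-i) - sum_j H_I(j) P_I(n-j), with H_F(0) = 1 (numeraire).
# Minimizing sum_n e(n)^2 over the remaining coefficients is linear least squares.
L = 1                                              # first-order inverse filters
rows, rhs = [], []
for n in range(L, N):
    pf = P_F[n - np.arange(L + 1)]                 # P_F(n), ..., P_F(n-L)
    pi = P_I[n - np.arange(L + 1)]
    rows.append(np.concatenate([pf[1:], -pi]))     # unknowns: H_F(1..L), H_I(0..L)
    rhs.append(-pf[0])                             # move H_F(0)*P_F(n) to the RHS
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
H_F = np.concatenate([[1.0], theta[:L]])
H_I = theta[L:]

P_hat_F = np.convolve(P_F, H_F)[:N]
P_hat_I = np.convolve(P_I, H_I)[:N]
# The two deconvolved outputs agree with each other and follow the common source
print(np.mean((P_hat_F - P_hat_I)[L:] ** 2), np.corrcoef(P_hat_F[L:], P_A[L:])[0, 1])
```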
  • 61. 61 Blind Deconvolution for more than One Source: In many applications there is more than one source that is generating signals e.g. Mother EKG and Fetus EKG. The receivers are usually more than the number of sources. We shall focus on the case of two sources, )(1 nu and )(2 nu , and three receivers, )(1 ny , )(2 ny , and )(3 ny , and explain how to use the Wiener filter to find the desired sources. Later on we shall generalize the analysis. In all the analysis, it is assumed that the sources are independent and stationary signals and the transmission media/modulation are linear time invariant filters. Specifically                            )( )( )()( )()( )()( )( )( )( 2 1 3231 2221 1211 3 2 1 zu zu zFzF zFzF zFzF zy zy zy which could be split into three equations as:                   )( )( )()( )()( )( )( 2 1 2221 1211 2 1 zu zu zFzF zFzF zy zy                   )( )( )()( )()( )( )( 2 1 3231 1211 3 1 zu zu zFzF zFzF zy zy and                   )( )( )()( )()( )( )( 2 1 3231 2221 3 2 zu zu zFzF zFzF zy zy Inverting the square matrices (assuming invertiblity) we get:                              )( )( )()( )()( )()()()( 1 )( )( 2 1 1121 1222 211222112 1 zy zy zFzF zFzF zFzFzFzFzu zu                            )( )( )()( )()( )()()()( 1 )( )( 3 1 1131 1232 311232112 1 zy zy zFzh zFzF zFzFzFzFzu zu and
$$\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix} = \frac{1}{F_{21}(z)F_{32}(z)-F_{22}(z)F_{31}(z)}
\begin{bmatrix} F_{32}(z) & -F_{22}(z)\\ -F_{31}(z) & F_{21}(z)\end{bmatrix}
\begin{bmatrix} y_2(z)\\ y_3(z)\end{bmatrix}$$

For the source signal $u_2(z)$, we have the equation

$$\frac{-F_{21}(z)y_1(z)+F_{11}(z)y_2(z)}{F_{11}(z)F_{22}(z)-F_{12}(z)F_{21}(z)}
= \frac{-F_{31}(z)y_1(z)+F_{11}(z)y_3(z)}{F_{11}(z)F_{32}(z)-F_{12}(z)F_{31}(z)}
= \frac{-F_{31}(z)y_2(z)+F_{21}(z)y_3(z)}{F_{21}(z)F_{32}(z)-F_{22}(z)F_{31}(z)} = u_2(z)$$

The above equation is actually a set of three equations in the unknown filter coefficients. To see this, we take the first two terms of the equation, i.e.

$$\frac{-F_{21}(z)y_1(z)+F_{11}(z)y_2(z)}{F_{11}(z)F_{22}(z)-F_{12}(z)F_{21}(z)}
= \frac{-F_{31}(z)y_1(z)+F_{11}(z)y_3(z)}{F_{11}(z)F_{32}(z)-F_{12}(z)F_{31}(z)} = u_2(z)$$

Rearranging, we get

$$\left[F_{11}(z)F_{32}(z)-F_{12}(z)F_{31}(z)\right]\left[F_{11}(z)y_2(z)-F_{21}(z)y_1(z)\right]
= \left[F_{11}(z)F_{22}(z)-F_{12}(z)F_{21}(z)\right]\left[F_{11}(z)y_3(z)-F_{31}(z)y_1(z)\right]$$

The above equation represents a linear relation between the observations, the delayed observations, and the unknown coefficients of the mixing FIR filters. Solving this equation, using for example regression analysis, estimates of the filters could be found. Once the filter coefficients are found, we use them to find an estimate of the source signal $u_2(z)$. The same approach could be applied for $u_1(z)$.

Instead, we use the Wiener filter approach, where we assume that the signals are stationary. Each observed signal has two Wiener filters, and the parameters of each are estimated by minimizing a squared-error criterion. Specifically, for $y_i(n)$ we have the two Wiener filters $H_{i1}(n)$ and $H_{i2}(n)$. They are related to the data through the equations:
$$\frac{-F_{21}(z)y_1(z)+F_{11}(z)y_2(z)}{F_{11}(z)F_{22}(z)-F_{12}(z)F_{21}(z)}
= \frac{-F_{31}(z)y_1(z)+F_{11}(z)y_3(z)}{F_{11}(z)F_{32}(z)-F_{12}(z)F_{31}(z)}
= \frac{-F_{31}(z)y_2(z)+F_{21}(z)y_3(z)}{F_{21}(z)F_{32}(z)-F_{22}(z)F_{31}(z)} = u_2(z)$$

In terms of the Wiener filters, equating these expressions pairwise gives the three error signals

$$e_1(z) = \left[H_{11}(z)y_1(z)-H_{21}(z)y_2(z)\right]-\left[H_{12}(z)y_1(z)-H_{31}(z)y_3(z)\right]$$

$$e_2(z) = \left[H_{11}(z)y_1(z)-H_{21}(z)y_2(z)\right]-\left[H_{22}(z)y_2(z)-H_{32}(z)y_3(z)\right]$$

and

$$e_3(z) = \left[H_{12}(z)y_1(z)-H_{31}(z)y_3(z)\right]-\left[H_{22}(z)y_2(z)-H_{32}(z)y_3(z)\right]$$

For the first equation we have, in the time domain,

$$e_1(n) = H_{11}(n)*y_1(n)-H_{21}(n)*y_2(n)-H_{12}(n)*y_1(n)+H_{31}(n)*y_3(n)$$

and

$$\xi_1(n) = E\left[e_1^2(n)\right] = E\left[\left(H_{11}(n)*y_1(n)-H_{21}(n)*y_2(n)-H_{12}(n)*y_1(n)+H_{31}(n)*y_3(n)\right)^{2}\right]$$

Minimizing $\xi_1(n)$ with respect to the unknown coefficients of the filters, we get an estimate of these coefficients as functions of the correlations between the observed signals $y_1(n)$, $y_2(n)$, and $y_3(n)$. In a similar manner we obtain the equations for $e_2(n)$ and $e_3(n)$. In vector notation we have

$$\begin{bmatrix} e_1(n)\\ e_2(n)\\ e_3(n)\end{bmatrix} =
\begin{bmatrix}
H_{11}(n)-H_{12}(n) & -H_{21}(n) & H_{31}(n)\\
H_{11}(n) & -H_{21}(n)-H_{22}(n) & H_{32}(n)\\
H_{12}(n) & -H_{22}(n) & H_{32}(n)-H_{31}(n)
\end{bmatrix} *
\begin{bmatrix} y_1(n)\\ y_2(n)\\ y_3(n)\end{bmatrix}$$

or $e(n) = H(n)*y(n)$.

Example: Assume all the Wiener filters are first order. Thus $H_{ij}(z) = H_{ij}(0) + H_{ij}(1)z^{-1}$, and
$$\xi_1(n) = E\left[e_1^2(n)\right] = E\Big[\big(\left[H_{11}(0)-H_{12}(0)\right]y_1(n)+\left[H_{11}(1)-H_{12}(1)\right]y_1(n-1)$$
$$\qquad\qquad -H_{21}(0)y_2(n)-H_{21}(1)y_2(n-1)+H_{31}(0)y_3(n)+H_{31}(1)y_3(n-1)\big)^{2}\Big]$$

Minimizing $\xi_1(n)$ w.r.t. the unknown coefficients we get

$$\frac{\partial \xi_1(n)}{\partial H_{11}(0)} = 0 = E\left[e_1(n)y_1(n)\right], \qquad \frac{\partial \xi_1(n)}{\partial H_{11}(1)} = 0 = E\left[e_1(n)y_1(n-1)\right]$$

$$\frac{\partial \xi_1(n)}{\partial H_{12}(0)} = 0 = E\left[e_1(n)y_1(n)\right], \qquad \frac{\partial \xi_1(n)}{\partial H_{12}(1)} = 0 = E\left[e_1(n)y_1(n-1)\right]$$

Notice that the second pair of equations is identical to the first pair; it therefore adds nothing new.

$$\frac{\partial \xi_1(n)}{\partial H_{21}(0)} = 0 = E\left[e_1(n)y_2(n)\right], \qquad \frac{\partial \xi_1(n)}{\partial H_{21}(1)} = 0 = E\left[e_1(n)y_2(n-1)\right]$$

$$\frac{\partial \xi_1(n)}{\partial H_{31}(0)} = 0 = E\left[e_1(n)y_3(n)\right], \qquad \frac{\partial \xi_1(n)}{\partial H_{31}(1)} = 0 = E\left[e_1(n)y_3(n-1)\right]$$

The above are a total of 6 equations.

In a similar way, minimizing $\xi_2(n)$ w.r.t. the unknown coefficients of

$$e_2(z) = \left[H_{11}(z)y_1(z)-H_{21}(z)y_2(z)\right]-\left[H_{22}(z)y_2(z)-H_{32}(z)y_3(z)\right]$$

we get

$$\frac{\partial \xi_2(n)}{\partial H_{22}(0)} = 0 = E\left[e_2(n)y_2(n)\right], \qquad \frac{\partial \xi_2(n)}{\partial H_{22}(1)} = 0 = E\left[e_2(n)y_2(n-1)\right]$$
$$\frac{\partial \xi_2(n)}{\partial H_{32}(0)} = 0 = E\left[e_2(n)y_3(n)\right], \qquad \frac{\partial \xi_2(n)}{\partial H_{32}(1)} = 0 = E\left[e_2(n)y_3(n-1)\right]$$

The above are a total of 4 equations.

Minimizing $\xi_3(n)$ w.r.t. the unknown coefficients of

$$e_3(z) = \left[H_{12}(z)y_1(z)-H_{31}(z)y_3(z)\right]-\left[H_{22}(z)y_2(z)-H_{32}(z)y_3(z)\right]$$

we get

$$\frac{\partial \xi_3(n)}{\partial H_{31}(0)} = 0 = E\left[e_3(n)y_3(n)\right], \qquad \frac{\partial \xi_3(n)}{\partial H_{31}(1)} = 0 = E\left[e_3(n)y_3(n-1)\right]$$

The above are a total of 2 equations. Thus, we have a total of 12 equations in 12 unknowns. Unfortunately, these are not all independent; actually we only have 11 independent equations. Thus, we need to assume a value for one of the unknowns and calculate the other values of the Wiener filters in terms of this quantity, the same as we did with the single source and two measurements.

In vector notation, collecting the 12 orthogonality conditions $E[e_1(n)y_1(n-k)] = E[e_1(n)y_2(n-k)] = E[e_1(n)y_3(n-k)] = 0$, $E[e_2(n)y_2(n-k)] = E[e_2(n)y_3(n-k)] = 0$, and $E[e_3(n)y_3(n-k)] = 0$ for $k = 0, 1$ gives a homogeneous $12\times 12$ system

$$R\,\mathbf{H} = \mathbf{0}, \qquad \mathbf{H} = \big[H_{11}(0),\,H_{11}(1),\,H_{12}(0),\,H_{12}(1),\,H_{21}(0),\,H_{21}(1),\,H_{22}(0),\,H_{22}(1),\,H_{31}(0),\,H_{31}(1),\,H_{32}(0),\,H_{32}(1)\big]^{T}$$

where the entries of $R$ are the lag-0 and lag-1 correlations $R_{y_iy_j}(0)$ and $R_{y_iy_j}(1)$ of the observed signals.
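The following sketch sets up the same 12-coefficient problem numerically. Rather than assembling the correlation matrix entry by entry, it uses the fact that each error $e_i(n)$ is linear in the 12 coefficients: it stacks the error samples into one regression matrix, fixes the numeraire $H_{11}(0)=1$, and solves by least squares. The mixing filters, the source signals, and the choice of numeraire are hypothetical illustration values, and the signs follow the error definitions used above.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3x2 FIR mixing system F_ij(z) = F_ij(0) + F_ij(1) z^-1 (illustration values only).
F = {(1, 1): [1.0, 0.5], (1, 2): [0.7, -0.2],
     (2, 1): [0.4, 0.3], (2, 2): [1.0, 0.6],
     (3, 1): [0.9, -0.4], (3, 2): [0.3, 0.8]}

N = 20000
u1 = rng.standard_normal(N)                      # independent stationary sources
u2 = rng.standard_normal(N)
y = {i: np.convolve(u1, F[(i, 1)])[:N] + np.convolve(u2, F[(i, 2)])[:N] for i in (1, 2, 3)}

# Coefficient vector theta = [H11(0),H11(1),H12(0),H12(1),H21(0),H21(1),
#                             H22(0),H22(1),H31(0),H31(1),H32(0),H32(1)]
n = np.arange(1, N)
y1, y1d = y[1][n], y[1][n - 1]
y2, y2d = y[2][n], y[2][n - 1]
y3, y3d = y[3][n], y[3][n - 1]
z = np.zeros_like(y1)

# e1, e2, e3 written as linear functions of theta (same sign conventions as the text above).
A1 = np.column_stack([y1, y1d, -y1, -y1d, -y2, -y2d, z, z, y3, y3d, z, z])
A2 = np.column_stack([y1, y1d, z, z, -y2, -y2d, -y2, -y2d, z, z, y3, y3d])
A3 = np.column_stack([z, z, y1, y1d, z, z, -y2, -y2d, -y3, -y3d, y3, y3d])
A = np.vstack([A1, A2, A3])

# Fix the numeraire H11(0) = 1 and minimize the total squared error over the rest.
theta_rest, *_ = np.linalg.lstsq(A[:, 1:], -A[:, 0], rcond=None)
theta = np.concatenate([[1.0], theta_rest])
print(np.round(theta, 3))

Note that with first-order Wiener filters the errors are only approximately zero (the exact inverse filters are generally of higher order), so the printed coefficients are the least-squares approximation under the chosen normalization.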
Once we have estimates for the Wiener filters, we go back and substitute to get an estimate of the unknown input source signal $u_2(n)$. The same process could be repeated for the other signal $u_1(n)$.

Example: A simpler example is to assume that all the filters involved are constants (pure gains). Then

$$\begin{bmatrix} y_1(z)\\ y_2(z)\\ y_3(z)\end{bmatrix} =
\begin{bmatrix} F_{11} & F_{12}\\ F_{21} & F_{22}\\ F_{31} & F_{32}\end{bmatrix}
\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix}$$

which could be split into three equations as

$$\begin{bmatrix} y_1(z)\\ y_2(z)\end{bmatrix} = \begin{bmatrix} F_{11} & F_{12}\\ F_{21} & F_{22}\end{bmatrix}\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix}, \qquad
\begin{bmatrix} y_1(z)\\ y_3(z)\end{bmatrix} = \begin{bmatrix} F_{11} & F_{12}\\ F_{31} & F_{32}\end{bmatrix}\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix}, \qquad
\begin{bmatrix} y_2(z)\\ y_3(z)\end{bmatrix} = \begin{bmatrix} F_{21} & F_{22}\\ F_{31} & F_{32}\end{bmatrix}\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix}$$

Inverting the square matrices (assuming invertibility) we get

$$\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix} = \frac{1}{F_{11}F_{22}-F_{12}F_{21}}\begin{bmatrix} F_{22} & -F_{12}\\ -F_{21} & F_{11}\end{bmatrix}\begin{bmatrix} y_1(z)\\ y_2(z)\end{bmatrix}$$

$$\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix} = \frac{1}{F_{11}F_{32}-F_{12}F_{31}}\begin{bmatrix} F_{32} & -F_{12}\\ -F_{31} & F_{11}\end{bmatrix}\begin{bmatrix} y_1(z)\\ y_3(z)\end{bmatrix}$$

and

$$\begin{bmatrix} u_1(z)\\ u_2(z)\end{bmatrix} = \frac{1}{F_{21}F_{32}-F_{22}F_{31}}\begin{bmatrix} F_{32} & -F_{22}\\ -F_{31} & F_{21}\end{bmatrix}\begin{bmatrix} y_2(z)\\ y_3(z)\end{bmatrix}$$

For the source signal $u_1(z)$, we have the equation

$$\frac{F_{22}y_1(z)-F_{12}y_2(z)}{F_{11}F_{22}-F_{12}F_{21}}
= \frac{F_{32}y_1(z)-F_{12}y_3(z)}{F_{11}F_{32}-F_{12}F_{31}}
= \frac{F_{32}y_2(z)-F_{22}y_3(z)}{F_{21}F_{32}-F_{22}F_{31}} = u_1(z)$$
For the source signal $u_2(z)$, we have the equation

$$\frac{-F_{21}y_1(z)+F_{11}y_2(z)}{F_{11}F_{22}-F_{12}F_{21}}
= \frac{-F_{31}y_1(z)+F_{11}y_3(z)}{F_{11}F_{32}-F_{12}F_{31}}
= \frac{-F_{31}y_2(z)+F_{21}y_3(z)}{F_{21}F_{32}-F_{22}F_{31}} = u_2(z)$$

The above equation is actually a set of three equations in the unknown filter coefficients. To see this, we take the first two terms of the equation, i.e.

$$\frac{-F_{21}y_1(z)+F_{11}y_2(z)}{F_{11}F_{22}-F_{12}F_{21}}
= \frac{-F_{31}y_1(z)+F_{11}y_3(z)}{F_{11}F_{32}-F_{12}F_{31}} = u_2(z)$$

Rearranging, we get

$$\left[F_{11}F_{32}-F_{12}F_{31}\right]\left[F_{11}y_2(z)-F_{21}y_1(z)\right]
= \left[F_{11}F_{22}-F_{12}F_{21}\right]\left[F_{11}y_3(z)-F_{31}y_1(z)\right]$$

The above equation represents a linear relation between the observations and the unknown coefficients of the mixing filters. Solving this equation, using for example regression analysis, estimates of the filters could be found. Once the filter coefficients are found, we use them to find an estimate of the source signal $u_2(z)$. The same approach could be applied for $u_1(z)$.

Instead, we use the Wiener filter approach, where we assume that the signals are stationary. Each observed signal has two Wiener filters, and the parameters of each are estimated by minimizing a squared-error criterion. Specifically, for $y_i(n)$ we have the two Wiener filters $H_{i1}$ and $H_{i2}$, which are constants. They are related to the data through the equations

$$\frac{-F_{21}y_1(z)+F_{11}y_2(z)}{F_{11}F_{22}-F_{12}F_{21}}
= \frac{-F_{31}y_1(z)+F_{11}y_3(z)}{F_{11}F_{32}-F_{12}F_{31}}
= \frac{-F_{31}y_2(z)+F_{21}y_3(z)}{F_{21}F_{32}-F_{22}F_{31}} = u_2(z)$$

In terms of the Wiener filters we get the three equations:
$$e_1(z) = \left[H_{11}y_1(z)-H_{21}y_2(z)\right]-\left[H_{12}y_1(z)-H_{31}y_3(z)\right]$$

$$e_2(z) = \left[H_{11}y_1(z)-H_{21}y_2(z)\right]-\left[H_{22}y_2(z)-H_{32}y_3(z)\right]$$

and

$$e_3(z) = \left[H_{12}y_1(z)-H_{31}y_3(z)\right]-\left[H_{22}y_2(z)-H_{32}y_3(z)\right]$$

where

$$H_{11} = \frac{F_{21}}{F_{11}F_{22}-F_{12}F_{21}}, \qquad H_{12} = \frac{F_{31}}{F_{11}F_{32}-F_{12}F_{31}}$$

$$H_{21} = \frac{F_{11}}{F_{11}F_{22}-F_{12}F_{21}}, \qquad H_{22} = \frac{F_{31}}{F_{21}F_{32}-F_{22}F_{31}}$$

and

$$H_{31} = \frac{F_{11}}{F_{11}F_{32}-F_{12}F_{31}}, \qquad H_{32} = \frac{F_{21}}{F_{21}F_{32}-F_{22}F_{31}}$$

If one is able to get estimates of the Wiener filter coefficients $H_{ij}$, we could find the values of the filter coefficients $F_{ij}$. This will yield estimates for both inputs $u_1(n)$ and $u_2(n)$. We now move ahead and find the Wiener filter coefficients. For the first error equation $e_1(n)$ we have

$$e_1(n) = H_{11}y_1(n)-H_{21}y_2(n)-H_{12}y_1(n)+H_{31}y_3(n)$$

and

$$\xi_1(n) = E\left[e_1^2(n)\right] = E\left[\left(H_{11}y_1(n)-H_{21}y_2(n)-H_{12}y_1(n)+H_{31}y_3(n)\right)^{2}\right]$$

Minimizing $\xi_1(n)$ with respect to the unknown coefficients of the filters, we get an estimate of these coefficients as functions of the correlations between the observed signals $y_1(n)$, $y_2(n)$, and $y_3(n)$. In a similar manner we obtain the equations for $e_2(n)$ and $e_3(n)$. In vector notation we have:
$$\begin{bmatrix} e_1(n)\\ e_2(n)\\ e_3(n)\end{bmatrix} =
\begin{bmatrix}
H_{11}-H_{12} & -H_{21} & H_{31}\\
H_{11} & -H_{21}-H_{22} & H_{32}\\
H_{12} & -H_{22} & H_{32}-H_{31}
\end{bmatrix}
\begin{bmatrix} y_1(n)\\ y_2(n)\\ y_3(n)\end{bmatrix}$$

or $e(n) = H\,y(n)$.

$$\xi_1(n) = E\left[e_1^2(n)\right] = E\left[\left(H_{11}y_1(n)-H_{21}y_2(n)-H_{12}y_1(n)+H_{31}y_3(n)\right)^{2}\right]$$

Minimizing $\xi_1(n)$ w.r.t. the unknown coefficients we get

$$\frac{\partial \xi_1(n)}{\partial H_{11}} = 0 = E\left[e_1(n)y_1(n)\right], \qquad \frac{\partial \xi_1(n)}{\partial H_{12}} = 0 = E\left[e_1(n)y_1(n)\right]$$

Notice that the above two equations are identical, so only one of them is useful.

$$\frac{\partial \xi_1(n)}{\partial H_{21}} = 0 = E\left[e_1(n)y_2(n)\right], \qquad \frac{\partial \xi_1(n)}{\partial H_{31}} = 0 = E\left[e_1(n)y_3(n)\right]$$

The above are a total of 3 equations.

In a similar way, minimizing $\xi_2(n)$ w.r.t. the unknown coefficients of

$$e_2(z) = \left[H_{11}y_1(z)-H_{21}y_2(z)\right]-\left[H_{22}y_2(z)-H_{32}y_3(z)\right]$$

we get

$$\frac{\partial \xi_2(n)}{\partial H_{22}} = 0 = E\left[e_2(n)y_2(n)\right]$$
$$\frac{\partial \xi_2(n)}{\partial H_{32}} = 0 = E\left[e_2(n)y_3(n)\right]$$

The above are a total of 2 equations.

Minimizing $\xi_3(n)$ w.r.t. the unknown coefficients of

$$e_3(z) = \left[H_{12}y_1(z)-H_{31}y_3(z)\right]-\left[H_{22}y_2(z)-H_{32}y_3(z)\right]$$

we get

$$\frac{\partial \xi_3(n)}{\partial H_{31}} = 0 = E\left[e_3(n)y_3(n)\right]$$

The above is a total of 1 equation. Thus, we have a total of 6 equations in 6 unknowns. Unfortunately, these are not all independent; actually we only have 5 independent equations. Thus, we need to assume a value for one of the unknowns and calculate the other values of the Wiener filters in terms of this quantity, the same as we did with the single source and two measurements. In vector notation we have

$$\begin{bmatrix}
R_{11} & -R_{11} & -R_{12} & 0 & R_{13} & 0\\
R_{12} & -R_{12} & -R_{22} & 0 & R_{23} & 0\\
R_{13} & -R_{13} & -R_{23} & 0 & R_{33} & 0\\
R_{12} & 0 & -R_{22} & -R_{22} & 0 & R_{23}\\
R_{13} & 0 & -R_{23} & -R_{23} & 0 & R_{33}\\
0 & R_{13} & 0 & -R_{23} & -R_{33} & R_{33}
\end{bmatrix}
\begin{bmatrix} H_{11}\\ H_{12}\\ H_{21}\\ H_{22}\\ H_{31}\\ H_{32}\end{bmatrix} = \mathbf{0}$$

where $R_{ij} \equiv R_{y_iy_j}(0) = E\left[y_i(n)y_j(n)\right]$.
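A small numerical sketch of this step is given below: it estimates the zero-lag correlations $R_{ij}$ from synthetic data, assembles the 6x6 matrix exactly as displayed above, fixes the numeraire $H_{11}$, and solves the remaining equations in the least-squares sense. The mixing gains and source signals are hypothetical illustration values, and a small amount of independent sensor noise is added (an assumption beyond the exact mixing model in the text) so that the sample correlation matrix is well conditioned.

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical constant mixing gains (3 receivers x 2 sources), illustration values only.
Fm = np.array([[1.0, 0.7],
               [0.4, 1.0],
               [0.9, 0.3]])
N = 20000
u = rng.standard_normal((2, N))                     # two independent stationary sources
yobs = Fm @ u + 0.02 * rng.standard_normal((3, N))  # y_i(n) = F_i1 u_1(n) + F_i2 u_2(n) (+ small noise)

R = yobs @ yobs.T / N                               # zero-lag correlations R_ij = E[y_i y_j]
R11, R12, R13 = R[0, 0], R[0, 1], R[0, 2]
R22, R23, R33 = R[1, 1], R[1, 2], R[2, 2]

# The 6x6 homogeneous system acting on [H11, H12, H21, H22, H31, H32].
M = np.array([
    [R11, -R11, -R12,    0,  R13,    0],
    [R12, -R12, -R22,    0,  R23,    0],
    [R13, -R13, -R23,    0,  R33,    0],
    [R12,    0, -R22, -R22,    0,  R23],
    [R13,    0, -R23, -R23,    0,  R33],
    [  0,  R13,    0, -R23, -R33,  R33]])

# Fix the numeraire H11 = 1: move its column to the right-hand side and solve
# the remaining equations in the least-squares sense.
H_rest, *_ = np.linalg.lstsq(M[:, 1:], -M[:, 0], rcond=None)
H = np.concatenate([[1.0], H_rest])
print(np.round(H, 3))                               # [H11, H12, H21, H22, H31, H32] given H11 = 1

The printed vector is the solution of the correlation system under the chosen normalization; as the text notes, without the numeraire (or extra information) the system determines the coefficients only up to scale.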
Once we have estimates for the Wiener filters, we go back and substitute to get an estimate of the unknown input source signal $u_2(n)$; the same process could be repeated for the other signal $u_1(n)$. Instead, we use the estimated values of $H_{ij}$ to get estimates of $F_{ij}$ and consequently get estimates of the inputs $u_1(n)$ and $u_2(n)$. Notice that the above matrix equation does not have a unique solution. We could always take one of the unknowns as a numeraire. For example, if we take $H_{11}$ as the numeraire, the above equation reduces to

$$\begin{bmatrix}
-R_{12} & -R_{22} & 0 & R_{23} & 0\\
-R_{13} & -R_{23} & 0 & R_{33} & 0\\
0 & -R_{22} & -R_{22} & 0 & R_{23}\\
0 & -R_{23} & -R_{23} & 0 & R_{33}\\
R_{13} & 0 & -R_{23} & -R_{33} & R_{33}
\end{bmatrix}
\begin{bmatrix} H_{12}\\ H_{21}\\ H_{22}\\ H_{31}\\ H_{32}\end{bmatrix}
= -H_{11}\begin{bmatrix} R_{12}\\ R_{13}\\ R_{12}\\ R_{13}\\ 0\end{bmatrix}$$

Taking the inverse of the matrix on the left-hand side, we get

$$\begin{bmatrix} H_{12}\\ H_{21}\\ H_{22}\\ H_{31}\\ H_{32}\end{bmatrix}
= -\begin{bmatrix}
-R_{12} & -R_{22} & 0 & R_{23} & 0\\
-R_{13} & -R_{23} & 0 & R_{33} & 0\\
0 & -R_{22} & -R_{22} & 0 & R_{23}\\
0 & -R_{23} & -R_{23} & 0 & R_{33}\\
R_{13} & 0 & -R_{23} & -R_{33} & R_{33}
\end{bmatrix}^{-1}
\begin{bmatrix} R_{12}\\ R_{13}\\ R_{12}\\ R_{13}\\ 0\end{bmatrix} H_{11}$$

These are the estimated coefficients $H_{ij}$ as functions of the numeraire $H_{11}$.

If one of the source signals, $u_1(n)$ or $u_2(n)$, is much stronger than the other, the estimate of the stronger signal is usually very good. In this case, we use the techniques of noise cancelling to get the other signal. Specifically, assume that we have found a good (high-SNR) estimate of $u_1(n)$. We could use $y_1(n) = F_{11}u_1(n) + F_{12}u_2(n)$, with $u_1(n)$ as the reference signal, and use the Wiener filter, as a correlation canceller, to find the component in $y_1(n)$ that is correlated with $u_1(n)$. Effectively we are getting a new and more accurate estimate of $F_{11}$. Similarly, we could use $y_2(n) = F_{21}u_1(n) + F_{22}u_2(n)$ and
$y_3(n) = F_{31}u_1(n) + F_{32}u_2(n)$ to get better estimates of $F_{21}$ and $F_{31}$. We then repeat the above process to find estimates of $H_{ij}$, but now only $F_{12}$, $F_{22}$, and $F_{32}$ are unknowns.

Assume that we receive/observe a signal $u_1(n)$ that is correlated with another signal $y_1(n)$. We use $u_1(n)$ to find an estimate $\hat{y}_1(n)$ of $y_1(n)$ according to the following equation:

$$\hat{y}_1(n) = h_1(0)\,u_1(n)$$

Define

$$e(n) = y_1(n) - \hat{y}_1(n), \qquad \xi(n) = E\left[e^2(n)\right] = E\left[\left(y_1(n) - \hat{y}_1(n)\right)^{2}\right]$$

where $h_1(0)$ is the unknown filter parameter. In order to find the filter parameter, we minimize the expected value of the squared error w.r.t. the unknown:

$$\frac{\partial \xi(n)}{\partial h_1(0)} = 0 = E\left[-2\left(y_1(n)-\hat{y}_1(n)\right)u_1(n)\right] = -2E\left[\left(y_1(n)-h_1(0)u_1(n)\right)u_1(n)\right]$$

This yields

$$E\left[y_1(n)u_1(n)\right] = h_1(0)\,E\left[u_1^2(n)\right], \qquad \text{i.e.} \qquad R_{y_1u_1}(0) = h_1(0)\,R_{u_1u_1}(0)$$

and

$$h_1(0) = \frac{R_{y_1u_1}(0)}{R_{u_1u_1}(0)}$$

$h_1(0)$ is the revised, more accurate estimate of $F_{11}$. The same process could be repeated for the other unknowns, and we get

$$h_2(0) = F_{21} = \frac{R_{y_2u_1}(0)}{R_{u_1u_1}(0)}, \qquad h_3(0) = F_{31} = \frac{R_{y_3u_1}(0)}{R_{u_1u_1}(0)}$$
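A minimal sketch of this correlation-cancelling step is shown below. It assumes we already have a high-SNR reference estimate of $u_1(n)$ (here simply simulated) and hypothetical mixing gains; each gain toward $u_1$ is then estimated as the ratio of a sample cross-correlation to the sample power of the reference.

import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mixing gains and sources (illustration values only).
F11, F12 = 1.0, 0.7
F21, F22 = 0.4, 1.0
F31, F32 = 0.9, 0.3
N = 10000
u1 = rng.standard_normal(N)                # assumed high-SNR reference estimate of u_1(n)
u2 = 0.2 * rng.standard_normal(N)          # much weaker second source
y1 = F11 * u1 + F12 * u2
y2 = F21 * u1 + F22 * u2
y3 = F31 * u1 + F32 * u2

# h_i(0) = R_{y_i u_1}(0) / R_{u_1 u_1}(0): revised estimates of F_11, F_21, F_31.
Ruu = np.dot(u1, u1) / N
h1 = (np.dot(y1, u1) / N) / Ruu
h2 = (np.dot(y2, u1) / N) / Ruu
h3 = (np.dot(y3, u1) / N) / Ruu
print("F11~%.3f  F21~%.3f  F31~%.3f" % (h1, h2, h3))

Because $u_1$ and $u_2$ are independent, these ratios converge to the gains $F_{11}$, $F_{21}$, $F_{31}$ as the number of samples grows.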
We then use the equations for the estimated $H_{ij}$ to get the revised values of the rest of the $F_{ij}$ according to

$$H_{11} = \frac{F_{21}}{F_{11}F_{22}-F_{12}F_{21}}, \qquad H_{12} = \frac{F_{31}}{F_{11}F_{32}-F_{12}F_{31}}$$

$$H_{21} = \frac{F_{11}}{F_{11}F_{22}-F_{12}F_{21}}, \qquad H_{22} = \frac{F_{31}}{F_{21}F_{32}-F_{22}F_{31}}$$

and

$$H_{31} = \frac{F_{11}}{F_{11}F_{32}-F_{12}F_{31}}, \qquad H_{32} = \frac{F_{21}}{F_{21}F_{32}-F_{22}F_{31}}$$

Finally, we use the improved estimates of $F_{ij}$ to get improved estimates of $u_2(n)$.
The Kalman Filter

In some situations, one has a difference equation describing the evolution of an $M\times 1$ state vector (signal) $X(n)$, and an $N\times 1$ vector $y(n)$ of measurements of $X(n)$. In vector form we have

$$y(n) = c(n)X(n) + v(n)$$

$$X(n) = T(n)X(n-1) + \omega(n)$$

where $v(n)$ is Gaussian with zero mean and covariance $\Sigma_v$, and $\omega(n)$ is independent of $v(n)$ and Gaussian with zero mean and covariance $\Sigma_\omega$. Once a model is put into state-space form, the Kalman filter can be used to estimate the state vector by filtering; it provides estimates of the unobserved variable $X(n)$. The purpose of filtering is to update our knowledge of the state vector as soon as a new observation $y(n)$ becomes available. The Kalman filter can therefore be described as an algorithm for estimating the unobserved components at time $n$ based on the information available at that time. Estimates of any other desired parameters, including the so-called hyperparameters $\Sigma_v$ and $\Sigma_\omega$, can be obtained by a Maximum Likelihood Estimation (MLE) algorithm as adapted by [Shumway and Stoffer; 1982]. Estimating the states through the Kalman filter involves three steps: the initial state, the predict step, and the update step.

Initial state: $X(0/0)$, $P(0/0)$
Predict step:

$$X(n/n-1) = T(n)X(n-1/n-1)$$

$$P(n/n-1) = T(n)P(n-1/n-1)T^T(n) + \Sigma_\omega$$

where $X(n/n-1)$ is the estimate of $X(n)$ given the observations up to time $n-1$, i.e. up to $y(n-1)$, and $P(n/n-1)$ is the covariance of that estimate.

Update step:

$$X(n/n) = X(n/n-1) + K(n)\left[y(n) - c(n)X(n/n-1)\right]$$

$$K(n) = P(n/n-1)c^T(n)\left[c(n)P(n/n-1)c^T(n) + \Sigma_v\right]^{-1}$$

$$P(n/n) = \left[I - K(n)c(n)\right]P(n/n-1)$$

$X(0/0)$ and $P(0/0)$ are the initial state vector and covariance matrix, respectively. The covariance matrix $P(0/0)$ describes the uncertainty in $X(0/0)$. If the vector $X(0/0)$ and the covariance matrix $P(0/0)$ are not given a priori, $X(0/0)$ is assumed to be zero and large numbers are used for the diagonal elements of $P(0/0)$.
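The sketch below implements these predict/update recursions directly, for a toy scalar random-walk state observed in noise. The model matrices, noise levels, and initial values are hypothetical illustration values; in practice they come from the problem at hand or from MLE of the hyperparameters as noted above.

import numpy as np

def kalman_filter(y, T, c, Sigma_w, Sigma_v, X0, P0):
    # One pass of the Kalman filter over the measurements y[0..N-1].
    X, P = X0, P0
    estimates = []
    for yn in y:
        # Predict step
        X = T @ X
        P = T @ P @ T.T + Sigma_w
        # Update step
        K = P @ c.T @ np.linalg.inv(c @ P @ c.T + Sigma_v)
        X = X + K @ (yn - c @ X)
        P = (np.eye(len(X)) - K @ c) @ P
        estimates.append(X.copy())
    return np.array(estimates)

# Toy example: scalar random-walk state observed in noise (illustration values).
rng = np.random.default_rng(4)
N = 200
x_true = np.cumsum(0.05 * rng.standard_normal(N)) + 1.0
y = x_true + 0.5 * rng.standard_normal(N)

T = np.array([[1.0]]); c = np.array([[1.0]])
Sigma_w = np.array([[0.05 ** 2]]); Sigma_v = np.array([[0.5 ** 2]])
X0 = np.array([0.0]); P0 = np.array([[1000.0]])      # large P(0/0) when no prior is given

xs = kalman_filter(y.reshape(-1, 1), T, c, Sigma_w, Sigma_v, X0, P0)
print("final state estimate: %.3f (true %.3f)" % (xs[-1, 0], x_true[-1]))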
Principal Component Analysis (PCA)

Assume that the data is given in matrix format, e.g. EEG channels, a two-dimensional image, etc. We represent this data as the $m\times n$ matrix $X$, where each row represents, for example, a single EEG channel. Before any calculations, we subtract from each row its mean value. PCA is a method to express this data as a linear combination of data basis vectors. Specifically, let $X$ and $Y$ be $m\times n$ matrices related by a linear transformation $P$; $X$ is the original recorded data set and $Y$ is a re-representation of that data set. Thus we have $Y = PX$, or

$$X = \begin{bmatrix} x_1\\ \vdots\\ x_m\end{bmatrix}, \qquad
Y = \begin{bmatrix} y_1\\ \vdots\\ y_m\end{bmatrix}
= \begin{bmatrix} p_1\\ \vdots\\ p_m\end{bmatrix}\begin{bmatrix} x_1\\ \vdots\\ x_m\end{bmatrix}$$

Also let us define the $m\times m$ matrix

$$P = \begin{bmatrix} p_1\\ \vdots\\ p_m\end{bmatrix}$$

where the $p_i$ are the $1\times m$ rows of $P$. The equation $PX = Y$ represents a change of basis, i.e. the rows of $P$, $\{p_1,\dots,p_m\}$, are a set of new basis vectors. Define the covariance matrix of $X$ (after removing the mean from each row) as $\Sigma_{XX}$, and the covariance matrix of $Y$ as $\Sigma_{YY}$:

$$\Sigma_{XX} = \frac{1}{n-1}XX^T, \qquad
\Sigma_{YY} = \frac{1}{n-1}YY^T = \frac{1}{n-1}(PX)(PX)^T = \frac{1}{n-1}PXX^TP^T$$
$$= P\,\Sigma_{XX}\,P^T$$

We know that the symmetric matrix $XX^T$ can be diagonalized by an orthogonal matrix of its eigenvectors, i.e.

$$XX^T = EDE^T$$

where $D$ is a diagonal matrix and $E$ is the matrix of the eigenvectors of $XX^T$ arranged as columns, with $E^T = E^{-1}$. The matrix $XX^T$ has $r$ orthonormal eigenvectors, each of dimension $m\times 1$, where $r$ is the rank of the matrix. The rank $r$ of $XX^T$ is usually less than $m$.

We select the matrix $P$ to be a matrix in which each row $p_i$ is an eigenvector of $XX^T$. By this selection, $P = E^T$, so $P^T = E = P^{-1}$ and $PP^T = P^TP = I$. Thus

$$XX^T = EDE^T = P^TDP$$

and

$$\Sigma_{YY} = \frac{1}{n-1}YY^T = \frac{1}{n-1}PXX^TP^T = \frac{1}{n-1}P\left(P^TDP\right)P^T = \frac{1}{n-1}D$$

It is obvious that the choice $P = E^T$ results in a diagonal $\Sigma_{YY}$, which was the goal of PCA. We can summarize the results of PCA in the matrices $P$ and $\Sigma_{YY}$ as follows: (1) the principal components of $X$ are the eigenvectors of $XX^T$, i.e. the rows of $P$; (2) the $i$th diagonal value of $\Sigma_{YY}$ is the variance of $X$ along $p_i$.

One benefit of PCA is that we can examine the variances in $\Sigma_{YY}$ associated with the principal components. Often one finds large variances associated with the first $k < m$ principal components, and then a precipitous drop-off. One can then conclude that the most interesting dynamics occur only in the first $k$ dimensions.
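A compact sketch of this procedure in code is given below. The function is generic for an m x n data matrix whose rows are the channels; as a usage example it is applied to the two 10-sample vectors of the example that follows, so the printed eigenvalues and principal directions can be compared with the numbers worked out there (eigenvectors are determined only up to sign).

import numpy as np

def pca_rows(X):
    # PCA on an m x n data matrix whose rows are the variables/channels.
    Xc = X - X.mean(axis=1, keepdims=True)       # remove the mean of each row
    n = X.shape[1]
    Sxx = Xc @ Xc.T / (n - 1)                    # covariance matrix (1/(n-1)) X X^T
    eigvals, E = np.linalg.eigh(Sxx)             # eigenvectors as columns of E
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    eigvals, E = eigvals[order], E[:, order]
    P = E.T                                      # rows of P are the principal components
    Y = P @ Xc                                   # re-represented (decorrelated) data
    return eigvals, P, Y

# The two 10-sample vectors used in the example below.
x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
eigvals, P, Y = pca_rows(np.vstack([x1, x2]))
print(np.round(eigvals, 3))                      # approx [1.284, 0.049]
print(np.round(P, 4))                            # rows approx (0.678, 0.735) and (0.735, -0.678), up to sign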
Example: We are given the two $1\times 10$ vectors $x_1$ and $x_2$ as follows:

$$x_1 = [2.5,\ 0.5,\ 2.2,\ 1.9,\ 3.1,\ 2.3,\ 2.0,\ 1.0,\ 1.5,\ 1.1]$$

$$x_2 = [2.4,\ 0.7,\ 2.9,\ 2.2,\ 3.0,\ 2.7,\ 1.6,\ 1.1,\ 1.6,\ 0.9]$$

The average of $x_1$ is estimated as $E[x_1] = 1.81$ and for $x_2$ we have $E[x_2] = 1.91$. We subtract the means and get new zero-mean data sets. The covariance matrix is calculated as

$$\Sigma_{XX} = \begin{bmatrix} 0.6166 & 0.6154\\ 0.6154 & 0.7166\end{bmatrix}$$

The eigenvalues are $\lambda_1 = 1.284$ and $\lambda_2 = 0.049$, with the corresponding eigenvectors $p_1 = (0.6778,\ 0.735)$ and $p_2 = (0.735,\ -0.6778)$. Notice that the eigenvectors are normalized to have unit length. It is clear that one eigenvalue is large and the other is small. Thus, one could remove the subspace associated with the small eigenvalue and retain only the subspace associated with the large eigenvalue. Using the equation $Y = PX$ (with the row means removed from $X$), we get the transformed data as

$$Y = \begin{bmatrix} y_1\\ y_2\end{bmatrix}
= \begin{bmatrix} 0.6778 & 0.735\\ 0.735 & -0.6778\end{bmatrix}
\begin{bmatrix} 0.69 & -1.31 & 0.39 & 0.09 & 1.29 & 0.49 & 0.19 & -0.81 & -0.31 & -0.71\\
0.49 & -1.21 & 0.99 & 0.29 & 1.09 & 0.79 & -0.31 & -0.81 & -0.31 & -1.01\end{bmatrix}$$

$$= \begin{bmatrix} 0.83 & -1.78 & 0.99 & 0.27 & 1.68 & 0.91 & -0.10 & -1.14 & -0.44 & -1.22\\
0.18 & -0.14 & -0.38 & -0.13 & 0.21 & -0.18 & 0.35 & -0.05 & -0.02 & 0.16\end{bmatrix}$$

Notice that the values of the vector $y_1$ are much larger in magnitude than the values of the vector $y_2$. This is in agreement with the fact that the eigenvalue corresponding to $p_1$ is much larger than the eigenvalue corresponding to $p_2$. Thus, we could retain the vector $y_1$ as a representative of the data and ignore the vector $y_2$.

Linearization: