Stochastic Section # 6
Linear Estimation, Model Selection & Hypothesis Test
Eslam Adel
April 10, 2018
1 Parameter Estimation of a Model
Estimation of model parameters is an optimization problem. There are different methods for parameter estimation. We previously discussed the minimum mean square error criterion. Here we introduce another one: the maximum likelihood criterion.
1.1 Maximum Likelihood method (ML)
• Maximize the likelihood function, which is the joint probability density function (jPDF).
• jPDF: f(x) = f(x0, x1, x2, . . . , xN−1)
• For an uncorrelated Gaussian vector (independent samples):
f(x) = f(x0, x1, x2, . . . , xN−1) = f(x0)f(x1)f(x2) . . . f(xN−1)
• For correlated signals, use conditional probabilities:
f(x) = f(x0, x1, x2, . . . , xN−1) = f(xN−1|xN−2) . . . f(x2|x1)f(x1|x0)f(x0)
• In practice we maximize log(f(x)), since the log operator converts the product into a sum.
• Note: solving the optimization problem to find the values of the unknowns is outside the scope of this section.
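Although solving the optimization problem is out of scope, the idea above can be sketched numerically. The following is a small Python sketch (all values are hypothetical): for i.i.d. Gaussian samples, independence turns the joint PDF into a product, so the log-likelihood becomes a sum, and a generic optimizer recovers the ML estimates of the mean and standard deviation.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data from a Gaussian with known (hypothetical) mean and std
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=5000)

def neg_log_likelihood(theta):
    # Negative log-likelihood of i.i.d. Gaussian samples.
    # Independence turns the joint PDF (a product) into a sum of logs.
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)  # parametrize by log(sigma) to keep sigma > 0
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu)**2 / (2 * sigma**2))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
```

For the Gaussian case the numerical estimates should agree with the closed-form ML solutions, i.e. the sample mean and the (biased) sample standard deviation.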
1.1.1 Example 1
Find the jPDF for the following MA(1) model:
\[
x(n) = \epsilon(n) + b_1 \epsilon(n-1) \tag{1}
\]
Solution
\[
n = 0:\quad x_0 = \epsilon_0, \qquad
n = 1:\quad x_1 = \epsilon_1 + b_1 \epsilon_0, \qquad
n = 2:\quad x_2 = \epsilon_2 + b_1 \epsilon_1
\]
In matrix form:
\[
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{N-1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
b_1 & 1 & 0 & \dots & 0 \\
0 & b_1 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \dots & b_1 & 1
\end{bmatrix}
\begin{bmatrix} \epsilon_0 \\ \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{N-1} \end{bmatrix},
\qquad
B =
\begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
b_1 & 1 & 0 & \dots & 0 \\
0 & b_1 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \dots & b_1 & 1
\end{bmatrix}
\]
So the model will be $x = B\epsilon$, where $\epsilon \sim N(0, \Sigma_\epsilon)$ (uncorrelated white Gaussian noise).
So $x \sim N(m_x, \Sigma_x)$:
\[
m_x = E[x] = E[B\epsilon] = B\,E[\epsilon] = 0
\]
\[
\Sigma_x = E[xx^T] - m_x m_x^T = E[B\epsilon(B\epsilon)^T] - 0 = B\,E[\epsilon\epsilon^T]\,B^T = B \Sigma_\epsilon B^T
\]
Therefore
\[
x \sim N(0, \Sigma_x), \qquad \Sigma_x = B \Sigma_\epsilon B^T
\]
And the jPDF (likelihood function) is:
\[
f(x) = \frac{1}{(2\pi)^{N/2} \det(\Sigma_x)^{1/2}}\, e^{-\frac{1}{2} x^T \Sigma_x^{-1} x} \tag{2}
\]
1.1.2 Example 2
Find the jPDF for the following ARMA(1, 1) model:
\[
x(n) = a_1 x(n-1) + \epsilon(n) + b_1 \epsilon(n-1) \tag{3}
\]
Solution
Let $y(n) = x(n) - a_1 x(n-1)$:
\[
n = 0:\quad y_0 = x_0, \qquad
n = 1:\quad y_1 = x_1 - a_1 x_0, \qquad
n = 2:\quad y_2 = x_2 - a_1 x_1
\]
In matrix form:
\[
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{N-1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
-a_1 & 1 & 0 & \dots & 0 \\
0 & -a_1 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \dots & -a_1 & 1
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{N-1} \end{bmatrix},
\qquad
A =
\begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
-a_1 & 1 & 0 & \dots & 0 \\
0 & -a_1 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \dots & -a_1 & 1
\end{bmatrix}
\]
So $y = Ax$.
And also $y(n) = \epsilon(n) + b_1 \epsilon(n-1)$:
\[
n = 0:\quad y_0 = \epsilon_0, \qquad
n = 1:\quad y_1 = \epsilon_1 + b_1 \epsilon_0, \qquad
n = 2:\quad y_2 = \epsilon_2 + b_1 \epsilon_1
\]
In matrix form:
\[
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{N-1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
b_1 & 1 & 0 & \dots & 0 \\
0 & b_1 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \dots & b_1 & 1
\end{bmatrix}
\begin{bmatrix} \epsilon_0 \\ \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{N-1} \end{bmatrix},
\qquad
B =
\begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
b_1 & 1 & 0 & \dots & 0 \\
0 & b_1 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \dots & b_1 & 1
\end{bmatrix}
\]
So $y = Ax = B\epsilon \;\Rightarrow\; x = A^{-1}B\epsilon$, where $\epsilon \sim N(0, \Sigma_\epsilon)$ (uncorrelated white Gaussian noise).
So $x \sim N(m_x, \Sigma_x)$:
\[
m_x = E[x] = E[A^{-1}B\epsilon] = A^{-1}B\,E[\epsilon] = 0
\]
\[
\Sigma_x = E[xx^T] - m_x m_x^T = E[A^{-1}B\epsilon(A^{-1}B\epsilon)^T] - 0
= A^{-1}B\,E[\epsilon\epsilon^T]\,B^T (A^{-1})^T = A^{-1}B \Sigma_\epsilon B^T (A^{-1})^T
\]
Therefore
\[
x \sim N(0, \Sigma_x), \qquad \Sigma_x = A^{-1}B \Sigma_\epsilon B^T (A^{-1})^T
\]
And the jPDF (likelihood function) is:
\[
f(x) = \frac{1}{(2\pi)^{N/2} \det(\Sigma_x)^{1/2}}\, e^{-\frac{1}{2} x^T \Sigma_x^{-1} x} \tag{4}
\]
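The matrix relation y = Ax = Bε behind this example can be verified numerically. The following Python sketch (with hypothetical values for N, a1, and b1) runs the ARMA(1,1) recursion directly, builds the bidiagonal A and B matrices, and checks that Ax and Bε agree, so x = A⁻¹Bε.

```python
import numpy as np

# Hypothetical values for the ARMA(1,1) example
N, a1, b1 = 6, 0.5, 0.3
rng = np.random.default_rng(2)
eps = rng.normal(size=N)

# Run the recursion x(n) = a1*x(n-1) + eps(n) + b1*eps(n-1)
# with zero initial conditions (x(-1) = eps(-1) = 0)
x = np.zeros(N)
x[0] = eps[0]
for n in range(1, N):
    x[n] = a1 * x[n - 1] + eps[n] + b1 * eps[n - 1]

A = np.eye(N) - a1 * np.eye(N, k=-1)   # y = A x
B = np.eye(N) + b1 * np.eye(N, k=-1)   # y = B eps
lhs = A @ x                             # both sides equal y
rhs = B @ eps
x_reconstructed = np.linalg.solve(A, B @ eps)  # x = A^{-1} B eps
```

Since A is lower triangular with unit diagonal it is always invertible, which is why the step from y = Ax to x = A⁻¹Bε is safe for any a1.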
2 Model Selection
Model selection is the next step after estimating different models for the same signal. Here we study a method to compare different models and select the best one.
2.1 AIC
• Akaike's Information Criterion.
• Gives a measure of the relative quality of statistical models.
• It estimates the relative information loss incurred by using a given model.
• The best model has the minimum information loss (minimum AIC value).
\[
AIC = 2k - 2\ln(\hat{L}) \tag{5}
\]
where $k$ is the number of model parameters and $\hat{L}$ is the maximized value of the likelihood function.
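The AIC trade-off can be illustrated end to end. In this Python sketch (all model orders and coefficients are hypothetical), data are simulated from an AR(2) process, then AR models of different orders are fit by least squares; the Gaussian log-likelihood of the residuals is penalized by 2k, and the lower-order AR(1) model loses because its worse fit outweighs its smaller penalty.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulate an AR(2) process: x(n) = 0.5 x(n-1) - 0.3 x(n-2) + eps(n)
N = 4000
x = np.zeros(N)
eps = rng.normal(size=N)
for n in range(2, N):
    x[n] = 0.5 * x[n - 1] - 0.3 * x[n - 2] + eps[n]

def ar_aic(x, p):
    # Least-squares fit of an AR(p) model, then AIC = 2k - 2 ln(L_hat)
    Y = x[p:]
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    coeffs, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coeffs
    sigma2 = resid.var()                       # ML noise-variance estimate
    n = len(Y)
    logL = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 1                                  # AR coefficients + noise variance
    return 2 * k - 2 * logL

aics = {p: ar_aic(x, p) for p in (1, 2, 3)}
best = min(aics, key=aics.get)                 # model with minimum AIC
```

Note that the true-order AR(2) model decisively beats AR(1); AR(3) fits the training data slightly better, but the 2k penalty is what keeps AIC from always preferring the largest model.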
3 Hypothesis test
We have two or more different classes of signals, and each class is modeled with a specific model. For a new unknown signal, we need to determine which class it belongs to. The idea is to calculate the likelihood (probability) of the signal given each class's model and select the most probable class (maximum likelihood value).
For an unknown signal x and N classes, calculate
f1(x|Model1), f2(x|Model2), f3(x|Model3), . . . , fN (x|ModelN )
The signal x belongs to the class with the maximum likelihood value.
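This decision rule can be sketched in a few lines of Python (the two class models and their coefficients are hypothetical): each class is an AR(1) model, the conditional factorization from Section 1.1 gives the log-likelihood, and the test signal is assigned to the class with the larger value.

```python
import numpy as np

def ar1_loglik(x, a1, sigma2=1.0):
    # Conditional Gaussian log-likelihood of an AR(1) model:
    # x(n) | x(n-1) ~ N(a1*x(n-1), sigma2), summed over n = 1..N-1
    resid = x[1:] - a1 * x[:-1]
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - resid**2 / (2 * sigma2))

rng = np.random.default_rng(4)
# Test signal generated from class 2's model (a1 = -0.7)
x = np.zeros(500)
eps = rng.normal(size=500)
for n in range(1, 500):
    x[n] = -0.7 * x[n - 1] + eps[n]

models = {1: 0.7, 2: -0.7}                     # class -> AR(1) coefficient
logliks = {c: ar1_loglik(x, a) for c, a in models.items()}
predicted = max(logliks, key=logliks.get)      # maximum-likelihood class
```

This mirrors the hypothesis test in the demo below, where the per-class models come from `estimate` instead of being fixed by hand.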
4 Demo
This is a simple example that demonstrates how to estimate different models for your signal, select one of
the estimated models based on its AIC value, and apply a hypothesis test to determine the class of an unknown
signal using the predefined models. The steps are:
• Model estimation
• Model selection
• Hypothesis test
close all
clc, clear
%% Load EEG Dataset
% This is a motor imagery dataset. We have two classes.
% Class 1: the subject imagines moving his right arm.
% Class 2: the subject imagines moving his left arm.
% There are different EEG channels: C3, C4 and Cz.
% We will select only one of them.
load('dataset_BCIcomp1.mat')

% We will use the first channel from class 1
% Note: detrend makes the signal zero mean
dataClass1 = detrend(x_train(500:800,1,1));
% First channel from class 2
dataClass2 = detrend(x_train(500:800,1,2));
data = [dataClass1, dataClass2];
% Test data from class 1
testData = detrend(x_train(801:1000,1,1));

% Empty cell array to be updated with the selected model for each class
selectedModels = cell(2,1);
%% [1] Model Estimation
% MA(2), AR(2), ARMA(1,2) models
% We have two classes, so we estimate the models and select one for each class.
for i = 1:2
    [model_MA2, ~, logL_MA2] = estimate(arima(0,0,2), data(:,i));
    [model_AR2, ~, logL_AR2] = estimate(arima(2,0,0), data(:,i));
    [model_ARMA12, ~, logL_ARMA12] = estimate(arima(1,0,2), data(:,i));
    models = {model_MA2, model_AR2, model_ARMA12};

    %% [2] Model selection based on AIC value
    % Calculate Akaike's information criterion for all models
    aic_MA2 = 2*2 - 2*logL_MA2;
    aic_AR2 = 2*2 - 2*logL_AR2;
    aic_ARMA12 = 2*3 - 2*logL_ARMA12;

    % Select the model with minimum AIC
    [~, idx] = min([aic_MA2, aic_AR2, aic_ARMA12]);
    % Update selected models
    selectedModels{i} = models{idx};
end

% Initialize an array to be updated with the likelihood value for each class
likelihoodVals = zeros(1,2);
%% [3] Hypothesis test
for i = 1:2
    % Calculate the log-likelihood of testData under the selected
    % model of each class
    [~, ~, logL] = estimate(selectedModels{i}, testData);
    likelihoodVals(i) = logL;
end
% Select the class with maximum likelihood value
[~, testDataClass] = max(likelihoodVals);
display(testDataClass)
Listing 1: Section Demo
You can download the section demo from the repository.
5 Useful links
• System Identification Toolbox examples
• Different Model estimation methods
• estimate function