SECTION – I




            REVIEW OF
RANDOM VARIABLES & RANDOM PROCESS




REVIEW OF RANDOM VARIABLES..............................................................................3
  1.1 Introduction..............................................................................................................3
  1.2 Discrete and Continuous Random Variables ........................................................4
  1.3 Probability Distribution Function ..........................................................................4
  1.4 Probability Density Function .................................................................................5
  1.5 Joint random variable .............................................................................................6
  1.6 Marginal density functions......................................................................................6
  1.7 Conditional density function...................................................................................7
  1.8 Bayes’ Rule for mixed random variables...............................................................8
  1.9 Independent Random Variable ..............................................................................9
  1.10 Moments of Random Variables ..........................................................................10
  1.11 Uncorrelated random variables..........................................................................11
  1.12 Linear prediction of Y from X ..........................................................................11
  1.13 Vector space Interpretation of Random Variables...........................................12
  1.14 Linear Independence ...........................................................................................12
  1.15 Statistical Independence......................................................................................12
  1.16 Inner Product .......................................................................................................12
  1.17 Schwarz Inequality ..............................................................................13
  1.18 Orthogonal Random Variables...........................................................................13
  1.19 Orthogonality Principle.......................................................................................14
  1.20 Chebyshev Inequality............................................................................15
  1.21 Markov Inequality ...............................................................................................15
  1.22 Convergence of a sequence of random variables ..............................................16
  1.23 Almost sure (a.s.) convergence or convergence with probability 1 .................16
  1.24 Convergence in mean square sense ....................................................................17
  1.25 Convergence in probability.................................................................................17
  1.26 Convergence in distribution................................................................................18
  1.27 Central Limit Theorem .......................................................................................18
  1.28 Jointly Gaussian Random variables...................................................................19




REVIEW OF RANDOM VARIABLES

1.1 Introduction
    •       Mathematically a random variable is neither random nor a variable
    •       It is a mapping from sample space into the real-line ( “real-valued” random
            variable) or the complex plane ( “complex-valued ” random variable) .
Suppose we have a probability space {S , ℑ, P} .
Let X : S → ℜ be a function mapping the sample space S into the real line such that
for each s ∈ S there exists a unique X(s) ∈ ℜ. Then X is called a random variable.
Thus a random variable associates the points in the sample space with real numbers.



Notation:
    • Random variables are represented by upper-case letters.
    • Values of a random variable are denoted by lower-case letters.
    • X = x means that x is the value of the random variable X.

            Figure: A random variable X maps each sample point s ∈ S to a real number X(s) ∈ ℜ.


Example 1: Consider the example of tossing a fair coin twice. The sample space is
S = {HH, HT, TH, TT} and all four outcomes are equally likely. Then we can define a
random variable X as follows:

                      Sample Point        Value of the random variable X = x        P{X = x}
                      HH                  0                                         1/4
                      HT                  1                                         1/4
                      TH                  2                                         1/4
                      TT                  3                                         1/4
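The table above can be reproduced with a short numerical sketch (illustrative only; the variable names are ours):

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses of a fair coin; each of the four outcomes
# HH, HT, TH, TT has probability 1/4.
outcomes = ["".join(p) for p in product("HT", repeat=2)]
value = {"HH": 0, "HT": 1, "TH": 2, "TT": 3}   # the mapping X(s)

pmf = {}
for s in outcomes:
    pmf[value[s]] = pmf.get(value[s], 0) + Fraction(1, 4)

print(pmf)   # each value 0..3 carries probability 1/4
```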




Example 2: Consider the sample space associated with a single toss of a fair die. The
sample space is given by S = {1, 2, 3, 4, 5, 6}. If we define the random variable X that
associates with each outcome the number on the face of the die, then X takes the
values {1, 2, 3, 4, 5, 6}.




1.2 Discrete and Continuous Random Variables
    •   A random variable X is called discrete if there exists a countable sequence of
        distinct real numbers x_i such that Σ_i P_m(x_i) = 1, where P_m(x_i) = P{X = x_i}
        is called the probability mass function. The random variable defined in
        Example 1 is a discrete random variable.
    •   A continuous random variable X can take any value from a continuous interval.
    •   A random variable may also be of mixed type. In this case the RV takes values
        over a continuous range, but in addition a finite number of points carry
        nonzero probability.




1.3 Probability Distribution Function
We can define an event {X ≤ x} = {s | X(s) ≤ x, s ∈ S}.
The probability
    F_X(x) = P{X ≤ x} is called the probability distribution function.
Given F_X(x), we can determine the probability of any event involving values of the
random variable X.
    •   F_X(x) is a non-decreasing function of x.
    •   F_X(x) is right continuous, i.e., F_X(x) = lim_{ε↓0} F_X(x + ε).
    •   F_X(−∞) = 0
    •   F_X(∞) = 1
    •   P{x1 < X ≤ x2} = F_X(x2) − F_X(x1)




Example 3: Consider the random variable defined in Example 1. The distribution
function F_X(x) is as given below:

                      Value of the random variable X = x        F_X(x)
                      x < 0                                      0
                      0 ≤ x < 1                                  1/4
                      1 ≤ x < 2                                  1/2
                      2 ≤ x < 3                                  3/4
                      x ≥ 3                                      1

            Figure: Staircase plot of F_X(x) versus x.
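The staircase distribution function can be computed directly from the probability mass function of Example 1 (a minimal sketch; the helper name is ours):

```python
from fractions import Fraction

# pmf from Example 1: X takes values 0, 1, 2, 3, each with probability 1/4.
pmf = {k: Fraction(1, 4) for k in range(4)}

def cdf(x):
    """F_X(x) = P{X <= x}: sum the pmf over all values not exceeding x."""
    return sum(p for k, p in pmf.items() if k <= x)

# Reproduces the table above: 0, 1/4, 1/2, 3/4, 1
print([cdf(v) for v in (-0.5, 0, 1.5, 2.5, 3)])
```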

1.4 Probability Density Function
If F_X(x) is differentiable, f_X(x) = (d/dx) F_X(x) is called the probability density
function and has the following properties:

    •   f_X(x) is a non-negative function.
    •   ∫_{−∞}^{∞} f_X(x) dx = 1
    •   P(x1 < X ≤ x2) = ∫_{x1}^{x2} f_X(x) dx

Remark: Using the Dirac delta function we can define the density function for a
discrete random variable.
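The two integral properties can be checked numerically for a concrete density, here a standard normal (the density and grid are our choices for illustration):

```python
import math

# Standard normal density f_X(x)
def f(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Trapezoidal rule on a fine grid; [-8, 8] makes the truncated tails negligible.
def integrate(g, a, b, n=50_000):
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return s * h

total = integrate(f, -8, 8)        # total probability, should be ~1
p_interval = integrate(f, -1, 1)   # P(-1 < X <= 1), ~0.6827 for a standard normal
print(round(total, 4), round(p_interval, 4))
```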


1.5 Joint random variable
X and Y are two random variables defined on the same sample space                                                   S.
P{ X ≤ x, Y ≤ y} is called the joint distribution function and denoted by FX ,Y ( x, y ).

Given FX ,Y ( x, y ), -∞ < x < ∞, -∞ < y < ∞,                            we have a complete description of the

random variables X and Y .
    •   P{0 < X ≤ x, 0 < Y ≤ y} = F_{X,Y}(x, y) − F_{X,Y}(x, 0) − F_{X,Y}(0, y) + F_{X,Y}(0, 0)

    •   F_X(x) = F_{X,Y}(x, +∞).
        To prove this,

            (X ≤ x) = (X ≤ x) ∩ (Y ≤ +∞)
            ∴ F_X(x) = P(X ≤ x) = P(X ≤ x, Y ≤ ∞) = F_{X,Y}(x, +∞)

Similarly F_Y(y) = F_{X,Y}(∞, y).

     • Given FX ,Y ( x, y ), -∞ < x < ∞, -∞ < y < ∞,                            each of FX ( x) and FY ( y ) is

          called a marginal distribution function.
We can define the joint probability density function f_{X,Y}(x, y) of the random
variables X and Y by

        f_{X,Y}(x, y) = ∂²/∂x∂y F_{X,Y}(x, y),   provided it exists.

    •   f_{X,Y}(x, y) is always a non-negative quantity.
    •   F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) dv du




1.6 Marginal density functions
        f_X(x) = d/dx F_X(x)
               = d/dx F_{X,Y}(x, ∞)
               = d/dx ∫_{−∞}^{x} ( ∫_{−∞}^{∞} f_{X,Y}(u, y) dy ) du
               = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

and     f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx




1.7 Conditional density function
f_{Y/X}(y / X = x) = f_{Y/X}(y/x) is called the conditional density of Y given X.
Let us define the conditional distribution function.
We cannot define the conditional distribution function for the continuous random
variables X and Y by the relation
    F_{Y/X}(y/x) = P(Y ≤ y / X = x)
                 = P(Y ≤ y, X = x) / P(X = x)

as both the numerator and the denominator are zero for the above expression.
The conditional distribution function is defined in the limiting sense as follows:
    F_{Y/X}(y/x) = lim_{∆x→0} P(Y ≤ y / x < X ≤ x + ∆x)
                 = lim_{∆x→0} P(Y ≤ y, x < X ≤ x + ∆x) / P(x < X ≤ x + ∆x)
                 = lim_{∆x→0} ( ∫_{−∞}^{y} f_{X,Y}(x, u) ∆x du ) / ( f_X(x) ∆x )
                 = ∫_{−∞}^{y} f_{X,Y}(x, u) du / f_X(x)

The conditional density is defined in the limiting sense as follows:

    f_{Y/X}(y / X = x) = lim_{∆y→0} ( F_{Y/X}(y + ∆y / X = x) − F_{Y/X}(y / X = x) ) / ∆y
                       = lim_{∆y→0, ∆x→0} ( F_{Y/X}(y + ∆y / x < X ≤ x + ∆x) − F_{Y/X}(y / x < X ≤ x + ∆x) ) / ∆y      (1)

because (X = x) = lim_{∆x→0} (x < X ≤ x + ∆x).
The right-hand side of equation (1) is

    lim_{∆y→0, ∆x→0} ( F_{Y/X}(y + ∆y / x < X ≤ x + ∆x) − F_{Y/X}(y / x < X ≤ x + ∆x) ) / ∆y
        = lim_{∆y→0, ∆x→0} P(y < Y ≤ y + ∆y / x < X ≤ x + ∆x) / ∆y
        = lim_{∆y→0, ∆x→0} P(y < Y ≤ y + ∆y, x < X ≤ x + ∆x) / ( P(x < X ≤ x + ∆x) ∆y )
        = lim_{∆y→0, ∆x→0} f_{X,Y}(x, y) ∆x ∆y / ( f_X(x) ∆x ∆y )
        = f_{X,Y}(x, y) / f_X(x)

        ∴ f_{Y/X}(y/x) = f_{X,Y}(x, y) / f_X(x)                                        (2)

Similarly,
        f_{X/Y}(x/y) = f_{X,Y}(x, y) / f_Y(y)                                          (3)




From (2) and (3) we get Bayes’ rule:

        f_{X/Y}(x/y) = f_{X,Y}(x, y) / f_Y(y)
                     = f_X(x) f_{Y/X}(y/x) / f_Y(y)
                     = f_{X,Y}(x, y) / ∫_{−∞}^{∞} f_{X,Y}(x, y) dx                     (4)
                     = f_{Y/X}(y/x) f_X(x) / ∫_{−∞}^{∞} f_X(u) f_{Y/X}(y/u) du

Given the joint density function we can find out the conditional density function.


Example 4:
For random variables X and Y, the joint probability density function is given by

        f_{X,Y}(x, y) = (1 + xy)/4,   |x| ≤ 1, |y| ≤ 1
                      = 0             otherwise

Find the marginal densities f_X(x), f_Y(y) and the conditional density f_{Y/X}(y/x).
Are X and Y independent?

        f_X(x) = ∫_{−1}^{1} (1 + xy)/4 dy = 1/2,   −1 ≤ x ≤ 1

Similarly,
        f_Y(y) = 1/2,   −1 ≤ y ≤ 1

and
        f_{Y/X}(y/x) = f_{X,Y}(x, y) / f_X(x) = (1 + xy)/2,   |x| ≤ 1, |y| ≤ 1
                     = 0   otherwise

Since f_{Y/X}(y/x) ≠ f_Y(y), X and Y are not independent.
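Example 4 can be cross-checked numerically; the sketch below (function names and the test points are ours) recovers the marginal and conditional densities by direct integration:

```python
# Joint density of Example 4
def f_xy(x, y):
    return (1 + x * y) / 4 if abs(x) <= 1 and abs(y) <= 1 else 0.0

# f_X(x): integrate the joint density over y in [-1, 1] (trapezoidal rule).
def marginal_x(x, n=10_000):
    h = 2 / n
    s = 0.5 * (f_xy(x, -1) + f_xy(x, 1)) + sum(f_xy(x, -1 + i * h) for i in range(1, n))
    return s * h

fx = marginal_x(0.3)           # ~1/2, independent of x
cond = f_xy(0.3, 0.5) / fx     # f_{Y/X}(0.5/0.3) = (1 + 0.15)/2 = 0.575
print(round(fx, 6), round(cond, 4))
```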


1.8 Bayes’ Rule for mixed random variables
Let X be a discrete random variable with probability mass function P_X(x) and Y be a
continuous random variable. In practical problems we may have to estimate X from the
observed Y. Then

    P_{X/Y}(x/y) = lim_{∆y→0} P_{X/Y}(x / y < Y ≤ y + ∆y)
                 = lim_{∆y→0} P_{X,Y}(x, y < Y ≤ y + ∆y) / P(y < Y ≤ y + ∆y)
                 = lim_{∆y→0} P_X(x) f_{Y/X}(y/x) ∆y / ( f_Y(y) ∆y )
                 = P_X(x) f_{Y/X}(y/x) / f_Y(y)
                 = P_X(x) f_{Y/X}(y/x) / Σ_x P_X(x) f_{Y/X}(y/x)




Example 5: Consider the observation model Y = X + V (the figure shows X and the
noise V feeding an adder whose output is Y), where X is a binary random variable
with

        X = 1     with probability 1/2
        X = −1    with probability 1/2

and V is Gaussian noise with mean 0 and variance σ².

Then

        P_{X/Y}(x = 1 / y) = P_X(1) f_{Y/X}(y/1) / Σ_x P_X(x) f_{Y/X}(y/x)
                           = e^{−(y−1)²/2σ²} / ( e^{−(y−1)²/2σ²} + e^{−(y+1)²/2σ²} )
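A small sketch of this posterior (σ = 1 and the function name are our choices):

```python
import math

# Posterior probability that X = 1 given the observation Y = y, for Y = X + V
# with P(X = 1) = P(X = -1) = 1/2 and V ~ N(0, sigma^2).
def posterior_x1(y, sigma=1.0):
    num = math.exp(-(y - 1) ** 2 / (2 * sigma ** 2))
    den = num + math.exp(-(y + 1) ** 2 / (2 * sigma ** 2))
    return num / den

print(round(posterior_x1(0.0), 4))   # 0.5: y = 0 favours neither value of X
print(posterior_x1(2.0) > 0.9)       # True: y = 2 is strong evidence for X = 1
```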




1.9 Independent Random Variable
Let X and Y be two random variables characterised by the joint distribution function

        F_{X,Y}(x, y) = P{X ≤ x, Y ≤ y}

and the joint density f_{X,Y}(x, y) = ∂²/∂x∂y F_{X,Y}(x, y).

Then X and Y are independent if f_{X/Y}(x/y) = f_X(x) ∀x, y ∈ ℜ,
and equivalently

        f_{X,Y}(x, y) = f_X(x) f_Y(y),

where f_X(x) and f_Y(y) are the marginal density functions.


1.10 Moments of Random Variables
     •    Expectation provides a description of the random variable in terms of a few
          parameters instead of specifying the entire distribution function or the density
          function
     •    It is far easier to estimate the expectation of a R.V. from data than to estimate its
          distribution
First moment or mean
The mean µ_X of a random variable X is defined by

        µ_X = EX = Σ_i x_i P(x_i)              for a discrete random variable X
                 = ∫_{−∞}^{∞} x f_X(x) dx      for a continuous random variable X

For any piecewise continuous function y = g(x), the expectation of the R.V. Y = g(X)
is given by

        EY = Eg(X) = ∫_{−∞}^{∞} g(x) f_X(x) dx

Second moment

        EX² = ∫_{−∞}^{∞} x² f_X(x) dx

Variance

        σ_X² = ∫_{−∞}^{∞} (x − µ_X)² f_X(x) dx

    •   Variance is a central moment and a measure of dispersion of the random
        variable about the mean.
    •   σ_X is called the standard deviation.

For two random variables X and Y the joint expectation is defined as

        E(XY) = µ_{X,Y} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_{X,Y}(x, y) dx dy

The correlation between random variables X and Y, measured by the covariance, is
given by

        Cov(X, Y) = σ_{XY} = E(X − µ_X)(Y − µ_Y)
                           = E(XY − Xµ_Y − Yµ_X + µ_X µ_Y)
                           = E(XY) − µ_X µ_Y

The ratio

        ρ = E(X − µ_X)(Y − µ_Y) / √( E(X − µ_X)² E(Y − µ_Y)² ) = σ_{XY} / (σ_X σ_Y)

is called the correlation coefficient. The correlation coefficient measures the
degree of linear relationship between two random variables.
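These moments have direct sample counterparts, which is how they are usually estimated in practice. The sketch below (the linear model Y = 2X + noise is an arbitrary choice) estimates the mean, variance, covariance and correlation coefficient from simulated data:

```python
import random

random.seed(0)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 * x + random.gauss(0, 1) for x in xs]   # Y depends linearly on X

mx = sum(xs) / n                                 # sample mean of X
my = sum(ys) / n
var_x = sum((x - mx) ** 2 for x in xs) / n       # sample variances
var_y = sum((y - my) ** 2 for y in ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
rho = cov / (var_x ** 0.5 * var_y ** 0.5)        # correlation coefficient

print(round(rho, 2))   # theoretical value: 2 / sqrt(5) ~ 0.894
```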



1.11 Uncorrelated random variables
Random variables X and Y are uncorrelated if covariance
Cov ( X , Y ) = 0
Two random variables may be dependent, but still they may be uncorrelated. If there
is correlation between two random variables, one may be represented as a linear
regression on the other. We will discuss this point in the next section.


1.12 Linear prediction of             Y     from   X

        Ŷ = aX + b          (linear regression)
        Prediction error: Y − Ŷ

Mean square prediction error

        E(Y − Ŷ)² = E(Y − aX − b)²

Minimising this error gives the optimal values of a and b. At the optimum,

        ∂/∂a E(Y − aX − b)² = 0
        ∂/∂b E(Y − aX − b)² = 0

Solving for a and b,

        Ŷ − µ_Y = (σ_{X,Y} / σ_X²)(x − µ_X)

so that Ŷ − µ_Y = ρ_{X,Y} (σ_Y / σ_X)(x − µ_X), where ρ_{X,Y} = σ_{XY} / (σ_X σ_Y) is
the correlation coefficient.
If ρ_{X,Y} = 0, then X and Y are uncorrelated.

        ⇒ Ŷ − µ_Y = 0
        ⇒ Ŷ = µ_Y is the best prediction.

Note that independence implies uncorrelatedness, but uncorrelatedness does not in
general imply independence (except for jointly Gaussian random variables).


Example 6:
Let Y = X², where X is uniformly distributed over (−1, 1).
X and Y are dependent, but they are uncorrelated, because

        Cov(X, Y) = σ_{XY} = E(X − µ_X)(Y − µ_Y)
                  = EXY − EX EY      (∵ EX = 0)
                  = EX³ = 0

In fact, for any zero-mean symmetric distribution of X, X and X² are uncorrelated.
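A numerical sketch of Example 6 (sample size and seed are arbitrary):

```python
import random

# X uniform on (-1, 1), Y = X^2: dependent but uncorrelated.
random.seed(1)
n = 200_000
xs = [random.uniform(-1, 1) for _ in range(n)]
ys = [x ** 2 for x in xs]

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

print(round(cov, 3))   # ~0, even though Y is a deterministic function of X
```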

1.13 Vector space Interpretation of Random Variables
The set of all random variables defined on a sample space form a vector space with
respect to addition and scalar multiplication. This is very easy to verify.


1.14 Linear Independence
Consider the sequence of random variables X_1, X_2, ..., X_N.
If c_1 X_1 + c_2 X_2 + .... + c_N X_N = 0 implies that
c_1 = c_2 = .... = c_N = 0, then X_1, X_2, ..., X_N are linearly independent.


1.15 Statistical Independence
X_1, X_2, ..., X_N are statistically independent if

    f_{X_1, X_2, ..., X_N}(x_1, x_2, ..., x_N) = f_{X_1}(x_1) f_{X_2}(x_2) .... f_{X_N}(x_N)

Statistical independence in the case of zero-mean random variables also implies
linear independence.


1.16 Inner Product
If x and y are real vectors in a vector space V defined over the field of real
numbers, the inner product <x, y> is a scalar such that, ∀ x, y, z ∈ V and a ∈ ℜ,

        1. <x, y> = <y, x>
        2. <x, x> = ||x||² ≥ 0
        3. <x + y, z> = <x, z> + <y, z>
        4. <ax, y> = a <x, y>

In the case of RVs, the inner product between X and Y is defined as

        <X, Y> = EXY = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_{X,Y}(x, y) dy dx

Magnitude / norm of a vector:

        ||x||² = <x, x>

So, for a R.V.,

        ||X||² = EX² = ∫_{−∞}^{∞} x² f_X(x) dx


•       The set of RVs, along with the inner product defined through the joint
        expectation operation and the corresponding norm, defines a Hilbert space.


1.17 Schwarz Inequality
For any two vectors x and y belonging to a Hilbert space V,

        |<x, y>| ≤ ||x|| ||y||

For RVs X and Y,

        E²(XY) ≤ EX² EY²

Proof:
Consider the random variable Z = aX + Y.

        E(aX + Y)² ≥ 0
        ⇒ a² EX² + 2a EXY + EY² ≥ 0

Non-negativity of the left-hand side implies that its minimum over a must also be
nonnegative. At the minimum,

        d EZ²/da = 0  ⇒  a = −EXY / EX²

so the corresponding minimum value is

        E²XY / EX² + EY² − 2 E²XY / EX² = EY² − E²XY / EX²

The minimum being nonnegative gives

        EY² − E²XY / EX² ≥ 0
        ⇒ E²XY ≤ EX² EY²

Since

        ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y) = E(X − µ_X)(Y − µ_Y) / √( E(X − µ_X)² E(Y − µ_Y)² ),

the Schwarz inequality applied to X − µ_X and Y − µ_Y gives

        |ρ(X, Y)| ≤ 1
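The sample version of E²(XY) ≤ EX²EY² is the classical Cauchy-Schwarz inequality, which holds exactly for any data set; the sketch below (distributions are arbitrary) illustrates it:

```python
import random

random.seed(2)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 2) for x in xs]

e_xy = sum(x * y for x, y in zip(xs, ys)) / n   # sample E(XY)
e_x2 = sum(x * x for x in xs) / n               # sample EX^2
e_y2 = sum(y * y for y in ys) / n               # sample EY^2

print(e_xy ** 2 <= e_x2 * e_y2)   # True
```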


1.18 Orthogonal Random Variables
Recall the definition of orthogonality. Two vectors                     x and y are called orthogonal if
< x, y > = 0
Similarly two random variables X and Y are called orthogonal if EXY = 0
If each of X and Y is zero-mean,

        Cov(X, Y) = EXY

Therefore, if EXY = 0 then Cov(X, Y) = 0 in this case.
For zero-mean random variables,

        orthogonality ⇔ uncorrelatedness
1.19 Orthogonality Principle
X is a random variable which is not observable. Y is another observable random variable
which is statistically dependent on X . Given a value of Y what is the best guess for X ?
(Estimation problem).
Let the best estimate be X̂(Y). Then E(X − X̂(Y))² is a minimum with respect to X̂(Y),
and the corresponding estimation principle is called the minimum mean-square error
principle. For finding the minimum, we have

        ∂/∂X̂ E(X − X̂(Y))² = 0
    ⇒   ∂/∂X̂ ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − X̂(y))² f_{X,Y}(x, y) dy dx = 0
    ⇒   ∂/∂X̂ ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − X̂(y))² f_Y(y) f_{X/Y}(x/y) dy dx = 0
    ⇒   ∂/∂X̂ ∫_{−∞}^{∞} f_Y(y) ( ∫_{−∞}^{∞} (x − X̂(y))² f_{X/Y}(x/y) dx ) dy = 0

Since f_Y(y) in the above equation is always positive, the minimisation is
equivalent to

        ∂/∂X̂ ∫_{−∞}^{∞} (x − X̂(y))² f_{X/Y}(x/y) dx = 0
    or  −2 ∫_{−∞}^{∞} (x − X̂(y)) f_{X/Y}(x/y) dx = 0
    ⇒   X̂(y) ∫_{−∞}^{∞} f_{X/Y}(x/y) dx = ∫_{−∞}^{∞} x f_{X/Y}(x/y) dx
    ⇒   X̂(y) = E(X / Y = y)

Thus the minimum mean-square error estimate involves the conditional expectation,
which is difficult to obtain numerically.
Let us consider a simpler version of the problem. We assume that X̂(y) = ay, and the
estimation problem is to find the optimal value of a. Thus we have the linear
minimum mean-square error criterion, which minimises E(X − aY)²:

        d/da E(X − aY)² = 0
    ⇒   E[ d/da (X − aY)² ] = 0
    ⇒   E(X − aY)Y = 0
    ⇒   EeY = 0

where e = X − aY is the estimation error.




The above result shows that for the linear minimum mean-square error criterion, the
estimation error is orthogonal to the data. This result helps us in deriving optimal
filters to estimate a random signal buried in noise.
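The orthogonality of error and data can be seen numerically: with a = EXY/EY² estimated from samples, EeY vanishes by construction (the model and names below are ours):

```python
import random

random.seed(3)
n = 50_000
ys = [random.gauss(0, 1) for _ in range(n)]
xs = [0.7 * y + random.gauss(0, 0.5) for y in ys]   # X statistically dependent on Y

e_xy = sum(x * y for x, y in zip(xs, ys)) / n
e_y2 = sum(y * y for y in ys) / n
a = e_xy / e_y2                 # optimal linear coefficient, from E(X - aY)Y = 0

# Estimation error e = X - aY is orthogonal to the data Y.
e_err_y = sum((x - a * y) * y for x, y in zip(xs, ys)) / n
print(abs(e_err_y) < 1e-6)      # True up to floating-point round-off
```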


The mean and variance also give some quantitative information about the bounds of a
RV. The following inequalities are extremely useful in many practical problems.


1.20 Chebyshev Inequality
Suppose X is a parameter of a manufactured item with known mean µ_X and variance
σ_X². The quality control department rejects the item if the absolute deviation of X
from µ_X is greater than 2σ_X. What fraction of the manufactured items does the
quality control department reject? Can you roughly guess it?
The standard deviation gives us an intuitive idea of how the random variable is
distributed about the mean. This idea is more precisely expressed in the remarkable
Chebyshev inequality stated below. For a random variable X with mean µ_X and
variance σ_X²,

        P{|X − µ_X| ≥ ε} ≤ σ_X² / ε²

Proof:

        σ_X² = ∫_{−∞}^{∞} (x − µ_X)² f_X(x) dx
             ≥ ∫_{|x − µ_X| ≥ ε} (x − µ_X)² f_X(x) dx
             ≥ ∫_{|x − µ_X| ≥ ε} ε² f_X(x) dx
             = ε² P{|X − µ_X| ≥ ε}

        ∴ P{|X − µ_X| ≥ ε} ≤ σ_X² / ε²

For the quality-control example, taking ε = 2σ_X gives P{|X − µ_X| ≥ 2σ_X} ≤ 1/4, so
at most a quarter of the items are rejected.
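An empirical sketch of the bound for a standard normal X with ε = 2σ_X (the distribution is our choice; any X with finite variance works):

```python
import random

random.seed(4)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]      # mu = 0, sigma = 1

eps = 2.0
frac = sum(1 for x in xs if abs(x) >= eps) / n   # empirical P{|X - mu| >= eps}
bound = 1.0 / eps ** 2                           # Chebyshev bound sigma^2 / eps^2

print(frac <= bound)   # True: the ~0.0455 normal tail is well under the 0.25 bound
```

The gap between the actual tail and the bound shows that Chebyshev's inequality is conservative: it holds for every finite-variance distribution, so it cannot be tight for any particular one.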


1.21 Markov Inequality
For a random variable X which takes only nonnegative values,

        P{X ≥ a} ≤ E(X)/a,   where a > 0.

Proof:

        E(X) = ∫_{0}^{∞} x f_X(x) dx
             ≥ ∫_{a}^{∞} x f_X(x) dx
             ≥ ∫_{a}^{∞} a f_X(x) dx
             = a P{X ≥ a}

        ∴ P{X ≥ a} ≤ E(X)/a

Result: P{(X − k)² ≥ a} ≤ E(X − k)²/a
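A corresponding sketch for the Markov inequality, using an exponential random variable (any nonnegative X works):

```python
import random

random.seed(5)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]   # nonnegative, E(X) = 1

a = 3.0
frac = sum(1 for x in xs if x >= a) / n   # empirical P{X >= a}; exact tail e^{-3} ~ 0.0498
mean = sum(xs) / n

print(frac <= mean / a)   # True: the bound E(X)/a ~ 0.333 is loose but valid
```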


1.22 Convergence of a sequence of random variables
Let X_1, X_2, ..., X_n be a sequence of n independent and identically distributed
random variables. Suppose we want to estimate the mean of the random variable on
the basis of the observed data by means of the relation

        µ̂_X = (1/n) Σ_{i=1}^{n} X_i

How closely does µ̂_X represent µ_X as n is increased? How do we measure the
closeness between µ̂_X and µ_X?
Notice that µ̂_X is a random variable. What do we mean by the statement “µ̂_X
converges to µ_X”?
Consider a deterministic sequence x_1, x_2, ..., x_n, .... The sequence converges to
a limit x if, corresponding to any ε > 0, we can find a positive integer m such that
|x − x_n| < ε for n > m.
Convergence of a random sequence X_1, X_2, ..., X_n, ... cannot be defined as above.
A sequence of random variables is said to converge everywhere to X if
|X(ξ) − X_n(ξ)| → 0 for n > m and ∀ξ.



1.23 Almost sure (a.s.) convergence or convergence with probability 1
For the random sequence X_1, X_2, ..., X_n, ..., {X_n → X} is an event. If

    P{s | X_n(s) → X(s)} = 1 as n → ∞,

that is,

    P{s | |X_n(s) − X(s)| < ε for n ≥ m} = 1 as m → ∞,

then the sequence is said to converge to X almost surely or with probability 1.
One important application is the Strong Law of Large Numbers:
If X_1, X_2, ..., X_n, ... are iid random variables, then (1/n) ∑_{i=1}^{n} X_i → µ_X with probability 1 as n → ∞.
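The Strong Law can be illustrated with a short simulation (a sketch under assumed iid exponential samples with mean µ_X = 3, not part of the original notes):

```python
import numpy as np

# Sketch of the Strong Law: the running sample mean settles near mu_X
rng = np.random.default_rng(2)
mu_X = 3.0
x = rng.exponential(scale=mu_X, size=100_000)   # iid samples with mean mu_X

# running_mean[k] = (1/(k+1)) * sum of the first k+1 samples
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
assert abs(running_mean[-1] - mu_X) < 0.05      # close to mu_X for large n
```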




1.24 Convergence in mean square sense
If E(X_n − X)² → 0 as n → ∞, we say that the sequence converges to X in the mean square (M.S.) sense.


Example 7:
If X_1, X_2, ..., X_n, ... are iid random variables, then
(1/n) ∑_{i=1}^{n} X_i → µ_X in the mean square sense as n → ∞.
We have to show that

    lim_{n→∞} E((1/n) ∑_{i=1}^{n} X_i − µ_X)² = 0

Now,

    E((1/n) ∑_{i=1}^{n} X_i − µ_X)² = E((1/n) ∑_{i=1}^{n} (X_i − µ_X))²
        = (1/n²) ∑_{i=1}^{n} E(X_i − µ_X)² + (1/n²) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} E(X_i − µ_X)(X_j − µ_X)
        = nσ_X²/n² + 0    (because of independence)
        = σ_X²/n

    ∴ lim_{n→∞} E((1/n) ∑_{i=1}^{n} X_i − µ_X)² = 0
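The key step above, E(sample mean − µ_X)² = σ_X²/n, can be verified by Monte Carlo (a sketch with assumed N(0, 4) samples, not part of the original notes):

```python
import numpy as np

# Check E((1/n) sum X_i - mu)^2 = sigma^2 / n by Monte Carlo
rng = np.random.default_rng(3)
sigma2, n, trials = 4.0, 50, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
mse = np.mean(x.mean(axis=1) ** 2)     # empirical E(sample mean - mu)^2, mu = 0
assert abs(mse - sigma2 / n) / (sigma2 / n) < 0.02   # matches sigma^2/n = 0.08
```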


1.25 Convergence in probability
P{|X_n − X| > ε} is a sequence of probabilities. X_n is said to converge to X in probability if this sequence of probabilities converges to 0, that is,

    P{|X_n − X| > ε} → 0 as n → ∞.

If a sequence is convergent in the mean square sense, then it is convergent in probability also, because

    P{|X_n − X|² > ε²} ≤ E(X_n − X)²/ε²    (Markov inequality)

so that we have

    P{|X_n − X| > ε} ≤ E(X_n − X)²/ε².

If E(X_n − X)² → 0 as n → ∞ (mean square convergence), then

    P{|X_n − X| > ε} → 0 as n → ∞.




Example 8:
Suppose {X_n} is a sequence of random variables with

    P{X_n = 1} = 1 − 1/n

and

    P{X_n = −1} = 1/n

Clearly

    P{|X_n − 1| > ε} = P{X_n = −1} = 1/n → 0 as n → ∞.

Therefore {X_n} converges in probability to the random variable X = 1.




1.26 Convergence in distribution
The sequence X_1, X_2, ..., X_n, ... is said to converge to X in distribution if

    F_{X_n}(x) → F_X(x) as n → ∞

at every point x where F_X(x) is continuous. Here the two distribution functions eventually coincide.




1.27 Central Limit Theorem
Consider independent random variables X_1, X_2, ..., X_n.
Let Y = X_1 + X_2 + ... + X_n.
Then

    µ_Y = µ_{X_1} + µ_{X_2} + ... + µ_{X_n}

and

    σ_Y² = σ_{X_1}² + σ_{X_2}² + ... + σ_{X_n}²

The central limit theorem states that under very general conditions the distribution of Y converges to N(µ_Y, σ_Y²) as n → ∞. The conditions include:

    1. The random variables X_1, X_2, ..., X_n are independent with the same mean and the same variance, but are not necessarily identically distributed.
    2. The random variables X_1, X_2, ..., X_n are independent, with different means and the same variance, and are not identically distributed.
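A small simulation makes the theorem concrete (a sketch, not from the notes, using an assumed sum of n = 30 iid uniform(0, 1) variables):

```python
import numpy as np

# Sketch: the standardized sum of n iid uniforms is approximately N(0, 1)
rng = np.random.default_rng(4)
n, trials = 30, 100_000
y = rng.random((trials, n)).sum(axis=1)      # Y = X_1 + ... + X_n, X_i ~ U(0,1)

mu_Y, sigma_Y = n / 2.0, np.sqrt(n / 12.0)   # mu_Y = n/2, sigma_Y^2 = n/12
z = (y - mu_Y) / sigma_Y
# one point of the empirical CDF vs the standard normal value Phi(1) = 0.8413
assert abs(np.mean(z <= 1.0) - 0.8413) < 0.01
```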




1.28 Jointly Gaussian Random Variables
Two random variables X and Y are called jointly Gaussian if their joint density function is

    f_{X,Y}(x, y) = A e^{ −[ (x−µ_X)²/σ_X² − 2ρ_{X,Y}(x−µ_X)(y−µ_Y)/(σ_X σ_Y) + (y−µ_Y)²/σ_Y² ] / (2(1−ρ²_{X,Y})) }

    where A = 1 / (2π σ_X σ_Y √(1 − ρ²_{X,Y}))

Properties:
(1) If X and Y are jointly Gaussian, then for any constants a and b the random variable Z = aX + bY is Gaussian with mean µ_Z = aµ_X + bµ_Y and variance

    σ_Z² = a²σ_X² + b²σ_Y² + 2abσ_Xσ_Y ρ_{X,Y}

(2) If two jointly Gaussian RVs are uncorrelated (ρ_{X,Y} = 0), then they are statistically independent: f_{X,Y}(x, y) = f_X(x) f_Y(y) in this case.
(3) If f_{X,Y}(x, y) is a jointly Gaussian density, then the marginal densities f_X(x) and f_Y(y) are also Gaussian.
(4) If X and Y are jointly Gaussian random variables, then the optimum nonlinear estimator X̂ of X that minimizes the mean square error ξ = E{[X − X̂]²} is a linear estimator X̂ = aY.
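Property (1) can be checked numerically (a sketch with assumed parameters, not part of the original notes):

```python
import numpy as np

# Sketch of property (1): Z = aX + bY for jointly Gaussian X, Y (assumed parameters)
rng = np.random.default_rng(5)
mu_X, mu_Y = 1.0, -2.0
sx, sy, rho = 2.0, 1.5, 0.6
cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
xy = rng.multivariate_normal([mu_X, mu_Y], cov, size=200_000)

a, b = 3.0, -1.0
z = a * xy[:, 0] + b * xy[:, 1]
mu_Z = a * mu_X + b * mu_Y                                       # predicted mean
var_Z = a**2 * sx**2 + b**2 * sy**2 + 2 * a * b * sx * sy * rho  # predicted variance
assert abs(z.mean() - mu_Z) < 0.05
assert abs(z.var() - var_Z) / var_Z < 0.02
```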




REVIEW OF RANDOM PROCESS

2.1 Introduction

Recall that a random variable maps each sample point in the sample space to a point in
the real line. A random process maps each sample point to a waveform.
   •   A random process can be defined as an indexed family of random variables
       { X (t ), t ∈ T } where T is an index set which may be discrete or continuous usually

       denoting time.
   •   The random process is defined on a common probability space {S , ℑ, P}.

   •          A random process is a function of the sample point ξ and index variable t and
       may be written as X (t , ξ ).
   •          For a fixed t (= t 0 ), X (t 0 , ξ ) is a random variable.

   •          For a fixed ξ (= ξ 0 ), X (t , ξ 0 ) is a single realization of the random process and
       is a deterministic function.
   •          When both t and ξ are varying we have the random process X (t , ξ ).
The random process X (t , ξ ) is normally denoted by X (t ).
We can define a discrete random process X[n] on discrete points of time. Such random processes are particularly important in practical implementations.



Figure: Random process — sample functions X(t, s_1), X(t, s_2), X(t, s_3) corresponding to sample points s_1, s_2, s_3 of the sample space S.
2.2 How to describe a random process?

To describe X(t) we have to use the joint density function of the random variables at different t.
For any positive integer n, X(t_1), X(t_2), ..., X(t_n) represent n jointly distributed random variables. Thus a random process can be described by the joint distribution function

    F_{X(t_1), X(t_2), ..., X(t_n)}(x_1, x_2, ..., x_n) = F(x_1, x_2, ..., x_n; t_1, t_2, ..., t_n), ∀n ∈ N and ∀t_n ∈ T

Otherwise we can determine all the possible moments of the process:

    E(X(t)) = µ_X(t) = mean of the random process at t
    R_X(t_1, t_2) = E(X(t_1)X(t_2)) = autocorrelation function at t_1, t_2
    R_X(t_1, t_2, t_3) = E(X(t_1)X(t_2)X(t_3)) = triple correlation function at t_1, t_2, t_3, etc.

We can also define the auto-covariance function C_X(t_1, t_2) of X(t), given by

    C_X(t_1, t_2) = E(X(t_1) − µ_X(t_1))(X(t_2) − µ_X(t_2))
                  = R_X(t_1, t_2) − µ_X(t_1)µ_X(t_2)



Example 1:

(a) Gaussian Random Process
For any positive integer n, X(t_1), X(t_2), ..., X(t_n) represent n jointly distributed random variables. These n random variables define a random vector X = [X(t_1), X(t_2), ..., X(t_n)]'. The process X(t) is called Gaussian if the random vector [X(t_1), X(t_2), ..., X(t_n)]' is jointly Gaussian with the joint density function

    f_{X(t_1), X(t_2), ..., X(t_n)}(x_1, x_2, ..., x_n) = e^{ −(1/2)(x − µ_X)' C_X^{−1} (x − µ_X) } / ( (2π)^{n/2} √det(C_X) )

where C_X = E(X − µ_X)(X − µ_X)' and µ_X = E(X) = [E(X_1), E(X_2), ..., E(X_n)]'.

(b) Bernoulli Random Process
(c) A sinusoid with a random phase.
2.3 Stationary Random Process

A random process X(t) is called strict-sense stationary if its probability structure is invariant with time. In terms of the joint distribution function,

    F_{X(t_1), X(t_2), ..., X(t_n)}(x_1, x_2, ..., x_n) = F_{X(t_1+t_0), X(t_2+t_0), ..., X(t_n+t_0)}(x_1, x_2, ..., x_n) ∀n ∈ N and ∀t_0, t_n ∈ T

For n = 1,

    F_{X(t_1)}(x_1) = F_{X(t_1+t_0)}(x_1) ∀t_0 ∈ T

Let us assume t_0 = −t_1. Then

    F_{X(t_1)}(x_1) = F_{X(0)}(x_1)
    ⇒ EX(t_1) = EX(0) = µ_X(0) = constant

For n = 2,

    F_{X(t_1), X(t_2)}(x_1, x_2) = F_{X(t_1+t_0), X(t_2+t_0)}(x_1, x_2)

Put t_0 = −t_2:

    F_{X(t_1), X(t_2)}(x_1, x_2) = F_{X(t_1−t_2), X(0)}(x_1, x_2)
    ⇒ R_X(t_1, t_2) = R_X(t_1 − t_2)

A random process X(t) is called wide-sense stationary (WSS) if

    µ_X(t) = constant
    R_X(t_1, t_2) = R_X(t_1 − t_2) is a function of the time lag only.

For a Gaussian random process, WSS implies strict-sense stationarity, because this process is completely described by the mean and the autocorrelation functions.
The autocorrelation function R_X(τ) = EX(t + τ)X(t) is a crucial quantity for a WSS process.
•   R_X(0) = EX²(t) is the mean-square value of the process.
•   R_X(−τ) = R_X(τ) for a real process (for a complex process X(t), R_X(−τ) = R_X*(τ)).
•   |R_X(τ)| ≤ R_X(0), which follows from the Schwartz inequality:

    R_X²(τ) = {EX(t)X(t + τ)}²
            = |<X(t), X(t + τ)>|²
            ≤ ||X(t)||² ||X(t + τ)||²
            = EX²(t) EX²(t + τ)
            = R_X(0) R_X(0)

    ∴ |R_X(τ)| ≤ R_X(0)

•   R_X(τ) is a positive semi-definite function in the sense that for any positive integer n and real a_1, ..., a_n,

    ∑_{i=1}^{n} ∑_{j=1}^{n} a_i a_j R_X(t_i, t_j) ≥ 0


•   If X(t) is periodic (in the mean square sense or any other sense, like with probability 1), then R_X(τ) is also periodic.
For a discrete random sequence, we can define the autocorrelation sequence similarly.
•   If R_X(τ) drops quickly, the signal samples are less correlated, which in turn means that the signal changes rapidly with time. Such a signal has strong high-frequency components. If R_X(τ) drops slowly, the signal samples are highly correlated, and such a signal has weak high-frequency components.
•   R_X(τ) is directly related to the frequency-domain representation of a WSS process.



2.4 Spectral Representation of a Random Process

How do we obtain the frequency-domain representation of a random process?
•   Wiener (1930) and Khinchin (1934) independently discovered the spectral representation of a random process. Einstein (1914) also used the concept.
•   The autocorrelation function and the power spectral density form a Fourier transform pair.
Let us define

    X_T(t) = X(t),    −T < t < T
           = 0,       otherwise

As T → ∞, X_T(t) will represent the random process X(t).
Define (in the mean square sense)

    X_T(ω) = ∫_{−T}^{T} X_T(t) e^{−jωt} dt
Then

    E[X_T(ω)X_T*(ω)/2T] = E|X_T(ω)|²/2T
        = (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} EX_T(t_1)X_T(t_2) e^{−jωt_1} e^{jωt_2} dt_1 dt_2
        = (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} R_X(t_1 − t_2) e^{−jω(t_1−t_2)} dt_1 dt_2

Substituting t_1 − t_2 = τ, so that t_2 = t_1 − τ is a line in the (t_1, t_2) plane, we get

    E[|X_T(ω)|²/2T] = (1/2T) ∫_{−2T}^{2T} R_X(τ) e^{−jωτ} (2T − |τ|) dτ
                    = ∫_{−2T}^{2T} R_X(τ) e^{−jωτ} (1 − |τ|/2T) dτ
If R_X(τ) is absolutely integrable, then as T → ∞,

    lim_{T→∞} E|X_T(ω)|²/2T = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ

E|X_T(ω)|²/2T is the contribution to the average power at frequency ω and is called the power spectral density. Thus

    S_X(ω) = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ

and

    R_X(τ) = (1/2π) ∫_{−∞}^{∞} S_X(ω) e^{jωτ} dω



Properties
•   EX²(t) = R_X(0) = (1/2π) ∫_{−∞}^{∞} S_X(ω) dω = average power of the process.
•   The average power in the band (ω_1, ω_2) is (1/2π) ∫_{ω_1}^{ω_2} S_X(ω) dω.
•   R_X(τ) is real and even ⇒ S_X(ω) is real and even.
•   From the definition S_X(ω) = lim_{T→∞} E|X_T(ω)|²/2T, S_X(ω) is always nonnegative.
•   h_X(ω) = S_X(ω)/∫_{−∞}^{∞} S_X(ω)dω is the normalised power spectral density and has the properties of a PDF (always nonnegative, with unit area).
2.5 Cross-correlation & Cross Power Spectral Density
Consider two real random processes X(t) and Y(t).
Joint stationarity of X(t) and Y(t) implies that the joint densities are invariant with a shift of time.
The cross-correlation function R_{X,Y}(τ) for jointly WSS processes X(t) and Y(t) is defined as

    R_{X,Y}(τ) = E X(t + τ)Y(t)

so that

    R_{Y,X}(τ) = E Y(t + τ)X(t) = E X(t)Y(t + τ) = R_{X,Y}(−τ)

    ∴ R_{Y,X}(τ) = R_{X,Y}(−τ)
Cross power spectral density:

    S_{X,Y}(ω) = ∫_{−∞}^{∞} R_{X,Y}(τ) e^{−jωτ} dτ

For real processes X(t) and Y(t),

    S_{X,Y}(ω) = S*_{Y,X}(ω)

The Wiener-Khinchin theorem is also valid for discrete-time random processes.
If we define R_X[m] = E X[n + m]X[n], then the corresponding PSD is given by

    S_X(ω) = ∑_{m=−∞}^{∞} R_X[m] e^{−jωm},    −π ≤ ω ≤ π

or

    S_X(f) = ∑_{m=−∞}^{∞} R_X[m] e^{−j2πfm},    −1/2 ≤ f ≤ 1/2

    ∴ R_X[m] = (1/2π) ∫_{−π}^{π} S_X(ω) e^{jωm} dω

For a discrete sequence the generalized PSD is defined in the z-domain as

    S_X(z) = ∑_{m=−∞}^{∞} R_X[m] z^{−m}

If we sample a stationary random process uniformly, we get a stationary random sequence. The sampling theorem is valid in terms of the PSD.

Examples 2:

(1) R_X(τ) = e^{−a|τ|}, a > 0

    S_X(ω) = 2a/(a² + ω²),    −∞ < ω < ∞

(2) R_X[m] = a^{|m|}, 0 < a < 1

    S_X(ω) = (1 − a²)/(1 − 2a cos ω + a²),    −π ≤ ω ≤ π
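The pair in Example 2(2) can be verified numerically by summing the DTFT directly (a sketch with assumed values a = 0.7 and ω = 1.3, not from the notes):

```python
import numpy as np

# Check: sum_m a^|m| e^{-j w m} = (1 - a^2)/(1 - 2a cos w + a^2)
a, w = 0.7, 1.3           # any 0 < a < 1 and frequency w work
m = np.arange(-200, 201)  # a^|m| decays fast, so a truncated sum suffices
R = a ** np.abs(m)        # autocorrelation sequence R_X[m]

S_sum = np.sum(R * np.exp(-1j * w * m)).real
S_closed = (1 - a**2) / (1 - 2 * a * np.cos(w) + a**2)
assert abs(S_sum - S_closed) < 1e-10
```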




2.6 White noise process

Figure: Flat power spectral density S_X(f) of a white noise process.

A white noise process X(t) is defined by

    S_X(f) = N/2,    −∞ < f < ∞

The corresponding autocorrelation function is given by

    R_X(τ) = (N/2) δ(τ),  where δ(τ) is the Dirac delta function.

The average power of white noise is

    P_avg = ∫_{−∞}^{∞} (N/2) df → ∞

•   Samples of a white noise process are uncorrelated.
•   White noise is a mathematical abstraction; it cannot be realized, since it has infinite power.
•   If the system bandwidth (BW) is sufficiently narrower than the noise BW and the noise PSD is flat over that band, we can model the noise as a white noise process. Thermal and shot noise are well modelled as white Gaussian noise, since they have a very flat PSD over a very wide band (several GHz).
•   For a zero-mean white noise process, the correlation of the process at any lag τ ≠ 0 is zero.
•   White noise plays a key role in random signal modelling.
•   It plays a role similar to that of the impulse function in the modelling of deterministic signals.

                                                     Linear
                           White Noise               System                 WSS Random Signal
2.7 White Noise Sequence
For a white noise sequence x[n],

    S_X(ω) = N/2,    −π ≤ ω ≤ π

Therefore

    R_X[m] = (N/2) δ[m]

where δ[m] is the unit impulse sequence.
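These relations are easy to confirm empirically (a sketch assuming Gaussian white noise with N/2 = 1, not part of the original notes):

```python
import numpy as np

# Sketch: white noise sequence has R_X[m] = (N/2) delta[m]
rng = np.random.default_rng(6)
N_over_2 = 1.0                                      # assumed noise level N/2
x = rng.normal(0.0, np.sqrt(N_over_2), size=500_000)

def acorr(x, m):
    """Estimate R_X[m] = E x[n] x[n - m]."""
    return np.mean(x * x) if m == 0 else np.mean(x[m:] * x[:-m])

assert abs(acorr(x, 0) - N_over_2) < 0.01   # R_X[0] = N/2
assert abs(acorr(x, 1)) < 0.01              # zero at all nonzero lags
assert abs(acorr(x, 5)) < 0.01
```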
Figure: Autocorrelation sequence R_X[m] (an impulse of height N/2 at m = 0) and the corresponding flat PSD S_X(ω) = N/2, −π ≤ ω ≤ π.

2.8 Linear Shift Invariant System with Random Inputs
Consider a discrete-time linear system with impulse response h[n]:

    x[n] → h[n] → y[n]

    y[n] = x[n] * h[n]
    E y[n] = E x[n] * h[n]

For a stationary input x[n],

    µ_Y = E y[n] = µ_X * h[n] = µ_X ∑_{k=0}^{l} h[k]

where l is the length of the impulse response sequence.

    R_Y[m] = E y[n]y[n − m]
           = E(x[n] * h[n])(x[n − m] * h[n − m])
           = R_X[m] * h[m] * h[−m]

R_Y[m] is a function of the lag m only. From the above we get

    S_Y(ω) = |H(ω)|² S_X(ω)

    S_XX(ω) → |H(ω)|² → S_YY(ω)
Example 3:
Suppose

    H(ω) = 1,    −ω_c ≤ ω ≤ ω_c
         = 0,    otherwise

    S_X(ω) = N/2,    −∞ < ω < ∞

Then

    S_Y(ω) = N/2,    −ω_c ≤ ω ≤ ω_c

and

    R_Y(τ) = (Nω_c/2π) sinc(ω_c τ),  where sinc(x) = sin(x)/x.

•   Note that though the input is an uncorrelated process, the output is a correlated process.
Consider the case of a discrete-time system with a random sequence x[n] as input:

    x[n] → h[n] → y[n]

    R_Y[m] = R_X[m] * h[m] * h[−m]

Taking the z-transform, we get

    S_Y(z) = S_X(z) H(z) H(z^{−1})

Notice that if H(z) is causal, then H(z^{−1}) is anti-causal. Similarly, if H(z) is minimum-phase, then H(z^{−1}) is maximum-phase.

    R_XX[m], S_XX(z) → H(z) H(z^{−1}) → R_YY[m], S_YY(z)



Example 4:
If H(z) = 1/(1 − αz^{−1}), |α| < 1, and x[n] is a unit-variance white-noise sequence, then

    S_YY(z) = H(z) H(z^{−1}) = (1/(1 − αz^{−1})) (1/(1 − αz))

By partial fraction expansion and the inverse z-transform, we get

    R_Y[m] = α^{|m|}/(1 − α²)
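This result can be checked by filtering simulated white noise through the recursion y[n] = αy[n−1] + v[n] (a sketch assuming α = 0.5, not part of the original notes):

```python
import numpy as np

# Sketch of Example 4: y[n] = alpha*y[n-1] + v[n] gives R_Y[m] = alpha^|m|/(1 - alpha^2)
rng = np.random.default_rng(7)
alpha = 0.5
v = rng.normal(0.0, 1.0, size=400_000)      # unit-variance white noise

y = np.empty_like(v)
y[0] = v[0]
for n in range(1, v.size):
    y[n] = alpha * y[n - 1] + v[n]          # H(z) = 1/(1 - alpha z^-1)
y = y[1000:]                                # discard the start-up transient

def acorr(x, m):
    return np.mean(x * x) if m == 0 else np.mean(x[m:] * x[:-m])

for m in range(4):
    assert abs(acorr(y, m) - alpha**m / (1 - alpha**2)) < 0.02
```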
2.9 Spectral factorization theorem
A stationary random signal X[n] that satisfies the Paley-Wiener condition

    ∫_{−π}^{π} |ln S_X(ω)| dω < ∞

can be considered as the output of a linear filter fed by a white noise sequence.
If S_X(ω) is an analytic function of ω and ∫_{−π}^{π} |ln S_X(ω)| dω < ∞, then

    S_X(z) = σ_v² H_c(z) H_a(z)

where
H_c(z) is the causal minimum-phase transfer function,
H_a(z) is the anti-causal maximum-phase transfer function,
and σ_v² is a constant interpreted as the variance of a white-noise sequence.
    v[n] (innovation sequence) → H_c(z) → X[n]

        Figure: Innovation filter

Minimum-phase filter ⇒ the corresponding inverse filter exists.

    X[n] → 1/H_c(z) → v[n]

        Figure: Whitening filter
Since ln S_XX(z) is analytic in an annular region ρ < |z| < 1/ρ,

    ln S_XX(z) = ∑_{k=−∞}^{∞} c[k] z^{−k}

where

    c[k] = (1/2π) ∫_{−π}^{π} ln S_XX(ω) e^{jωk} dω

is the kth-order cepstral coefficient.
For a real signal, c[k] = c[−k], and

    c[0] = (1/2π) ∫_{−π}^{π} ln S_XX(ω) dω

Hence

    S_XX(z) = e^{∑_{k=−∞}^{∞} c[k] z^{−k}}
            = e^{c[0]} · e^{∑_{k=1}^{∞} c[k] z^{−k}} · e^{∑_{k=−∞}^{−1} c[k] z^{−k}}

Let

    H_C(z) = e^{∑_{k=1}^{∞} c[k] z^{−k}},    |z| > ρ
           = 1 + h_c[1] z^{−1} + h_c[2] z^{−2} + ...

(since h_c[0] = lim_{z→∞} H_C(z) = 1).
H_C(z) and ln H_C(z) are both analytic
⇒ H_C(z) is a minimum-phase filter.
Similarly, let

    H_a(z) = e^{∑_{k=−∞}^{−1} c[k] z^{−k}} = e^{∑_{k=1}^{∞} c[k] z^{k}} = H_C(z^{−1}),    |z| < 1/ρ

Therefore,

    S_XX(z) = σ_V² H_C(z) H_C(z^{−1})

where σ_V² = e^{c[0]}.

Salient points
•   S_XX(z) can be factorized into a minimum-phase factor and a maximum-phase factor, i.e. H_C(z) and H_C(z^{−1}).
•   In general spectral factorization is difficult; however, for a signal with a rational power spectrum, spectral factorization can be done easily.
•   Since H_C(z) is a minimum-phase filter, its inverse 1/H_C(z) exists and is stable; therefore we can use the filter 1/H_C(z) to filter the given signal and obtain the innovation sequence.
•   X[n] and v[n] are related through an invertible transform, so they contain the same information.
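The invertibility in the last point can be demonstrated directly: driving an assumed first-order innovation filter and then applying its inverse recovers the innovations exactly (a sketch, not part of the original notes):

```python
import numpy as np

# Sketch: the whitening filter 1/H_c(z) recovers the innovation sequence exactly
rng = np.random.default_rng(8)
alpha = 0.8
v = rng.normal(0.0, 1.0, size=200_000)      # innovation (white) sequence

# innovation filter H_c(z) = 1/(1 - alpha z^-1): x[n] = alpha*x[n-1] + v[n]
x = np.empty_like(v)
x[0] = v[0]
for n in range(1, v.size):
    x[n] = alpha * x[n - 1] + v[n]

# whitening filter 1/H_c(z) = 1 - alpha z^-1
v_hat = x.copy()
v_hat[1:] = x[1:] - alpha * x[:-1]

assert np.allclose(v_hat[1:], v[1:])        # innovations recovered exactly
```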



2.10 Wold's Decomposition
Any WSS signal X[n] can be decomposed as a sum of two mutually orthogonal processes:
•   a regular process X_r[n] and a predictable process X_p[n], with X[n] = X_r[n] + X_p[n].
•   X_r[n] can be expressed as the output of a linear filter driven by a white noise sequence.
•   X_p[n] is a predictable process, that is, a process that can be predicted from its own past with zero prediction error.
RANDOM SIGNAL MODELLING

3.1 Introduction
   The spectral factorization theorem enables us to model a regular random process as
   an output of a linear filter with white noise as input. Different models are developed
   using different forms of linear filters.
      •   These models are mathematically described by linear constant coefficient
          difference equations.
      •   In statistics, random-process modeling using difference equations is known as
          time series analysis.

3.2 White Noise Sequence
The simplest model is the white noise v[n]. We shall assume that v[n] has zero mean and variance σ_V².

Figure: Autocorrelation sequence R_V[m] = σ_V² δ[m] and the flat PSD S_V(ω) of the white noise sequence.

3.3 Moving Average model MA(q ) model

                                                FIR
                  v[n]                                                   X [ n ]
                                               filter


   The difference equation model is
                          q
             X [ n] = ∑ bi v[ n − i ]
                         i =o

    µe = 0 ⇒ µ X = 0
    and v[n] is an uncorrelated sequence means
            q
    σ X 2 = ∑ bi 2σ V
                    2
           i =0
The autocorrelations are given by
                 RX [ m] = E X [ n ] X [ n - m]
                                  q    q
                            = ∑ ∑ bib j Ev[n − i ]v[n − m − j ]
                                 i = 0 j =0
                                  q    q
                            = ∑ ∑ bib j RV [m − i + j ]
                                 i = 0 j =0


   Noting that R_V[m] = σ_V² δ[m], only the terms with
             m − i + j = 0, i.e. i = m + j,
   contribute. The maximum value of m + j is q, so that
             R_X[m] = Σ_{j=0}^{q−m} b_j b_{j+m} σ_V²,    0 ≤ m ≤ q

   and
             R_X[−m] = R_X[m].
   Writing the above two relations together,
             R_X[m] = Σ_{j=0}^{q−|m|} b_j b_{j+|m|} σ_V²,    |m| ≤ q
                    = 0  otherwise
   Notice that R_X[m] is related to the model parameters by a nonlinear relationship.
   Thus finding the model parameters is not simple.
   The power spectral density is given by
             S_X(w) = (σ_V²/2π) |B(w)|²,   where B(w) = b_0 + b_1 e^{−jw} + … + b_q e^{−jqw}
   An FIR system contributes only zeros. So if the spectrum has valleys (notches), an
   MA model will fit well.


   3.3.1 Test for MA process

   R_X[m] drops abruptly to zero for |m| > q.

             Figure: Autocorrelation function of an MA process (zero beyond lag q)
             Figure: Power spectrum of an MA process
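The truncation test above can be checked numerically. The sketch below (with illustrative coefficients b, unit-variance white noise, and a simple biased sample-autocorrelation estimator, all assumptions) simulates an MA(2) process and compares its sample ACF with the theoretical R_X[m] = Σ_j b_j b_{j+m} σ_V² for 0 ≤ m ≤ q and zero beyond lag q.

```python
import numpy as np

# Sketch: verify empirically that the ACF of an MA(q) process vanishes
# beyond lag q. The coefficients b below are illustrative, not from the notes.
rng = np.random.default_rng(0)
q, N = 2, 200_000
b = np.array([1.0, 0.6, -0.3])           # b_0 ... b_q (assumed values)
v = rng.standard_normal(N)               # white noise, sigma_V = 1
x = np.convolve(v, b, mode="valid")      # X[n] = sum_i b_i v[n-i]

def acf(x, m):
    """Biased sample autocorrelation estimate of R_X[m] = E X[n] X[n-m]."""
    return np.mean(x[m:] * x[: len(x) - m])

# Theoretical values: R_X[m] = sum_j b_j b_{j+m} sigma_V^2 for 0 <= m <= q
R_theory = [np.dot(b[: q + 1 - m], b[m:]) for m in range(q + 1)]
R_sample = [acf(x, m) for m in range(q + 4)]
```

For the chosen b, R_theory is [1.45, 0.42, −0.3], and the sample ACF at lags 3 and beyond stays near zero, which is the signature of an MA(2) process.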



Example 1: MA(1) process
   X[n] = b_1 v[n−1] + b_0 v[n]
   Here the parameters b_0 and b_1 are to be determined.
   We have
             σ_X² = (b_1² + b_0²) σ_V²
             R_X[1] = b_1 b_0 σ_V²
   From the above, b_0 and b_1 can be calculated using the variance and the
   autocorrelation at lag 1 of the signal.
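A minimal sketch of that calculation, assuming σ_V = 1 and illustrative true coefficients: with b_0² + b_1² = σ_X² and b_0 b_1 = R_X[1], the sums (b_0 + b_1)² = σ_X² + 2R_X[1] and (b_0 − b_1)² = σ_X² − 2R_X[1] give a closed-form solution (unique only up to swapping and joint sign change of b_0, b_1).

```python
import numpy as np

# Sketch (assumed sigma_V = 1): recover b0, b1 of an MA(1) model from the
# variance sigma_X^2 = b0^2 + b1^2 and the lag-1 autocorrelation R_X[1] = b0*b1.
b0_true, b1_true = 0.9, 0.4              # illustrative values
var_x = b0_true**2 + b1_true**2          # "observed" variance
r1 = b0_true * b1_true                   # "observed" R_X[1]

s = np.sqrt(var_x + 2 * r1)              # b0 + b1
d = np.sqrt(var_x - 2 * r1)              # |b0 - b1|
b0_hat, b1_hat = (s + d) / 2, (s - d) / 2    # one of the non-unique solutions
```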



3.4 Autoregressive Model
      In time series analysis it is called the AR(p) model.

             v[n] ──►  IIR filter  ──►  X[n]

    The model is given by the difference equation
             X[n] = Σ_{i=1}^{p} a_i X[n−i] + v[n]
    The transfer function of the filter is
             H(w) = 1/A(w),   where A(w) = 1 − Σ_{i=1}^{p} a_i e^{−jwi}
    (an all-pole model), and
             S_X(w) = σ_V² / (2π |A(w)|²)

      If there are sharp peaks in the spectrum, the AR(p) model may be suitable.
             Figure: Power spectrum and autocorrelation function of an AR process

    The autocorrelation function R_X[m] is given by
             R_X[m] = E X[n] X[n−m]
                    = Σ_{i=1}^{p} a_i E X[n−i] X[n−m] + E v[n] X[n−m]
                    = Σ_{i=1}^{p} a_i R_X[m−i] + σ_V² δ[m]

    ∴ R_X[m] = Σ_{i=1}^{p} a_i R_X[m−i] + σ_V² δ[m]    for m ≥ 0

    The above relation gives a set of linear equations which can be solved to find
    the a_i's. These equations are known as the Yule-Walker equations.
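The Yule-Walker equations for m = 1, …, p can be written as a Toeplitz linear system and solved directly. The sketch below (with illustrative AR(2) coefficients and unit-variance white noise, both assumptions) simulates an AR(2) process, estimates R_X[m] from the data, and solves for the a_i's.

```python
import numpy as np

# Sketch: estimate AR(p) coefficients from data by solving the Yule-Walker
# equations R_X[m] = sum_i a_i R_X[m-i], m = 1..p, using sample autocorrelations.
rng = np.random.default_rng(1)
a_true = np.array([0.75, -0.5])          # illustrative stable AR(2) coefficients
p, N = len(a_true), 200_000
v = rng.standard_normal(N)               # white noise, sigma_V = 1
x = np.zeros(N)
for n in range(p, N):
    # X[n] = a1 X[n-1] + a2 X[n-2] + v[n]
    x[n] = a_true @ x[n - p:n][::-1] + v[n]

# Sample autocorrelations R_X[0..p]
R = np.array([np.mean(x[m:] * x[:N - m]) for m in range(p + 1)])
# Yule-Walker in matrix form: Toeplitz(R[0..p-1]) a = R[1..p]
T = np.array([[R[abs(i - j)] for j in range(p)] for i in range(p)])
a_hat = np.linalg.solve(T, R[1:])
```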


Example 2: AR(1) process
     X[n] = a_1 X[n−1] + v[n]
     R_X[m] = a_1 R_X[m−1] + σ_V² δ[m]

     ∴ R_X[0] = a_1 R_X[−1] + σ_V²                  (1)
     and R_X[1] = a_1 R_X[0]
     so that a_1 = R_X[1] / R_X[0]
     From (1), σ_X² = R_X[0] = σ_V² / (1 − a_1²)
     After some arithmetic we get
             R_X[m] = a_1^{|m|} σ_V² / (1 − a_1²)
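These AR(1) relations can be checked on a simulated realization. The sketch below (illustrative a_1 = 0.8, unit-variance white noise, both assumptions) verifies that a_1 ≈ R_X[1]/R_X[0] and R_X[0] ≈ σ_V²/(1 − a_1²).

```python
import numpy as np

# Sketch of the AR(1) relations: a1 = R_X[1]/R_X[0] and
# R_X[0] = sigma_V^2 / (1 - a1^2), checked on a simulated realization.
rng = np.random.default_rng(2)
a1, N = 0.8, 200_000                      # illustrative coefficient
v = rng.standard_normal(N)                # white noise, sigma_V = 1
x = np.zeros(N)
for n in range(1, N):
    x[n] = a1 * x[n - 1] + v[n]

R0 = np.mean(x * x)                       # sample R_X[0]
R1 = np.mean(x[1:] * x[:-1])              # sample R_X[1]
a1_hat = R1 / R0                          # estimate of a1
var_theory = 1.0 / (1 - a1**2)            # R_X[0] = sigma_V^2 / (1 - a1^2) = 2.78
```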
3.5 ARMA(p,q) – Autoregressive Moving Average Model
In most practical situations, the process may be considered as the output of a filter
that has both poles and zeros.

             v[n] ──►  H(w) = B(w)/A(w)  ──►  X[n]

    The model is given by
             X[n] = Σ_{i=1}^{p} a_i X[n−i] + Σ_{i=0}^{q} b_i v[n−i]            (ARMA 1)
    and is called the ARMA(p,q) model.
    The transfer function of the filter is given by
             H(w) = B(w)/A(w)
    so that
             S_X(w) = (|B(w)|² / |A(w)|²) (σ_V²/2π)


    How do we get the model parameters?
    For m ≥ max(p, q+1), there is no contribution from the b_i terms to R_X[m]:
             R_X[m] = Σ_{i=1}^{p} a_i R_X[m−i],    m ≥ max(p, q+1)

    From a set of p such Yule-Walker equations, the a_i parameters can be found.
    Then we can form the residual process
             Y[n] = X[n] − Σ_{i=1}^{p} a_i X[n−i]
    so that
             Y[n] = Σ_{i=0}^{q} b_i v[n−i],
    a pure MA(q) process, from which the b_i's can be found.

    The ARMA(p,q) model is an economical model: an AR(p)-only or MA(q)-only model may
    require a large number of parameters to represent the same process adequately.
    This concept in model building is known as the parsimony of parameters.
    The difference equation of the ARMA(p,q) model, given by eq. (ARMA 1), can be
    reduced to a set of first-order difference equations, giving a state-space
    representation of the random process as follows:
             z[n] = A z[n−1] + B v[n]
             X[n] = C z[n]
    where z[n] = [w[n] w[n−1] … w[n−p+1]]′ is the state built from the output w[n]
    of the all-pole part of the filter,
             A = ⎡ a_1  a_2  …  a_p ⎤
                 ⎢  1    0   …   0  ⎥
                 ⎢  …               ⎥
                 ⎣  0    …   1   0  ⎦ ,   B = [1 0 … 0]′   and   C = [b_0 b_1 … b_q]
    (with the state dimension taken as max(p, q+1)).
    Such a representation is convenient for analysis.
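A minimal sketch (with illustrative ARMA(2,1) coefficients, an assumption) checking that the state-space form produces exactly the same output as the ARMA difference equation, starting from zero initial conditions:

```python
import numpy as np

# Sketch: the ARMA(2,1) difference equation and the companion-form state-space
# model z[n] = A z[n-1] + B v[n], X[n] = C z[n] produce the same output.
rng = np.random.default_rng(3)
a = np.array([0.5, -0.3])                # a1, a2 (illustrative, stable)
b = np.array([1.0, 0.4])                 # b0, b1 (illustrative)
N = 1000
v = rng.standard_normal(N)

# Direct difference equation: X[n] = a1 X[n-1] + a2 X[n-2] + b0 v[n] + b1 v[n-1]
x_direct = np.zeros(N)
for n in range(N):
    x_direct[n] = b[0] * v[n]
    if n >= 1:
        x_direct[n] += a[0] * x_direct[n - 1] + b[1] * v[n - 1]
    if n >= 2:
        x_direct[n] += a[1] * x_direct[n - 2]

# State-space form; the state holds the all-pole intermediate signal w[n]
A = np.array([[a[0], a[1]],
              [1.0,  0.0]])
B = np.array([1.0, 0.0])
C = np.array([b[0], b[1]])
z = np.zeros(2)
x_ss = np.zeros(N)
for n in range(N):
    z = A @ z + B * v[n]                 # z[n] = A z[n-1] + B v[n]
    x_ss[n] = C @ z                      # X[n] = C z[n]
```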


3.6 General ARMA( p, q) Model building Steps
      •   Identification of p and q.
      •   Estimation of model parameters.
      •   Check the modeling error.
      •   If it is white noise then stop.
          Else select new values for p and q
          and repeat the process.


3.7 Other models: modeling nonstationary random processes
     •    ARIMA model: here, after differencing, the data can be fed to an ARMA
          model.
     •    SARMA model: seasonal ARMA model. Here the signal contains a seasonal
          fluctuation term. After differencing by a step equal to the seasonal
          period, the signal becomes stationary and an ARMA model can be fitted to
          the resulting data.
CHAPTER – 4: ESTIMATION THEORY

4.1 Introduction
    •     For speech, we have the LPC (linear predictive coding) model; the LPC
          parameters are to be estimated from observed data.
    •     We may have to estimate the correct value of a signal from noisy
          observations.


In RADAR signal processing: estimate the target and its distance from the observed
data.
In sonar signal processing: an array of sensors picks up the signals generated by the
mechanical movements of a submarine; estimate the location of the submarine.

Generally estimation includes parameter estimation and signal estimation.
We will discuss the problem of parameter estimation here.


We have a sequence of observable random variables X_1, X_2, …, X_N, represented by
the vector
             X = [X_1 X_2 … X_N]′
X is governed by a joint density function which depends on some unobservable
parameter θ, given by
             f_X(x_1, x_2, …, x_N | θ) = f_X(x | θ)
where θ may be deterministic or random. Our aim is to make an inference about θ from
an observed sample of X_1, X_2, …, X_N.

An estimator θ̂(X) is a rule by which we guess the value of an unknown θ on the
basis of X.
θ̂(X) is random, being a function of random variables.
For a particular observation x_1, x_2, …, x_N, we get what is known as an estimate
(not an estimator) of θ.
Let X_1, X_2, …, X_N be a sequence of independent and identically distributed (iid)
random variables with mean µ_X and variance σ_X².

             µ̂_X = (1/N) Σ_{i=1}^{N} X_i is an estimator for µ_X.
             σ̂_X² = (1/N) Σ_{i=1}^{N} (X_i − µ̂_X)² is an estimator for σ_X².

An estimator is a function of the random sequence X_1, X_2, …, X_N that does not
involve any unknown parameters. Such a function is generally called a statistic.



4.2 Properties of the Estimator
A good estimator should satisfy some properties. These properties are described in terms
of the mean and variance of the estimator.



4.3 Unbiased estimator
An estimator θ̂ of θ is said to be unbiased if and only if Eθ̂ = θ.
The quantity Eθ̂ − θ is called the bias of the estimator.
Unbiasedness is necessary but not sufficient to make an estimator a good one.
Consider
             σ̂_1² = (1/N) Σ_{i=1}^{N} (X_i − µ̂_X)²
and
             σ̂_2² = (1/(N−1)) Σ_{i=1}^{N} (X_i − µ̂_X)²
for an iid random sequence X_1, X_2, …, X_N.

We can show that σ̂_2² is an unbiased estimator of σ_X².
   E Σ_{i=1}^{N} (X_i − µ̂_X)² = E Σ (X_i − µ_X + µ_X − µ̂_X)²
             = E Σ {(X_i − µ_X)² + (µ̂_X − µ_X)² − 2(X_i − µ_X)(µ̂_X − µ_X)}
Now E(X_i − µ_X)² = σ_X²
and E(µ̂_X − µ_X)² = E(Σ X_i/N − µ_X)²
             = (1/N²) E(Σ X_i − N µ_X)²
             = (1/N²) E(Σ (X_i − µ_X))²
             = (1/N²) [Σ E(X_i − µ_X)² + Σ_i Σ_{j≠i} E(X_i − µ_X)(X_j − µ_X)]
             = (1/N²) Σ E(X_i − µ_X)²        (because of independence)
             = σ_X²/N
Also E(X_i − µ_X)(µ̂_X − µ_X) = (1/N) Σ_j E(X_i − µ_X)(X_j − µ_X) = σ_X²/N
             (again by independence, only the j = i term survives).
∴ E Σ_{i=1}^{N} (X_i − µ̂_X)² = N σ_X² + N(σ_X²/N) − 2N(σ_X²/N) = (N − 1) σ_X²
So E σ̂_2² = (1/(N−1)) E Σ (X_i − µ̂_X)² = σ_X².
∴ σ̂_2² is an unbiased estimator of σ_X².
Similarly, the sample mean is an unbiased estimator:
             µ̂_X = (1/N) Σ_{i=1}^{N} X_i
             E µ̂_X = (1/N) Σ_{i=1}^{N} E{X_i} = N µ_X / N = µ_X
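A Monte Carlo sketch of the bias result above (Gaussian data, sample size N = 5 and σ² = 4 chosen only for illustration): the 1/(N−1) estimator averages to σ², while the 1/N estimator averages to (N−1)σ²/N.

```python
import numpy as np

# Sketch: Monte Carlo check that the 1/(N-1) variance estimator is unbiased
# while the 1/N estimator is biased low. Sizes/distribution are illustrative.
rng = np.random.default_rng(4)
N, trials = 5, 200_000
sigma2 = 4.0
X = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
mu_hat = X.mean(axis=1, keepdims=True)           # sample mean per trial
ss = ((X - mu_hat) ** 2).sum(axis=1)             # sum of squared deviations
var1 = ss / N                                    # sigma_hat_1^2 (biased)
var2 = ss / (N - 1)                              # sigma_hat_2^2 (unbiased)
# Expected: E var1 = (N-1)/N * sigma^2 = 3.2,  E var2 = sigma^2 = 4.0
```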




4.4 Variance of the estimator

The variance of the estimator θ̂ is given by
      var(θ̂) = E(θ̂ − E(θ̂))²
For the unbiased case,
      var(θ̂) = E(θ̂ − θ)²
The variance of the estimator should be as low as possible.
An unbiased estimator θ̂ is called a minimum variance unbiased estimator (MVUE) if
      E(θ̂ − θ)² ≤ E(θ̂′ − θ)²
where θ̂′ is any other unbiased estimator of θ.
4.5 Mean square error of the estimator
MSE = E(θ̂ − θ)²
MSE should be as small as possible. Out of all unbiased estimators, the MVUE has the
minimum mean square error.
MSE is related to the bias and variance as shown below:

MSE = E(θ̂ − θ)² = E(θ̂ − Eθ̂ + Eθ̂ − θ)²
      = E(θ̂ − Eθ̂)² + E(Eθ̂ − θ)² + 2E(θ̂ − Eθ̂)(Eθ̂ − θ)
      = E(θ̂ − Eθ̂)² + E(Eθ̂ − θ)² + 2(Eθ̂ − Eθ̂)(Eθ̂ − θ)
      = var(θ̂) + b²(θ̂) + 0        (why?)

So   MSE = var(θ̂) + b²(θ̂)
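The decomposition MSE = var(θ̂) + b²(θ̂) can be illustrated numerically. The sketch below (using the biased variance estimator σ̂_1² as θ̂, with illustrative sizes) estimates all three quantities by Monte Carlo; the identity holds exactly for the sample-level quantities.

```python
import numpy as np

# Sketch: Monte Carlo check of MSE = var(theta_hat) + b^2(theta_hat),
# using the biased estimator (1/N) sum (X_i - mu_hat)^2 as theta_hat.
rng = np.random.default_rng(5)
N, trials, sigma2 = 8, 200_000, 1.0
X = rng.normal(0.0, 1.0, size=(trials, N))
est = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)   # theta_hat

mse = np.mean((est - sigma2) ** 2)       # E(theta_hat - theta)^2
var = np.var(est)                        # var(theta_hat)
bias = est.mean() - sigma2               # E theta_hat - theta = -sigma2/N
decomp = var + bias**2                   # should match mse
```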



4.6 Consistent Estimators
As we have more data, the quality of the estimate should improve. This idea is used
in defining a consistent estimator.

An estimator θ̂ is called a consistent estimator of θ if θ̂ converges in probability
to θ:
      lim_{N→∞} P(|θ̂ − θ| ≥ ε) = 0   for any ε > 0
A less rigorous test is obtained by applying the Markov inequality:
      P(|θ̂ − θ| ≥ ε) ≤ E(θ̂ − θ)²/ε² = MSE/ε²
If θ̂ is an unbiased estimator (b(θ̂) = 0), then MSE = var(θ̂).
Therefore, if
      lim_{N→∞} E(θ̂ − θ)² = 0, then θ̂ will be a consistent estimator.
Also note that
      MSE = var(θ̂) + b²(θ̂)
Therefore, if the estimator is asymptotically unbiased (i.e. b(θ̂) → 0 as N → ∞) and
var(θ̂) → 0 as N → ∞, then MSE → 0.
Thus, for an asymptotically unbiased estimator θ̂, if var(θ̂) → 0 as N → ∞, then θ̂
will be a consistent estimator.
Example 1: X_1, X_2, …, X_N is an iid random sequence with unknown mean µ_X and
known variance σ_X².

Let µ̂_X = (1/N) Σ_{i=1}^{N} X_i be an estimator for µ_X. We have already shown that
µ̂_X is unbiased.
Also var(µ̂_X) = σ_X²/N. Is it a consistent estimator?
Clearly lim_{N→∞} var(µ̂_X) = lim_{N→∞} σ_X²/N = 0. Therefore µ̂_X is a consistent
estimator of µ_X.
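The shrinking variance can be seen empirically. The sketch below (illustrative mean 5, variance 2, and trial counts) computes the empirical variance of the sample mean for growing N and compares it with σ_X²/N.

```python
import numpy as np

# Sketch: var(mu_hat) = sigma_X^2 / N shrinks as N grows, which is why the
# sample mean is consistent. Sizes and distribution are illustrative.
rng = np.random.default_rng(6)
sigma2, trials = 2.0, 4000
emp_var = {}
for N in (10, 100, 1000):
    X = rng.normal(5.0, np.sqrt(sigma2), size=(trials, N))
    emp_var[N] = X.mean(axis=1).var()    # empirical variance of the estimator
# Theory: emp_var[N] ~ sigma2 / N, decreasing with N
```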



4.7 Sufficient Statistic
The observations X_1, X_2, …, X_N contain information about the unknown parameter θ.
An estimator should carry the same information about θ as the observed data. The
concept of a sufficient statistic is based on this idea.
A measurable function θ̂(X_1, X_2, …, X_N) is called a sufficient statistic for θ if
it contains the same information about θ as is contained in the random sequence
X_1, X_2, …, X_N. In other words, the joint conditional density
f_{X_1, X_2, …, X_N | θ̂(X_1, X_2, …, X_N)}(x_1, x_2, …, x_N) does not involve θ.
There are a large number of sufficient statistics for a particular parameter. One
has to select a sufficient statistic which has good estimation properties.


A way to check whether a statistic is sufficient or not is through the factorization
theorem, which states:
θ̂(X_1, X_2, …, X_N) is a sufficient statistic for θ if
      f_{X_1, X_2, …, X_N | θ}(x_1, x_2, …, x_N) = g(θ, θ̂) h(x_1, x_2, …, x_N)
where g(θ, θ̂) is a non-constant, nonnegative function of θ and θ̂, and
h(x_1, x_2, …, x_N) does not involve θ and is a nonnegative function of
x_1, x_2, …, x_N.


Example 2: Suppose X_1, X_2, …, X_N is an iid Gaussian sequence with unknown mean
µ_X and known variance 1.
Then µ̂_X = (1/N) Σ_{i=1}^{N} X_i is a sufficient statistic for µ_X.

f_{X_1, X_2, …, X_N | µ_X}(x_1, x_2, …, x_N)
      = Π_{i=1}^{N} (1/√(2π)) e^{−(1/2)(x_i − µ_X)²}
      = (1/(√(2π))^N) e^{−(1/2) Σ_{i=1}^{N} (x_i − µ_X)²}
      = (1/(√(2π))^N) e^{−(1/2) Σ_{i=1}^{N} (x_i − µ̂_X + µ̂_X − µ_X)²}
      = (1/(√(2π))^N) e^{−(1/2) Σ_{i=1}^{N} {(x_i − µ̂_X)² + (µ̂_X − µ_X)² + 2(x_i − µ̂_X)(µ̂_X − µ_X)}}
      = (1/(√(2π))^N) e^{−(1/2) Σ_{i=1}^{N} (x_i − µ̂_X)²} e^{−(N/2)(µ̂_X − µ_X)²} e⁰   (why?)

(The cross term vanishes because Σ_{i=1}^{N} (x_i − µ̂_X) = 0.)
The first exponential is a function of x_1, x_2, …, x_N only, and the second
exponential is a function of µ̂_X and µ_X. Therefore, by the factorization theorem,
µ̂_X is a sufficient statistic for µ_X.




4.8 Cramer Rao theorem
We described the minimum variance unbiased estimator (MVUE), which is a very good
estimator.
θ̂ is an MVUE if
      E(θ̂) = θ
and   Var(θ̂) ≤ Var(θ̂′)
where θ̂′ is any other unbiased estimator of θ.
Can we reduce the variance of an unbiased estimator indefinitely? The answer is
given by the Cramer Rao theorem.
Suppose θ̂ is an unbiased estimator based on a random sequence. Let us denote the
sequence by the vector
      X = [X_1 X_2 … X_N]′
Let f_X(x_1, …, x_N / θ) be the joint density function which characterizes X. This
function is also called the likelihood function. θ may also be random; in that case
the likelihood function will represent the conditional joint density function.
L(x/θ) = ln f_X(x_1, …, x_N / θ) is called the log-likelihood function.
4.9 Statement of the Cramer Rao theorem

If θ̂ is an unbiased estimator of θ, then
      Var(θ̂) ≥ 1/I(θ)
where I(θ) = E(∂L/∂θ)². I(θ) is a measure of the average information in the random
sequence and is called the Fisher information statistic.
Equality in the CR bound holds if ∂L/∂θ = c(θ̂ − θ), where c is a constant.
Proof: θ̂ is an unbiased estimator of θ,
      ∴ E(θ̂ − θ) = 0
      ⇒ ∫ (θ̂ − θ) f_X(x/θ) dx = 0
Differentiating with respect to θ (the limits of integration are not functions of θ),
      ∫ (∂/∂θ){(θ̂ − θ) f_X(x/θ)} dx = 0
      ⇒ ∫ (θ̂ − θ) (∂/∂θ) f_X(x/θ) dx − ∫ f_X(x/θ) dx = 0
      ∴ ∫ (θ̂ − θ) (∂/∂θ) f_X(x/θ) dx = ∫ f_X(x/θ) dx = 1                (1)
Note that
      (∂/∂θ) f_X(x/θ) = (∂/∂θ){ln f_X(x/θ)} f_X(x/θ) = (∂L/∂θ) f_X(x/θ)
Therefore, from (1),
      ∫ (θ̂ − θ) (∂L/∂θ) f_X(x/θ) dx = 1
so that
      {∫ (θ̂ − θ)√f_X(x/θ) · (∂L/∂θ)√f_X(x/θ) dx}² = 1                   (2)
since f_X(x/θ) ≥ 0.
Recall the Cauchy-Schwarz inequality
      |<a, b>|² ≤ ||a||² ||b||²
where the equality holds when a = cb (c any scalar).
Applying this inequality to the L.H.S. of equation (2), we get
      {∫ (θ̂ − θ)√f_X(x/θ) · (∂L/∂θ)√f_X(x/θ) dx}²
            ≤ ∫ (θ̂ − θ)² f_X(x/θ) dx · ∫ (∂L/∂θ)² f_X(x/θ) dx
            = var(θ̂) I(θ)
But the L.H.S. = 1, so
      var(θ̂) I(θ) ≥ 1
      ∴ var(θ̂) ≥ 1/I(θ),
which is the Cramer Rao inequality.
The equality will hold when
      (∂L/∂θ)√f_X(x/θ) = c(θ̂ − θ)√f_X(x/θ),
so that
      ∂L(x/θ)/∂θ = c(θ̂ − θ)

Also, from ∫ f_X(x/θ) dx = 1, we get
      ∫ (∂/∂θ) f_X(x/θ) dx = 0
      ∴ ∫ (∂L/∂θ) f_X(x/θ) dx = 0
Taking the partial derivative with respect to θ again, we get
      ∫ {(∂²L/∂θ²) f_X(x/θ) + (∂L/∂θ)(∂/∂θ) f_X(x/θ)} dx = 0
      ∴ ∫ {(∂²L/∂θ²) f_X(x/θ) + (∂L/∂θ)² f_X(x/θ)} dx = 0
      ⇒ E(∂L/∂θ)² = −E(∂²L/∂θ²)
If θ̂ satisfies the CR bound with equality, then θ̂ is called an efficient estimator.
Remarks:
(1) If the information I(θ) is more, the variance of the estimator θ̂ will be less.
(2) Suppose X_1, …, X_N are iid. Then
      I_1(θ) = E{(∂/∂θ) ln f_{X_1/θ}(x)}²
      ∴ I_N(θ) = E{(∂/∂θ) ln f_{X_1, X_2, …, X_N/θ}(x_1, x_2, …, x_N)}² = N I_1(θ)


Example 3:
Let X_1, …, X_N be an iid Gaussian random sequence with known variance σ² and
unknown mean µ.
Suppose µ̂ = (1/N) Σ_{i=1}^{N} X_i, which is unbiased.
Find the CR bound and hence show that µ̂ is an efficient estimator.
The likelihood function f_X(x_1, x_2, …, x_N / µ) is the product of the individual
densities (since iid):
      f_X(x_1, x_2, …, x_N / µ) = (1/((√(2π))^N σ^N)) e^{−(1/2σ²) Σ_{i=1}^{N} (x_i − µ)²}
so that
      L(x/µ) = −ln((√(2π))^N σ^N) − (1/2σ²) Σ_{i=1}^{N} (x_i − µ)²
Now
      ∂L/∂µ = (1/σ²) Σ_{i=1}^{N} (x_i − µ)
      ∴ ∂²L/∂µ² = −N/σ²
So that
      E(∂²L/∂µ²) = −N/σ²
      ∴ CR bound = 1/I(µ) = 1/(−E ∂²L/∂µ²) = σ²/N
Also
      ∂L/∂µ = (1/σ²) Σ_{i=1}^{N} (x_i − µ) = (N/σ²)(Σ x_i/N − µ) = (N/σ²)(µ̂ − µ)
Hence ∂L/∂µ = c(µ̂ − µ) with c = N/σ², so the equality condition holds, and since
var(µ̂) = σ²/N meets the CR bound, µ̂ is an efficient estimator.
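A Monte Carlo sketch of this result (illustrative mean 1.5, σ² = 3, N = 20): the empirical variance of the sample mean matches the CR bound σ²/N.

```python
import numpy as np

# Sketch: for iid Gaussian data with known variance, the CR bound is
# sigma^2/N and the sample mean attains it (it is efficient).
rng = np.random.default_rng(7)
sigma2, N, trials = 3.0, 20, 100_000
X = rng.normal(1.5, np.sqrt(sigma2), size=(trials, N))
mu_hat = X.mean(axis=1)                  # one estimate per trial
emp_var = mu_hat.var()                   # empirical variance of the estimator
cr_bound = sigma2 / N                    # = 0.15
```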



4.10 Criteria for Estimation
The estimation of a parameter is based on several well-known criteria. Each
criterion optimizes some function of the observed samples with respect to the
unknown parameter to be estimated. Some of the most popular estimation criteria are:
•      Maximum Likelihood
•      Minimum Mean Square Error
•      Bayes' Method
•      Maximum Entropy Method



4.11 Maximum Likelihood Estimator (MLE)
Given a random sequence                             X 1 ................. X N      and the joint density function

f X1................. X N /θ ( x1 , x2 ...xN ) which depends on an unknown nonrandom parameter θ .

f X ( x 1 , x 2 , ........... x N / θ ) is called the likelihood function (for continuous function ….,
for discrete it will be joint probability mass function).
L( x / θ ) = ln f X ( x1 , x 2 ,........., x N / θ ) is called log likelihood function.
                                  ˆ
The maximum likelihood estimator θ MLE is such an estimator that

    f X ( x1 , x2 ........., xN / θˆMLE ) ≥ f X ( x1, x2 ,........, xN / θ ), ∀θ
                                                              ˆ
If the likelihood function is differentiable w.r.t. θ , then θ MLE is given by

        ∂
          f X ( x1 , … x N /θ ) θ = 0
                                ˆ
       ∂θ                         MLE



       ∂L(x | θ )
or                    ˆ       =0
         ∂θ           θ MLE


Thus the MLE is given by the solution of the likelihood equation given above.
If we have a number of unknown parameters, collected in the vector θ = [θ₁  θ₂  …  θ_N]ᵀ, then θ̂_MLE is obtained by solving the set of likelihood equations ∂L(x / θ)/∂θᵢ = 0, i = 1, 2, …, N.
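As a sketch of the vector-parameter case (an assumed example, not taken from the notes): for i.i.d. Gaussian samples with θ = [µ, σ²]ᵀ, setting ∂L/∂µ = 0 and ∂L/∂σ² = 0 jointly gives µ̂ = x̄ and σ̂² = (1/N) Σ(xᵢ − x̄)².

```python
import numpy as np

# Assumed example: joint MLE of theta = [mu, sigma^2]^T for i.i.d. Gaussian
# samples. Solving the two likelihood equations simultaneously gives
#   mu_hat  = sample mean          (from dL/d mu = 0)
#   var_hat = (1/N)*sum((x-mu_hat)^2)  (from dL/d sigma^2 = 0; note the 1/N,
#                                       not 1/(N-1) -- this MLE is biased)
rng = np.random.default_rng(2)
mu_true, var_true, N = 1.0, 4.0, 100_000
x = rng.normal(mu_true, np.sqrt(var_true), size=N)

mu_hat = x.mean()
var_hat = ((x - mu_hat) ** 2).mean()

print(mu_hat, var_hat)   # close to (mu_true, var_true)
```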

Ssp notes

  • 1. SECTION – I REVIEW OF RANDOM VARIABLES & RANDOM PROCESS 1
  • 2. REVIEW OF RANDOM VARIABLES..............................................................................3 1.1 Introduction..............................................................................................................3 1.2 Discrete and Continuous Random Variables ........................................................4 1.3 Probability Distribution Function ..........................................................................4 1.4 Probability Density Function .................................................................................5 1.5 Joint random variable .............................................................................................6 1.6 Marginal density functions......................................................................................6 1.7 Conditional density function...................................................................................7 1.8 Baye’s Rule for mixed random variables...............................................................8 1.9 Independent Random Variable ..............................................................................9 1.10 Moments of Random Variables ..........................................................................10 1.11 Uncorrelated random variables..........................................................................11 1.12 Linear prediction of Y from X ..........................................................................11 1.13 Vector space Interpretation of Random Variables...........................................12 1.14 Linear Independence ...........................................................................................12 1.15 Statistical Independence......................................................................................12 1.16 Inner Product .......................................................................................................12 1.17 Schwary Inequality 
..............................................................................................13 1.18 Orthogonal Random Variables...........................................................................13 1.19 Orthogonality Principle.......................................................................................14 1.20 Chebysev Inequality.............................................................................................15 1.21 Markov Inequality ...............................................................................................15 1.22 Convergence of a sequence of random variables ..............................................16 1.23 Almost sure (a.s.) convergence or convergence with probability 1 .................16 1.24 Convergence in mean square sense ....................................................................17 1.25 Convergence in probability.................................................................................17 1.26 Convergence in distribution................................................................................18 1.27 Central Limit Theorem .......................................................................................18 1.28 Jointly Gaussian Random variables...................................................................19 2
  • 3. REVIEW OF RANDOM VARIABLES 1.1 Introduction • Mathematically a random variable is neither random nor a variable • It is a mapping from sample space into the real-line ( “real-valued” random variable) or the complex plane ( “complex-valued ” random variable) . Suppose we have a probability space {S , ℑ, P} . Let X : S → ℜ be a function mapping the sample space S into the real line such that For each s ∈ S , there exists a unique X ( s ) ∈ ℜ. Then X is called a random variable. Thus a random variable associates the points in the sample space with real numbers. ℜ Notations: X (s) • Random variables are represented by s • upper-case letters. • Values of a random variable are denoted by lower case letters S • Y = y means that y is the value of a random variable X . Figure Random Variable Example 1: Consider the example of tossing a fair coin twice. The sample space is S= { HH,HT,TH,TT} and all four outcomes are equally likely. Then we can define a random variable X as follows Sample Point Value of the random P{ X = x} Variable X = x HH 0 1 4 HT 1 1 4 TH 2 1 4 TT 3 1 4 3
  • 4. Example 2: Consider the sample space associated with the single toss of a fair die. The sample space is given by S = {1, 2,3, 4,5,6} . If we define the random variable X that associates a real number equal to the number in the face of the die, then X = {1, 2,3, 4,5,6} 1.2 Discrete and Continuous Random Variables • A random variable X is called discrete if there exists a countable sequence of distinct real number xi such that ∑ P ( x ) = 1. i m i Pm ( xi ) is called the probability mass function. The random variable defined in Example 1 is a discrete random variable. • A continuous random variable X can take any value from a continuous interval • A random variable may also de mixed type. In this case the RV takes continuous values, but at each finite number of points there is a finite probability. 1.3 Probability Distribution Function We can define an event { X ≤ x} = {s / X ( s ) ≤ x, s ∈ S} The probability FX ( x) = P{ X ≤ x} is called the probability distribution function. Given FX ( x), we can determine the probability of any event involving values of the random variable X . • FX (x) is a non-decreasing function of X . • FX (x) is right continuous => FX (x) approaches to its value from right. • FX (−∞) = 0 • FX (∞) = 1 • P{x1 < X ≤ x} = FX ( x) − FX ( x1 ) 4
  • 5. Example 3: Consider the random variable defined in Example 1. The distribution function FX ( x) is as given below: Value of the random Variable X = x FX ( x) x<0 0 0 ≤ x <1 1 4 1≤ x < 2 1 2 2≤ x<3 3 4 x≥3 1 FX ( x) X =x 1.4 Probability Density Function d If FX (x) is differentiable f X ( x) = FX ( x) is called the probability density function and dx has the following properties. • f X (x) is a non- negative function ∞ • ∫f −∞ X ( x)dx = 1 x2 • P ( x1 < X ≤ x 2 ) = ∫f − x1 X ( x)dx Remark: Using the Dirac delta function we can define the density function for a discrete random variables. 5
  • 6. 1.5 Joint random variable X and Y are two random variables defined on the same sample space S. P{ X ≤ x, Y ≤ y} is called the joint distribution function and denoted by FX ,Y ( x, y ). Given FX ,Y ( x, y ), -∞ < x < ∞, -∞ < y < ∞, we have a complete description of the random variables X and Y . • P{0 < X ≤ x, 0 < Y ≤ y} = FX ,Y ( x, y ) − FX ,Y ( x,0) − FX ,Y (0, y ) + FX ,Y ( x, y ) • FX ( x) = FXY ( x,+∞). To prove this ( X ≤ x ) = ( X ≤ x ) ∩ ( Y ≤ +∞ ) ∴ F X ( x ) = P (X ≤ x ) = P (X ≤ x , Y ≤ ∞ )= F XY ( x , +∞ ) FX ( x) = P( X ≤ x ) = P( X ≤ x, Y ≤ ∞ ) = FXY ( x,+∞) Similarly FY ( y ) = FXY (∞, y ). • Given FX ,Y ( x, y ), -∞ < x < ∞, -∞ < y < ∞, each of FX ( x) and FY ( y ) is called a marginal distribution function. We can define joint probability density function f X ,Y ( x, y ) of the random variables X and Y by ∂2 f X ,Y ( x, y ) = FX ,Y ( x, y ) , provided it exists ∂x∂y • f X ,Y ( x, y ) is always a positive quantity. x y • FX ,Y ( x, y ) = ∫∫ − ∞− ∞ f X ,Y ( x, y )dxdy 1.6 Marginal density functions fX (x) = d dx FX (x) = d dx FX ( x,∞ ) x ∞ = d dx ∫ ( ∫ f X ,Y ( x , y ) d y ) d x −∞ −∞ ∞ = ∫ f X ,Y ( x , y ) d y −∞ ∞ and fY ( y ) = ∫ f X ,Y ( x , y ) d x −∞ 6
  • 7. 1.7 Conditional density function f Y / X ( y / X = x) = f Y / X ( y / x) is called conditional density of Y given X . Let us define the conditional distribution function. We cannot define the conditional distribution function for the continuous random variables X and Y by the relation F ( y / x) = P(Y ≤ y / X = x) Y/X P(Y ≤ y, X = x) = P ( X = x) as both the numerator and the denominator are zero for the above expression. The conditional distribution function is defined in the limiting sense as follows: F ( y / x) = lim∆x →0 P (Y ≤ y / x < X ≤ x + ∆x) Y/X P (Y ≤ y , x < X ≤ x + ∆x) =lim∆x →0 P ( x < X ≤ x + ∆x) y ∫ f X ,Y ( x, u )∆xdu ∞ =lim∆x →0 f X ( x ) ∆x y ∫ f X ,Y ( x, u )du =∞ f X ( x) The conditional density is defined in the limiting sense as follows f Y / X ( y / X = x ) = lim ∆y →0 ( FY / X ( y + ∆y / X = x ) − FY / X ( y / X = x )) / ∆y = lim ∆y→0,∆x→0 ( FY / X ( y + ∆y / x < X ≤ x + ∆x ) − FY / X ( y / x < X ≤ x + ∆x )) / ∆y (1) Because ( X = x) = lim ∆x →0 ( x < X ≤ x + ∆x) The right hand side in equation (1) is lim ∆y →0,∆x→0 ( FY / X ( y + ∆y / x < X < x + ∆x) − FY / X ( y / x < X < x + ∆x)) / ∆y = lim ∆y →0, ∆x →0 ( P( y < Y ≤ y + ∆y / x < X ≤ x + ∆x)) / ∆y = lim ∆y →0, ∆x →0 ( P( y < Y ≤ y + ∆y, x < X ≤ x + ∆x)) / P ( x < X ≤ x + ∆x)∆y = lim ∆y →0, ∆x →0 f X ,Y ( x, y )∆x∆y / f X ( x)∆x∆y = f X ,Y ( x, y ) / f X ( x) ∴ fY / X ( x / y ) = f X ,Y ( x, y ) / f X ( x ) (2) Similarly, we have ∴ f X / Y ( x / y ) = f X ,Y ( x, y ) / fY ( y ) (3) 7
  • 8. From (2) and (3) we get Baye’s rule f X ,Y (x, y) ∴ f X /Y (x / y) = fY ( y) f (x) fY / X ( y / x) = X fY ( y) f (x, y) = ∞ X ,Y (4) ∫ f X ,Y (x, y)dx −∞ f ( y / x) f X (x) = ∞ Y/X ∫ f X (u) fY / X ( y / x)du −∞ Given the joint density function we can find out the conditional density function. Example 4: For random variables X and Y, the joint probability density function is given by 1 + xy f X ,Y ( x, y ) = x ≤ 1, y ≤ 1 4 = 0 otherwise Find the marginal density f X ( x), fY ( y ) and fY / X ( y / x). Are X and Y independent? 1 + xy 1 1 f X ( x) = ∫ −1 4 dy = 2 Similarly 1 fY ( y ) = -1 ≤ y ≤ 1 2 and f X ,Y ( x, y ) 1 + xy fY / X ( y / x ) = = , x ≤ 1, y ≤ 1 f X ( x) 4 = 0 otherwise ∴ X and Y are not independent 1.8 Baye’s Rule for mixed random variables Let X be a discrete random variable with probability mass function PX ( x) and Y be a continuous random variable. In practical problem we may have to estimate X from observed Y . Then 8
  • 9. PX /Y ( x / y ) = li m ∆y→ 0 PX /Y ( x / y < Y ≤ y + ∆ y )∞ PX ,Y (x, y < Y ≤ y + ∆y) = li m ∆y→ 0 PY ( y < Y ≤ y + ∆ y ) PX ( x ) fY / X ( y / x ) ∆ y = lim ∆y→ 0 fY ( y ) ∆ y PX ( x ) fY / X ( y / x ) = fY ( y ) PX ( x ) fY / X ( y / x ) == ∑ PX ( x ) fY / X ( y / x ) x Example 5: X Y + V X is a binary random variable with ⎧ 1 ⎪1 with probability 2 ⎪ X =⎨ ⎪ −1 with probability 1 ⎪ ⎩ 2 0 and variance σ .2 V is the Gaussian noise with mean Then PX ( x) fY / X ( y / x) PX / Y ( x = 1/ y ) = ∑ PX ( x) fY / X ( y / x) x − ( y −1) 2 / 2σ 2 e = − ( y −1)2 / 2σ 2 / 2σ 2 + (e − ( y +1) 2 e 1.9 Independent Random Variable Let X and Y be two random variables characterised by the joint density function FX ,Y ( x, y ) = P{ X ≤ x, Y ≤ y} and f X ,Y ( x, y ) = ∂2 ∂x∂y FX ,Y ( x, y ) Then X and Y are independent if f X / Y ( x / y ) = f X ( x) ∀x ∈ ℜ and equivalently f X ,Y ( x, y ) = f X ( x ) fY ( y ) , where f X (x) and fY ( y) are called the marginal density functions. 9
1.10 Moments of Random Variables

• Expectation provides a description of the random variable in terms of a few parameters instead of specifying the entire distribution function or the density function.
• It is far easier to estimate the expectation of a random variable from data than to estimate its distribution.

First moment or mean: The mean $\mu_X$ of a random variable $X$ is defined by

$\mu_X = EX = \sum_i x_i P(x_i)$ for a discrete random variable $X$
$\mu_X = EX = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$ for a continuous random variable $X$

For any piecewise continuous function $y = g(x)$, the expectation of the random variable $Y = g(X)$ is given by

$EY = Eg(X) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$

Second moment:

$EX^2 = \int_{-\infty}^{\infty} x^2 f_X(x)\, dx$

Variance:

$\sigma_X^2 = \int_{-\infty}^{\infty} (x-\mu_X)^2 f_X(x)\, dx$

• Variance is a central moment and a measure of dispersion of the random variable about the mean.
• $\sigma_X$ is called the standard deviation.

For two random variables $X$ and $Y$ the joint expectation is defined as

$E(XY) = \mu_{XY} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{X,Y}(x,y)\, dx\, dy$

The correlation between $X$ and $Y$, measured by the covariance, is given by

$\mathrm{Cov}(X,Y) = \sigma_{XY} = E(X-\mu_X)(Y-\mu_Y) = E(XY - X\mu_Y - Y\mu_X + \mu_X\mu_Y) = E(XY) - \mu_X\mu_Y$

The ratio

$\rho = \dfrac{E(X-\mu_X)(Y-\mu_Y)}{\sqrt{E(X-\mu_X)^2\, E(Y-\mu_Y)^2}} = \dfrac{\sigma_{XY}}{\sigma_X\sigma_Y}$

is called the correlation coefficient. The correlation coefficient measures how similar two random variables are.
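All of these moments have direct sample counterparts. A quick Monte Carlo sketch (the uniform/linear test signal below is our own choice, not from the notes): $X$ uniform on $(0,1)$ has mean $1/2$ and variance $1/12$, and $Y = 2X + \text{small noise}$ should be almost perfectly correlated with $X$.

```python
import random

random.seed(1)
n = 200_000
xs = [random.random() for _ in range(n)]                 # X ~ uniform(0, 1)
ys = [2 * x + random.gauss(0, 0.1) for x in xs]          # Y = 2X + noise

mx = sum(xs) / n                                         # sample mean
my = sum(ys) / n
vx = sum((x - mx) ** 2 for x in xs) / n                  # sample variance
vy = sum((y - my) ** 2 for y in ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
rho = cov / (vx ** 0.5 * vy ** 0.5)                      # correlation coefficient
print(mx, vx, rho)   # mx ~ 0.5, vx ~ 1/12, rho close to 1
```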
1.11 Uncorrelated random variables

Random variables $X$ and $Y$ are uncorrelated if the covariance $\mathrm{Cov}(X,Y) = 0$. Two random variables may be dependent and yet uncorrelated. If there exists correlation between two random variables, one may be represented as a linear regression of the other. We discuss this point in the next section.

1.12 Linear prediction of Y from X

Regression: $\hat Y = aX + b$
Prediction error: $Y - \hat Y$
Mean-square prediction error: $E(Y - \hat Y)^2 = E(Y - aX - b)^2$

Minimising the error gives the optimal values of $a$ and $b$. At the optimum,

$\dfrac{\partial}{\partial a} E(Y - aX - b)^2 = 0, \qquad \dfrac{\partial}{\partial b} E(Y - aX - b)^2 = 0$

Solving for $a$ and $b$,

$\hat Y - \mu_Y = \dfrac{\sigma_{XY}}{\sigma_X^2}(x - \mu_X) = \rho_{X,Y}\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$

where $\rho_{X,Y} = \dfrac{\sigma_{XY}}{\sigma_X\sigma_Y}$ is the correlation coefficient.

If $\rho_{X,Y} = 0$ then $X$ and $Y$ are uncorrelated, $\hat Y - \mu_Y = 0$, and $\hat Y = \mu_Y$ is the best prediction.

Note that independence $\Rightarrow$ uncorrelatedness, but uncorrelatedness generally does not imply independence (except for jointly Gaussian random variables).

Example 6: $Y = X^2$ and $f_X(x)$ is uniform on $(-1,1)$. $X$ and $Y$ are dependent, but they are uncorrelated, because

$\mathrm{Cov}(X,Y) = E(X-\mu_X)(Y-\mu_Y) = EXY - EX\,EY = EX^3 = 0 \quad (\because EX = 0)$

In fact, for any zero-mean symmetric distribution of $X$, $X$ and $X^2$ are uncorrelated.
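Example 6 is easy to verify numerically. The sketch below (our construction) draws $X$ uniform on $(-1,1)$ and confirms that the sample covariance between $X$ and $X^2$ vanishes, even though $Y = X^2$ is fully determined by $X$.

```python
import random

random.seed(0)
n = 500_000
xs = [random.uniform(-1, 1) for _ in range(n)]
ys = [x * x for x in xs]          # Y = X^2 is completely determined by X

mx = sum(xs) / n                  # ~ 0 for the symmetric distribution
my = sum(ys) / n                  # ~ 1/3
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(cov)                        # ~ 0: dependent yet uncorrelated
```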
1.13 Vector Space Interpretation of Random Variables

The set of all random variables defined on a sample space forms a vector space with respect to addition and scalar multiplication. This is very easy to verify.

1.14 Linear Independence

Consider the sequence of random variables $X_1, X_2, \ldots, X_N$. If

$c_1 X_1 + c_2 X_2 + \ldots + c_N X_N = 0$ implies that $c_1 = c_2 = \ldots = c_N = 0$,

then $X_1, X_2, \ldots, X_N$ are linearly independent.

1.15 Statistical Independence

$X_1, X_2, \ldots, X_N$ are statistically independent if

$f_{X_1, X_2, \ldots, X_N}(x_1, x_2, \ldots, x_N) = f_{X_1}(x_1)\, f_{X_2}(x_2) \ldots f_{X_N}(x_N)$

Statistical independence in the case of zero-mean random variables also implies linear independence.

1.16 Inner Product

If $x$ and $y$ are real vectors in a vector space $V$ defined over the field $\Re$, the inner product $\langle x, y\rangle$ is a scalar such that $\forall x, y, z \in V$ and $a \in \Re$:

1. $\langle x, y\rangle = \langle y, x\rangle$
2. $\langle x, x\rangle = \|x\|^2 \ge 0$
3. $\langle x+y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$
4. $\langle ax, y\rangle = a\,\langle x, y\rangle$

In the case of RVs, the inner product between $X$ and $Y$ is defined as

$\langle X, Y\rangle = EXY = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{X,Y}(x,y)\, dy\, dx$

Magnitude / norm of a vector: $\|x\|^2 = \langle x, x\rangle$. So, for a RV,

$\|X\|^2 = EX^2 = \int_{-\infty}^{\infty} x^2 f_X(x)\, dx$

• The set of RVs along with the inner product defined through the joint expectation operation and the corresponding norm defines a Hilbert space.
1.17 Schwarz Inequality

For any two vectors $x$ and $y$ belonging to a Hilbert space $V$,

$|\langle x, y\rangle| \le \|x\|\,\|y\|$

For RVs $X$ and $Y$:

$E^2(XY) \le EX^2\, EY^2$

Proof: Consider the random variable $Z = aX + Y$. Then

$E(aX+Y)^2 \ge 0 \;\Rightarrow\; a^2 EX^2 + 2a\,EXY + EY^2 \ge 0$

Non-negativity of the left-hand side means its minimum over $a$ must also be non-negative. At the minimum,

$\dfrac{dEZ^2}{da} = 0 \;\Rightarrow\; a = -\dfrac{EXY}{EX^2}$

so the corresponding minimum value is $EY^2 - \dfrac{E^2XY}{EX^2}$. The minimum being non-negative gives

$EY^2 - \dfrac{E^2XY}{EX^2} \ge 0 \;\Rightarrow\; E^2XY \le EX^2\, EY^2$

Applying this to the correlation coefficient

$\rho(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X\sigma_Y} = \dfrac{E(X-\mu_X)(Y-\mu_Y)}{\sqrt{E(X-\mu_X)^2\, E(Y-\mu_Y)^2}}$

the Schwarz inequality gives

$|\rho(X,Y)| \le 1$

1.18 Orthogonal Random Variables

Recall the definition of orthogonality: two vectors $x$ and $y$ are called orthogonal if $\langle x, y\rangle = 0$. Similarly, two random variables $X$ and $Y$ are called orthogonal if $EXY = 0$.

If each of $X$ and $Y$ is zero-mean, then $\mathrm{Cov}(X,Y) = EXY$; therefore, if $EXY = 0$ then $\mathrm{Cov}(X,Y) = 0$ in this case. For zero-mean random variables,

orthogonality $\Leftrightarrow$ uncorrelatedness
1.19 Orthogonality Principle

$X$ is a random variable which is not observable. $Y$ is another observable random variable which is statistically dependent on $X$. Given a value of $Y$, what is the best guess for $X$? (Estimation problem.)

Let the best estimate be $\hat X(Y)$. Then $E(X - \hat X(Y))^2$ is to be a minimum with respect to $\hat X(Y)$, and the corresponding estimation principle is called the minimum mean square error principle. For finding the minimum, we have

$\dfrac{\partial}{\partial \hat X} E(X - \hat X(Y))^2 = 0$
$\Rightarrow \dfrac{\partial}{\partial \hat X}\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \hat X(y))^2 f_{X,Y}(x,y)\, dy\, dx = 0$
$\Rightarrow \dfrac{\partial}{\partial \hat X}\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \hat X(y))^2 f_Y(y)\, f_{X/Y}(x)\, dy\, dx = 0$
$\Rightarrow \dfrac{\partial}{\partial \hat X}\displaystyle\int_{-\infty}^{\infty} f_Y(y)\left(\int_{-\infty}^{\infty} (x - \hat X(y))^2 f_{X/Y}(x)\, dx\right) dy = 0$

Since $f_Y(y)$ in the above equation is always positive, the minimization is equivalent to

$\dfrac{\partial}{\partial \hat X}\displaystyle\int_{-\infty}^{\infty} (x - \hat X(y))^2 f_{X/Y}(x)\, dx = 0$
$\Rightarrow 2\displaystyle\int_{-\infty}^{\infty} (x - \hat X(y))\, f_{X/Y}(x)\, dx = 0$
$\Rightarrow \displaystyle\int_{-\infty}^{\infty} \hat X(y)\, f_{X/Y}(x)\, dx = \int_{-\infty}^{\infty} x\, f_{X/Y}(x)\, dx$
$\Rightarrow \hat X(y) = E(X/Y)$

Thus the minimum mean-square error estimator involves a conditional expectation, which is difficult to obtain numerically. Let us consider a simpler version of the problem. We assume $\hat X(y) = ay$, and the estimation problem is to find the optimal value of $a$. Thus we have the linear minimum mean-square error criterion, which minimizes $E(X - aY)^2$:

$\dfrac{d}{da} E(X - aY)^2 = 0 \;\Rightarrow\; E\dfrac{d}{da}(X - aY)^2 = 0 \;\Rightarrow\; E(X - aY)Y = 0 \;\Rightarrow\; EeY = 0$

where $e$ is the estimation error.
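The condition $E(X-aY)Y = 0$ gives $a = EXY/EY^2$, and the resulting error is uncorrelated with the data. A small simulation (the signal model $Y = X + \text{noise}$ and its variances are our own choices for illustration):

```python
import random

random.seed(2)
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]              # unobservable X
ys = [x + random.gauss(0, 0.5) for x in xs]              # observation Y = X + noise

# Linear MMSE estimator X^ = aY with a = EXY / EY^2.
exy = sum(x * y for x, y in zip(xs, ys)) / n             # ~ 1.0
ey2 = sum(y * y for y in ys) / n                         # ~ 1.25
a = exy / ey2                                            # ~ 0.8

# Orthogonality principle: the error e = X - aY satisfies E[eY] = 0.
eey = sum((x - a * y) * y for x, y in zip(xs, ys)) / n
print(a, eey)
```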
The above result shows that for the linear minimum mean-square error criterion the estimation error is orthogonal to the data. This result helps us in deriving optimal filters to estimate a random signal buried in noise.

The mean and variance also give some quantitative information about the bounds of RVs. The following inequalities are extremely useful in many practical problems.

1.20 Chebyshev Inequality

Suppose $X$ is a parameter of a manufactured item with known mean $\mu_X$ and variance $\sigma_X^2$. The quality control department rejects the item if the absolute deviation of $X$ from $\mu_X$ is greater than $2\sigma_X$. What fraction of the manufactured items does the quality control department reject? Can you roughly guess it?

The standard deviation gives us an intuitive idea of how the random variable is distributed about the mean. This idea is expressed more precisely in the remarkable Chebyshev inequality stated below. For a random variable $X$ with mean $\mu_X$ and variance $\sigma_X^2$,

$P\{|X - \mu_X| \ge \varepsilon\} \le \dfrac{\sigma_X^2}{\varepsilon^2}$

Proof:

$\sigma_X^2 = \displaystyle\int_{-\infty}^{\infty} (x-\mu_X)^2 f_X(x)\, dx \ge \int_{|x-\mu_X| \ge \varepsilon} (x-\mu_X)^2 f_X(x)\, dx \ge \int_{|x-\mu_X| \ge \varepsilon} \varepsilon^2 f_X(x)\, dx = \varepsilon^2\, P\{|X-\mu_X| \ge \varepsilon\}$

$\therefore P\{|X - \mu_X| \ge \varepsilon\} \le \dfrac{\sigma_X^2}{\varepsilon^2}$

With $\varepsilon = 2\sigma_X$ the inequality answers the quality-control question: at most $\sigma_X^2/(2\sigma_X)^2 = 1/4$ of the items are rejected, whatever the distribution of $X$.

1.21 Markov Inequality

For a random variable $X$ which takes only non-negative values,

$P\{X \ge a\} \le \dfrac{E(X)}{a}$, where $a > 0$.

Proof:

$E(X) = \displaystyle\int_0^\infty x\, f_X(x)\, dx \ge \int_a^\infty x\, f_X(x)\, dx \ge \int_a^\infty a\, f_X(x)\, dx = a\, P\{X \ge a\}$
$\therefore P\{X \ge a\} \le \dfrac{E(X)}{a}$

Result: $P\{(X-k)^2 \ge a\} \le \dfrac{E(X-k)^2}{a}$

1.22 Convergence of a sequence of random variables

Let $X_1, X_2, \ldots, X_n$ be a sequence of $n$ independent and identically distributed random variables. Suppose we want to estimate the mean of the random variable on the basis of the observed data by means of the relation

$\hat\mu_X = \dfrac{1}{n}\sum_{i=1}^{n} X_i$

How closely does $\hat\mu_X$ represent $\mu_X$ as $n$ is increased? How do we measure the closeness between $\hat\mu_X$ and $\mu_X$? Notice that $\hat\mu_X$ is a random variable. What do we mean by the statement "$\hat\mu_X$ converges to $\mu_X$"?

Consider a deterministic sequence $x_1, x_2, \ldots, x_n, \ldots$ The sequence converges to a limit $x$ if, corresponding to any $\varepsilon > 0$, we can find a positive integer $m$ such that $|x - x_n| < \varepsilon$ for $n > m$.

Convergence of a random sequence $X_1, X_2, \ldots, X_n, \ldots$ cannot be defined as above. A sequence of random variables is said to converge everywhere to $X$ if $|X(\xi) - X_n(\xi)| \to 0$ for $n > m$ and $\forall\xi$.

1.23 Almost sure (a.s.) convergence or convergence with probability 1

For the random sequence $X_1, X_2, \ldots, X_n, \ldots$, $\{X_n \to X\}$ is an event. If

$P\{s \mid X_n(s) \to X(s)\} = 1$ as $n \to \infty$,

i.e. $P\{s \mid |X_n(s) - X(s)| < \varepsilon \text{ for } n \ge m\} = 1$ as $m \to \infty$,

then the sequence is said to converge to $X$ almost surely, or with probability 1.

One important application is the Strong Law of Large Numbers: if $X_1, X_2, \ldots, X_n, \ldots$ are iid random variables, then

$\dfrac{1}{n}\sum_{i=1}^{n} X_i \to \mu_X$ with probability 1 as $n \to \infty$.
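The strong law can be watched in action along a single realization. The sketch below (a toy setup of ours: uniform samples with mean 2) tracks the running sample mean at a few checkpoints as $n$ grows.

```python
import random

random.seed(3)
mu, n = 2.0, 1_000_000
total, checkpoints = 0.0, {}
for i in range(1, n + 1):
    total += random.uniform(mu - 1, mu + 1)   # iid samples with mean mu
    if i in (100, 10_000, 1_000_000):
        checkpoints[i] = total / i            # running sample mean
print(checkpoints)   # running means approach mu = 2.0
```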
1.24 Convergence in mean square sense

If $E(X_n - X)^2 \to 0$ as $n \to \infty$, we say that the sequence converges to $X$ in mean square (M.S.).

Example 7: If $X_1, X_2, \ldots, X_n, \ldots$ are iid random variables, then $\dfrac{1}{n}\sum_{i=1}^{n} X_i \to \mu_X$ in the mean-square sense as $n \to \infty$.

We have to show that $\lim_{n\to\infty} E\left(\dfrac{1}{n}\sum_{i=1}^{n} X_i - \mu_X\right)^2 = 0$. Now,

$E\left(\dfrac{1}{n}\sum_{i=1}^{n} X_i - \mu_X\right)^2 = E\left(\dfrac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)\right)^2 = \dfrac{1}{n^2}\sum_{i=1}^{n} E(X_i-\mu_X)^2 + \dfrac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1, j\ne i}^{n} E(X_i-\mu_X)(X_j-\mu_X)$
$= \dfrac{n\sigma_X^2}{n^2} + 0$ (because of independence)
$= \dfrac{\sigma_X^2}{n}$

$\therefore \lim_{n\to\infty} E\left(\dfrac{1}{n}\sum_{i=1}^{n} X_i - \mu_X\right)^2 = 0$

1.25 Convergence in probability

$P\{|X_n - X| > \varepsilon\}$ is a sequence of probabilities. $X_n$ is said to converge to $X$ in probability if this sequence of probabilities converges to zero, that is,

$P\{|X_n - X| > \varepsilon\} \to 0$ as $n \to \infty$.

If a sequence is convergent in mean square, then it is convergent in probability also, because by the Markov inequality

$P\{|X_n - X|^2 > \varepsilon^2\} \le \dfrac{E(X_n - X)^2}{\varepsilon^2}$

so that $P\{|X_n - X| > \varepsilon\} \le \dfrac{E(X_n - X)^2}{\varepsilon^2}$. If $E(X_n - X)^2 \to 0$ as $n \to \infty$ (mean-square convergence), then $P\{|X_n - X| > \varepsilon\} \to 0$ as $n \to \infty$.
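Example 7 predicts that the mean-square error of the sample mean falls exactly as $\sigma_X^2/n$. A Monte Carlo sketch (our own, with unit-variance Gaussian samples, so the prediction is $1/n$):

```python
import random

random.seed(4)
trials = 2000

def mse_of_mean(n):
    """Monte Carlo estimate of E(mean_n - mu)^2 for iid N(0, 1) samples (mu = 0)."""
    errs = []
    for _ in range(trials):
        m = sum(random.gauss(0, 1) for _ in range(n)) / n
        errs.append(m * m)
    return sum(errs) / trials

results = {n: mse_of_mean(n) for n in (10, 100, 1000)}
print(results)   # ~ 1/n: about 0.1, 0.01, 0.001
```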
Example 8: Suppose $\{X_n\}$ is a sequence of random variables with

$P(X_n = 1) = 1 - \dfrac{1}{n}$ and $P(X_n = -1) = \dfrac{1}{n}$

Clearly

$P\{|X_n - 1| > \varepsilon\} = P\{X_n = -1\} = \dfrac{1}{n} \to 0$ as $n \to \infty$.

Therefore $\{X_n\} \xrightarrow{P} X = 1$.

1.26 Convergence in distribution

The sequence $X_1, X_2, \ldots, X_n, \ldots$ is said to converge to $X$ in distribution if

$F_{X_n}(x) \to F_X(x)$ as $n \to \infty$.

Here the two distribution functions eventually coincide.

1.27 Central Limit Theorem

Consider independent and identically distributed random variables $X_1, X_2, \ldots, X_n$. Let

$Y = X_1 + X_2 + \ldots + X_n$

Then

$\mu_Y = \mu_{X_1} + \mu_{X_2} + \ldots + \mu_{X_n}$

and

$\sigma_Y^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \ldots + \sigma_{X_n}^2$

The central limit theorem states that under very general conditions the distribution of $Y$ converges to $N(\mu_Y, \sigma_Y^2)$ as $n \to \infty$. The result also holds under weaker conditions than the iid assumption, for example:
1. when the random variables $X_1, X_2, \ldots, X_n$ are independent with the same mean and variance, but not identically distributed;
2. when the random variables $X_1, X_2, \ldots, X_n$ are independent with different means and the same variance, and not identically distributed.
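A minimal simulation of the central limit theorem (our own sketch, using sums of $n = 50$ uniform variables, so $\mu_Y = n/2$ and $\sigma_Y^2 = n/12$): if the sum is approximately Gaussian, about 68.3% of the realizations should fall within one standard deviation of $\mu_Y$.

```python
import random

random.seed(5)
n, trials = 50, 20_000
# Sum of n iid uniform(0,1) variables: mu_Y = n/2, sigma_Y^2 = n/12.
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

mu_y = n / 2
sd_y = (n / 12) ** 0.5
frac = sum(abs(s - mu_y) <= sd_y for s in sums) / trials
print(frac)   # ~ 0.683 for a Gaussian
```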
1.28 Jointly Gaussian Random Variables

Two random variables $X$ and $Y$ are called jointly Gaussian if their joint density function is

$f_{X,Y}(x,y) = A\, \exp\left\{-\dfrac{1}{2(1-\rho_{X,Y}^2)}\left[\dfrac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho_{X,Y}\dfrac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \dfrac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\}$

where

$A = \dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho_{X,Y}^2}}$

Properties:

(1) If $X$ and $Y$ are jointly Gaussian, then for any constants $a$ and $b$ the random variable $Z = aX + bY$ is Gaussian with mean $\mu_Z = a\mu_X + b\mu_Y$ and variance

$\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_X\sigma_Y\,\rho_{X,Y}$

(2) If two jointly Gaussian RVs are uncorrelated ($\rho_{X,Y} = 0$), then they are statistically independent: $f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$ in this case.

(3) If $f_{X,Y}(x,y)$ is a jointly Gaussian density, then the marginal densities $f_X(x)$ and $f_Y(y)$ are also Gaussian.

(4) If $X$ and $Y$ are jointly Gaussian random variables, then the optimum nonlinear estimator $\hat X$ of $X$ that minimizes the mean square error $\xi = E\{[X - \hat X]^2\}$ is a linear estimator $\hat X = aY$.
REVIEW OF RANDOM PROCESS

2.1 Introduction

Recall that a random variable maps each sample point in the sample space to a point on the real line. A random process maps each sample point to a waveform.

• A random process can be defined as an indexed family of random variables $\{X(t), t \in T\}$ where $T$ is an index set which may be discrete or continuous, usually denoting time.
• The random process is defined on a common probability space $\{S, \Im, P\}$.
• A random process is a function of the sample point $\xi$ and the index variable $t$ and may be written as $X(t,\xi)$.
• For a fixed $t (= t_0)$, $X(t_0,\xi)$ is a random variable.
• For a fixed $\xi (= \xi_0)$, $X(t,\xi_0)$ is a single realization of the random process and is a deterministic function.
• When both $t$ and $\xi$ are varying we have the random process $X(t,\xi)$.

The random process $X(t,\xi)$ is normally denoted by $X(t)$. We can define a discrete random process $X[n]$ on discrete points of time. Such a random process is more important in practical implementations.

Figure: A random process — one waveform $X(t,s_1), X(t,s_2), X(t,s_3)$ per sample point $s_1, s_2, s_3 \in S$.
2.2 How to describe a random process?

To describe $X(t)$ we have to use the joint density function of the random variables at different $t$. For any positive integer $n$, $X(t_1), X(t_2), \ldots, X(t_n)$ represent $n$ jointly distributed random variables. Thus a random process can be described by the joint distribution function

$F_{X(t_1), X(t_2) \ldots X(t_n)}(x_1, x_2, \ldots, x_n) = F(x_1, x_2, \ldots, x_n,\, t_1, t_2, \ldots, t_n), \quad \forall n \in N \text{ and } \forall t_n \in T$

Otherwise we can determine all the possible moments of the process:

$E(X(t)) = \mu_X(t)$ = mean of the random process at $t$,
$R_X(t_1, t_2) = E(X(t_1)X(t_2))$ = autocorrelation function at $t_1, t_2$,
$R_X(t_1, t_2, t_3) = E(X(t_1)X(t_2)X(t_3))$ = triple correlation function at $t_1, t_2, t_3$, etc.

We can also define the autocovariance function $C_X(t_1, t_2)$ of $X(t)$, given by

$C_X(t_1, t_2) = E(X(t_1) - \mu_X(t_1))(X(t_2) - \mu_X(t_2)) = R_X(t_1, t_2) - \mu_X(t_1)\mu_X(t_2)$

Example 1:

(a) Gaussian random process. For any positive integer $n$, $X(t_1), \ldots, X(t_n)$ define a random vector $\mathbf{X} = [X(t_1), X(t_2), \ldots, X(t_n)]'$. The process $X(t)$ is called Gaussian if $\mathbf{X}$ is jointly Gaussian for every $n$, with joint density

$f_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n) = \dfrac{1}{(2\pi)^{n/2}\sqrt{\det(C_X)}}\, e^{-\frac12 (\mathbf{x} - \boldsymbol\mu_X)' C_X^{-1} (\mathbf{x} - \boldsymbol\mu_X)}$

where $C_X = E\big((\mathbf{X} - \boldsymbol\mu_X)(\mathbf{X} - \boldsymbol\mu_X)'\big)$ and $\boldsymbol\mu_X = E(\mathbf{X}) = [E(X_1), E(X_2), \ldots, E(X_n)]'$.

(b) Bernoulli random process.
(c) A sinusoid with a random phase.
2.3 Stationary Random Process

A random process $X(t)$ is called strict-sense stationary if its probability structure is invariant with time. In terms of the joint distribution function,

$F_{X(t_1) \ldots X(t_n)}(x_1, \ldots, x_n) = F_{X(t_1+t_0) \ldots X(t_n+t_0)}(x_1, \ldots, x_n) \quad \forall n \in N \text{ and } \forall t_0, t_n \in T$

For $n = 1$: $F_{X(t_1)}(x_1) = F_{X(t_1+t_0)}(x_1)\ \forall t_0 \in T$. Taking $t_0 = -t_1$,

$F_{X(t_1)}(x_1) = F_{X(0)}(x_1) \;\Rightarrow\; EX(t_1) = EX(0) = \mu_X(0) = \text{constant}$

For $n = 2$: $F_{X(t_1),X(t_2)}(x_1, x_2) = F_{X(t_1+t_0),X(t_2+t_0)}(x_1, x_2)$. Putting $t_0 = -t_2$,

$F_{X(t_1),X(t_2)}(x_1, x_2) = F_{X(t_1-t_2),X(0)}(x_1, x_2) \;\Rightarrow\; R_X(t_1, t_2) = R_X(t_1 - t_2)$

A random process $X(t)$ is called a wide-sense stationary (WSS) process if

$\mu_X(t) = \text{constant}$, and
$R_X(t_1, t_2) = R_X(t_1 - t_2)$ is a function of the time lag only.

For a Gaussian random process, WSS implies strict-sense stationarity, because this process is completely described by the mean and the autocorrelation functions.

The autocorrelation function $R_X(\tau) = EX(t+\tau)X(t)$ is a crucial quantity for a WSS process:

• $R_X(0) = EX^2(t)$ is the mean-square value of the process.
• $R_X(-\tau) = R_X(\tau)$ for a real process (for a complex process $X(t)$, $R_X(-\tau) = R_X^*(\tau)$).
• $|R_X(\tau)| \le R_X(0)$, which follows from the Schwarz inequality:

$R_X^2(\tau) = \{EX(t)X(t+\tau)\}^2 = \langle X(t), X(t+\tau)\rangle^2 \le \|X(t)\|^2\, \|X(t+\tau)\|^2 = EX^2(t)\, EX^2(t+\tau) = R_X^2(0)$

$\therefore |R_X(\tau)| \le R_X(0)$
• $R_X(\tau)$ is a positive semi-definite function, in the sense that for any positive integer $n$ and any reals $a_1, \ldots, a_n$,

$\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j R_X(t_i, t_j) \ge 0$

• If $X(t)$ is periodic (in the mean-square sense or any other sense, such as with probability 1), then $R_X(\tau)$ is also periodic.

For a discrete random sequence, we can define the autocorrelation sequence similarly.

• If $R_X(\tau)$ drops quickly, the signal samples are less correlated, which in turn means that the signal changes a lot with time; such a signal has strong high-frequency components. If $R_X(\tau)$ drops slowly, the signal samples are highly correlated and the signal has weak high-frequency content.
• $R_X(\tau)$ is directly related to the frequency-domain representation of a WSS process.

2.4 Spectral Representation of a Random Process

How do we obtain a frequency-domain representation of a random process?

• Wiener (1930) and Khinchin (1934) independently discovered the spectral representation of a random process. Einstein (1914) also used the concept.
• The autocorrelation function and the power spectral density form a Fourier transform pair.

Let us define the truncated process

$X_T(t) = X(t)$ for $-T < t < T$
$= 0$ otherwise

so that as $T \to \infty$, $X_T(t)$ represents the random process $X(t)$. Define, in the mean-square sense,

$X_T(\omega) = \int_{-T}^{T} X_T(t)\, e^{-j\omega t}\, dt$
$E\dfrac{X_T(\omega)X_T^*(\omega)}{2T} = E\dfrac{|X_T(\omega)|^2}{2T} = \dfrac{1}{2T}\int_{-T}^{T}\int_{-T}^{T} EX_T(t_1)X_T(t_2)\, e^{-j\omega t_1} e^{+j\omega t_2}\, dt_1\, dt_2 = \dfrac{1}{2T}\int_{-T}^{T}\int_{-T}^{T} R_X(t_1 - t_2)\, e^{-j\omega(t_1-t_2)}\, dt_1\, dt_2$

Substituting $t_1 - t_2 = \tau$ (so that $t_2 = t_1 - \tau$ is a line in the $(t_1, t_2)$ plane), we get

$E\dfrac{|X_T(\omega)|^2}{2T} = \dfrac{1}{2T}\int_{-2T}^{2T} R_X(\tau)\, e^{-j\omega\tau}\,(2T - |\tau|)\, d\tau = \int_{-2T}^{2T} R_X(\tau)\, e^{-j\omega\tau}\left(1 - \dfrac{|\tau|}{2T}\right) d\tau$

If $R_X(\tau)$ is integrable then, as $T \to \infty$,

$\lim_{T\to\infty} E\dfrac{|X_T(\omega)|^2}{2T} = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}\, d\tau$

$E\dfrac{|X_T(\omega)|^2}{2T}$ is the contribution to the average power at frequency $\omega$ and is called the power spectral density. Thus

$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}\, d\tau$

and

$R_X(\tau) = \dfrac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\, e^{j\omega\tau}\, d\omega$

Properties

• $EX^2(t) = R_X(0) = \dfrac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\, d\omega$ = average power of the process.
• The average power in the band $(\omega_1, \omega_2)$ is $\dfrac{1}{2\pi}\int_{\omega_1}^{\omega_2} S_X(\omega)\, d\omega$.
• $R_X(\tau)$ is real and even $\Rightarrow S_X(\omega)$ is real and even.
• From the definition $S_X(\omega) = \lim_{T\to\infty} E|X_T(\omega)|^2/2T$, the PSD is always non-negative.
• $h_X(\omega) = S_X(\omega)\Big/\int_{-\infty}^{\infty} S_X(\omega)\, d\omega$ is the normalised power spectral density; it has the properties of a PDF (always non-negative, unit area).
2.5 Cross-correlation & Cross Power Spectral Density

Consider two real random processes $X(t)$ and $Y(t)$. Joint stationarity of $X(t)$ and $Y(t)$ implies that the joint densities are invariant with a shift of time. The cross-correlation function $R_{X,Y}(\tau)$ for jointly WSS processes $X(t)$ and $Y(t)$ is defined as

$R_{X,Y}(\tau) = EX(t+\tau)Y(t)$

so that

$R_{Y,X}(\tau) = EY(t+\tau)X(t) = EX(t)Y(t+\tau) = R_{X,Y}(-\tau)$

$\therefore R_{Y,X}(\tau) = R_{X,Y}(-\tau)$

Cross power spectral density:

$S_{X,Y}(\omega) = \int_{-\infty}^{\infty} R_{X,Y}(\tau)\, e^{-j\omega\tau}\, d\tau$

For real processes $X(t)$ and $Y(t)$, $S_{X,Y}(\omega) = S_{Y,X}^*(\omega)$.

The Wiener-Khinchin theorem is also valid for discrete-time random processes. If we define $R_X[m] = EX[n+m]X[n]$, then the corresponding PSD is given by

$S_X(w) = \sum_{m=-\infty}^{\infty} R_X[m]\, e^{-jwm}, \quad -\pi \le w \le \pi$

or, in terms of normalised frequency,

$S_X(f) = \sum_{m=-\infty}^{\infty} R_X[m]\, e^{-j2\pi f m}, \quad -\tfrac12 \le f \le \tfrac12$

$\therefore R_X[m] = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} S_X(w)\, e^{jwm}\, dw$

For a discrete sequence the generalised PSD is defined in the $z$-domain as follows:

$S_X(z) = \sum_{m=-\infty}^{\infty} R_X[m]\, z^{-m}$

If we sample a stationary random process uniformly we get a stationary random sequence. The sampling theorem is valid in terms of the PSD.
Example 2:

(1) $R_X(\tau) = e^{-a|\tau|},\ a > 0 \;\Rightarrow\; S_X(\omega) = \dfrac{2a}{a^2 + \omega^2}, \quad -\infty < \omega < \infty$

(2) $R_X[m] = a^{|m|},\ |a| < 1 \;\Rightarrow\; S_X(w) = \dfrac{1 - a^2}{1 - 2a\cos w + a^2}, \quad -\pi \le w \le \pi$

2.6 White noise process

A white noise process $X(t)$ is defined by the flat spectrum

$S_X(f) = \dfrac{N}{2}, \quad -\infty < f < \infty$

The corresponding autocorrelation function is given by

$R_X(\tau) = \dfrac{N}{2}\,\delta(\tau)$

where $\delta(\tau)$ is the Dirac delta. The average power of white noise is

$P_{avg} = \int_{-\infty}^{\infty} \dfrac{N}{2}\, df \to \infty$

• Samples of a white noise process are uncorrelated.
• White noise is a mathematical abstraction; it cannot be realized, since it has infinite power.
• If the system bandwidth is sufficiently narrower than the noise bandwidth and the noise PSD is flat over it, we can model the noise as a white noise process. Thermal and shot noise are well modelled as white Gaussian noise, since they have a very flat PSD over a very wide band (GHz).
• For a zero-mean white noise process, the correlation of the process at any lag $\tau \ne 0$ is zero.
• White noise plays a key role in random signal modelling, a role similar to that of the impulse function in the modelling of deterministic signals.

Figure: A linear system driven by white noise produces a WSS random signal.
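The discrete-time pair in Example 2(2) can be verified numerically by truncating the DTFT sum $\sum_m a^{|m|} e^{-jwm}$, which converges geometrically (the lag cut-off of 200 and the test values of $a$ and $w$ are arbitrary choices of ours):

```python
import cmath, math

a, w = 0.6, 1.0
# Direct (truncated) DTFT of R[m] = a^|m|.
s_sum = sum((a ** abs(m)) * cmath.exp(-1j * w * m) for m in range(-200, 201)).real
# Closed form from the text: (1 - a^2) / (1 - 2a cos w + a^2).
s_closed = (1 - a * a) / (1 - 2 * a * math.cos(w) + a * a)
print(s_sum, s_closed)   # the two agree
```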
2.7 White Noise Sequence

For a white noise sequence $x[n]$,

$S_X(w) = \dfrac{N}{2}, \quad -\pi \le w \le \pi$

Therefore

$R_X[m] = \dfrac{N}{2}\,\delta[m]$

where $\delta[m]$ is the unit impulse sequence.

Figure: $R_X[m]$ is a single impulse of height $N/2$ at $m = 0$; $S_X(\omega)$ is flat at $N/2$ over $-\pi \le \omega \le \pi$.

2.8 Linear Shift-Invariant System with Random Inputs

Consider a discrete-time linear system with impulse response $h[n]$:

$y[n] = x[n] * h[n]$
$Ey[n] = Ex[n] * h[n]$

For a stationary input $x[n]$,

$\mu_Y = Ey[n] = \mu_X * h[n] = \mu_X \sum_{k=0}^{l} h[k]$

where $l$ is the length of the impulse response sequence, and

$R_Y[m] = Ey[n]y[n-m] = E(x[n]*h[n])(x[n-m]*h[n-m]) = R_X[m] * h[m] * h[-m]$

$R_Y[m]$ is a function of the lag $m$ only. From the above we get

$S_Y(w) = |H(w)|^2\, S_X(w)$
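The relation $R_Y[m] = R_X[m]*h[m]*h[-m]$ is easy to check for a white-noise input, where it reduces to $R_Y[m] = \sigma^2 \sum_j h[j]\,h[j+m]$. A sketch with an arbitrary FIR impulse response of our choosing:

```python
import random

random.seed(6)
h = [1.0, -0.7, 0.2]   # arbitrary FIR impulse response
n = 400_000
x = [random.gauss(0, 1) for _ in range(n)]            # white noise input, R_X[m] = delta[m]
y = [sum(h[j] * x[k - j] for j in range(len(h))) for k in range(len(h) - 1, n)]

def r(seq, m):
    """Sample autocorrelation at lag m."""
    return sum(seq[k] * seq[k - m] for k in range(m, len(seq))) / (len(seq) - m)

# Theory for white input: R_Y[m] = sum_j h[j] h[j+m], and zero beyond the filter length.
theory = [sum(h[j] * h[j + m] for j in range(len(h) - m)) for m in range(len(h))]
print([r(y, m) for m in range(4)])   # ~ [1.53, -0.84, 0.2, 0.0]
print(theory)
```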
Example 3: Suppose

$H(w) = 1$ for $-w_c \le w \le w_c$
$= 0$ otherwise

and $S_X(w) = \dfrac{N}{2}$, $-\infty < w < \infty$. Then

$S_Y(w) = \dfrac{N}{2}, \quad -w_c \le w \le w_c$

and

$R_Y(\tau) = \dfrac{1}{2\pi}\int_{-w_c}^{w_c} \dfrac{N}{2}\, e^{jw\tau}\, dw = \dfrac{N w_c}{2\pi}\,\mathrm{sinc}(w_c\tau)$, with $\mathrm{sinc}(x) = \sin(x)/x$.

• Note that though the input is an uncorrelated process, the output is a correlated process.

Consider the case of a discrete-time system with a random sequence $x[n]$ as input:

$R_Y[m] = R_X[m] * h[m] * h[-m]$

Taking the $z$-transform, we get

$S_Y(z) = S_X(z)\, H(z)\, H(z^{-1})$

Notice that if $H(z)$ is causal, then $H(z^{-1})$ is anti-causal. Similarly, if $H(z)$ is minimum-phase then $H(z^{-1})$ is maximum-phase.

Example 4: If $H(z) = \dfrac{1}{1 - \alpha z^{-1}}$ and $x[n]$ is a unit-variance white-noise sequence, then

$S_Y(z) = H(z)H(z^{-1}) = \left(\dfrac{1}{1 - \alpha z^{-1}}\right)\left(\dfrac{1}{1 - \alpha z}\right)$

By partial fraction expansion and inverse $z$-transform, we get

$R_Y[m] = \dfrac{\alpha^{|m|}}{1 - \alpha^2}$
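Example 4 can be reproduced by simulation (our sketch; the pole at $\alpha = 0.8$ and the run length are arbitrary choices): drive the recursion $y[n] = \alpha y[n-1] + v[n]$ with unit-variance white noise and compare the sample autocorrelation with $\alpha^{|m|}/(1-\alpha^2)$.

```python
import random

random.seed(7)
alpha, n = 0.8, 300_000
x = [0.0]
for _ in range(n):
    x.append(alpha * x[-1] + random.gauss(0, 1))   # y[n] = alpha*y[n-1] + v[n]
x = x[1000:]                                       # drop the start-up transient

def r(seq, m):
    return sum(seq[k] * seq[k - m] for k in range(m, len(seq))) / (len(seq) - m)

theory = lambda m: alpha ** abs(m) / (1 - alpha ** 2)
print(r(x, 0), theory(0))    # ~ 2.78
print(r(x, 2), theory(2))    # ~ 1.78
```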
2.9 Spectral factorization theorem

A stationary random signal $X[n]$ that satisfies the Paley-Wiener condition

$\int_{-\pi}^{\pi} |\ln S_X(w)|\, dw < \infty$

can be considered as the output of a linear filter fed by a white noise sequence. If $S_X(w)$ is an analytic function of $w$ and the above condition holds, then

$S_X(z) = \sigma_v^2\, H_c(z)\, H_a(z)$

where $H_c(z)$ is the causal minimum-phase transfer function, $H_a(z)$ is the anti-causal maximum-phase transfer function, and $\sigma_v^2$ is a constant interpreted as the variance of a white-noise sequence, the innovation sequence $v[n]$.

Figure: Innovation filter — white noise $v[n]$ through $H_c(z)$ gives $X[n]$.
Figure: Whitening filter — $X[n]$ through $1/H_c(z)$ gives $v[n]$.

Minimum phase filter $\Rightarrow$ the corresponding inverse filter exists.

Since $\ln S_{XX}(z)$ is analytic in an annular region $\rho < |z| < \dfrac{1}{\rho}$,

$\ln S_{XX}(z) = \sum_{k=-\infty}^{\infty} c[k]\, z^{-k}$

where

$c[k] = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} \ln S_{XX}(w)\, e^{jwk}\, dw$

is the $k$th-order cepstral coefficient. For a real signal, $c[k] = c[-k]$ and

$c[0] = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} \ln S_{XX}(w)\, dw$

Hence

$S_{XX}(z) = e^{\sum_{k=-\infty}^{\infty} c[k]z^{-k}} = e^{c[0]}\; e^{\sum_{k=1}^{\infty} c[k]z^{-k}}\; e^{\sum_{k=-\infty}^{-1} c[k]z^{-k}}$

Let

$H_C(z) = e^{\sum_{k=1}^{\infty} c[k]z^{-k}}, \quad |z| > \rho$
$= 1 + h_c(1)z^{-1} + h_c(2)z^{-2} + \ldots$
(since $h_c[0] = \lim_{z\to\infty} H_C(z) = 1$). $H_C(z)$ and $\ln H_C(z)$ are both analytic $\Rightarrow$ $H_C(z)$ is a minimum-phase filter. Similarly let

$H_a(z) = e^{\sum_{k=-\infty}^{-1} c[k]z^{-k}} = e^{\sum_{k=1}^{\infty} c[k]z^{k}} = H_C(z^{-1}), \quad |z| < \dfrac{1}{\rho}$

Therefore

$S_{XX}(z) = \sigma_V^2\, H_C(z)\, H_C(z^{-1})$, where $\sigma_V^2 = e^{c[0]}$

Salient points

• $S_{XX}(z)$ can be factorized into a minimum-phase factor $H_C(z)$ and a maximum-phase factor $H_C(z^{-1})$.
• In general spectral factorization is difficult; however, for a signal with a rational power spectrum, spectral factorization can be done easily.
• Since $H_C(z)$ is a minimum-phase filter, $\dfrac{1}{H_C(z)}$ exists ($\Rightarrow$ stable); therefore we can filter the given signal through $\dfrac{1}{H_C(z)}$ to get the innovation sequence.
• $X[n]$ and $v[n]$ are related through an invertible transform, so they contain the same information.

2.10 Wold's Decomposition

Any WSS signal $X[n]$ can be decomposed as the sum of two mutually orthogonal processes, a regular process $X_r[n]$ and a predictable process $X_p[n]$:

$X[n] = X_r[n] + X_p[n]$

• $X_r[n]$ can be expressed as the output of a linear filter using a white noise sequence as input.
• $X_p[n]$ is a predictable process, that is, the process can be predicted from its own past with zero prediction error.
RANDOM SIGNAL MODELLING

3.1 Introduction

The spectral factorization theorem enables us to model a regular random process as the output of a linear filter with white noise as input. Different models are developed using different forms of linear filters.

• These models are mathematically described by linear constant-coefficient difference equations.
• In statistics, random-process modelling using difference equations is known as time series analysis.

3.2 White Noise Sequence

The simplest model is the white noise $v[n]$. We shall assume that $v[n]$ has zero mean and variance $\sigma_V^2$.

Figure: $R_V[m]$ is a single impulse of height $\sigma_V^2$ at $m = 0$; $S_V(w)$ is flat over $-\pi \le w \le \pi$.

3.3 Moving Average model — MA(q)

Figure: White noise $v[n]$ through an FIR filter gives $X[n]$.

The difference equation model is

$X[n] = \sum_{i=0}^{q} b_i\, v[n-i]$

$\mu_v = 0 \Rightarrow \mu_X = 0$, and since $v[n]$ is an uncorrelated sequence,

$\sigma_X^2 = \sum_{i=0}^{q} b_i^2\, \sigma_V^2$
The autocorrelations are given by

$R_X[m] = EX[n]X[n-m] = \sum_{i=0}^{q}\sum_{j=0}^{q} b_i b_j\, Ev[n-i]v[n-m-j] = \sum_{i=0}^{q}\sum_{j=0}^{q} b_i b_j\, R_V[m-i+j]$

Noting that $R_V[m] = \sigma_V^2\,\delta[m]$, a term contributes only when $m - i + j = 0$, i.e. $i = m + j$. The maximum value of $m + j$ is $q$, so

$R_X[m] = \sigma_V^2 \sum_{j=0}^{q-m} b_j\, b_{j+m}, \quad 0 \le m \le q$

and $R_X[-m] = R_X[m]$. Writing the two relations together,

$R_X[m] = \sigma_V^2 \sum_{j=0}^{q-|m|} b_j\, b_{j+|m|}, \quad |m| \le q$
$= 0$ otherwise

Notice that $R_X[m]$ is related to the model parameters by a nonlinear relationship, so finding the model parameters is not simple. The power spectral density is given by

$S_X(w) = \sigma_V^2\, |B(w)|^2$, where $B(w) = b_0 + b_1 e^{-jw} + \ldots + b_q e^{-jqw}$

An FIR system contributes only zeros, so if the spectrum has valleys an MA model will fit well.

3.3.1 Test for an MA process

$R_X[m]$ becomes zero abruptly after some lag $m = q$.

Figure: Autocorrelation function of an MA process (cuts off after lag q).
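The lag-$q$ cut-off and the moment relations above can be demonstrated for an MA(1) process (our sketch; $b_0 = 1$, $b_1 = 0.5$ and unit noise variance are assumed values): estimate $R_X[0], R_X[1], R_X[2]$ from a simulated run, verify the cut-off, and invert the moments to recover the parameters.

```python
import random, math

random.seed(10)
b0, b1, n = 1.0, 0.5, 400_000
v = [random.gauss(0, 1) for _ in range(n)]            # unit-variance white noise
x = [b0 * v[k] + b1 * v[k - 1] for k in range(1, n)]  # MA(1) process

def r(seq, m):
    return sum(seq[k] * seq[k - m] for k in range(m, len(seq))) / (len(seq) - m)

r0, r1, r2 = r(x, 0), r(x, 1), r(x, 2)
print(r0, r1, r2)   # theory: b0^2 + b1^2 = 1.25, b0*b1 = 0.5, 0 (cut-off after lag q = 1)

# Recover the parameters from the moments: b0^2 solves t^2 - r0*t + r1^2 = 0.
t = (r0 + math.sqrt(r0 * r0 - 4 * r1 * r1)) / 2
b0_hat = math.sqrt(t)
b1_hat = r1 / b0_hat
print(b0_hat, b1_hat)   # ~ 1.0, 0.5 (the other root swaps b0 and b1)
```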
Figure: Power spectrum of an MA process.

Example 1: MA(1) process

$X[n] = b_0 v[n] + b_1 v[n-1]$

Here the parameters $b_0$ and $b_1$ are to be determined. Taking $\sigma_V^2 = 1$, we have

$\sigma_X^2 = b_0^2 + b_1^2$
$R_X[1] = b_0 b_1$

From the above, $b_0$ and $b_1$ can be calculated using the variance and the lag-1 autocorrelation of the signal.

3.4 Autoregressive Model

In time series analysis this is called the AR(p) model.

Figure: White noise $v[n]$ through an IIR filter gives $X[n]$.

The model is given by the difference equation

$X[n] = \sum_{i=1}^{p} a_i X[n-i] + v[n]$

The transfer function $A(w)$ is given by
$A(w) = \dfrac{1}{1 - \sum_{i=1}^{p} a_i e^{-jwi}}$

with $a_0 = 1$ (an all-poles model), and

$S_X(w) = \sigma_V^2\, |A(w)|^2$

If there are sharp peaks in the spectrum, the AR(p) model may be suitable.

The autocorrelation function $R_X[m]$ is given by

$R_X[m] = EX[n]X[n-m] = \sum_{i=1}^{p} a_i\, EX[n-i]X[n-m] + Ev[n]X[n-m]$

$\therefore R_X[m] = \sum_{i=1}^{p} a_i R_X[m-i] + \sigma_V^2\,\delta[m], \quad \forall m \in I$

The above relation gives a set of linear equations which can be solved to find the $a_i$. These sets of equations are known as the Yule-Walker equations.

Example 2: AR(1) process

$X[n] = a_1 X[n-1] + v[n]$
$R_X[m] = a_1 R_X[m-1] + \sigma_V^2\,\delta[m]$

$\therefore R_X[0] = a_1 R_X[-1] + \sigma_V^2 \quad (1)$

and $R_X[1] = a_1 R_X[0]$, so that

$a_1 = \dfrac{R_X[1]}{R_X[0]}$

From (1), using $R_X[-1] = R_X[1] = a_1 R_X[0]$,

$\sigma_X^2 = R_X[0] = \dfrac{\sigma_V^2}{1 - a_1^2}$

After some arithmetic we get

$R_X[m] = \dfrac{a_1^{|m|}\,\sigma_V^2}{1 - a_1^2}$
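The Yule-Walker solution of Example 2 can be tried on synthetic data (our sketch; the true coefficient $a_1 = 0.7$ and unit noise variance are assumed values): estimate $R_X[0]$ and $R_X[1]$, then read off $a_1 = R_X[1]/R_X[0]$ and $\sigma_V^2 = R_X[0](1 - a_1^2)$.

```python
import random

random.seed(8)
a1_true, n = 0.7, 200_000
x = [0.0]
for _ in range(n):
    x.append(a1_true * x[-1] + random.gauss(0, 1))   # AR(1) with sigma_V^2 = 1
x = x[1000:]                                         # drop the transient

def r(seq, m):
    return sum(seq[k] * seq[k - m] for k in range(m, len(seq))) / (len(seq) - m)

# Yule-Walker for p = 1.
a1_hat = r(x, 1) / r(x, 0)
var_hat = r(x, 0) * (1 - a1_hat ** 2)
print(a1_hat, var_hat)   # ~ 0.7, ~ 1.0
```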
3.5 ARMA(p,q) — Autoregressive Moving Average Model

In the most practical situation, the process may be considered as the output of a filter that has both zeros and poles:

$H(w) = \dfrac{B(w)}{A(w)}$

The model is given by

$X[n] = \sum_{i=1}^{p} a_i X[n-i] + \sum_{i=0}^{q} b_i v[n-i] \quad \text{(ARMA 1)}$

and is called the ARMA(p,q) model. The transfer function of the filter is $H(w) = \dfrac{B(w)}{A(w)}$, and

$S_X(w) = \sigma_V^2\, \left|\dfrac{B(w)}{A(w)}\right|^2$

How do we get the model parameters? For $m \ge \max(p, q+1)$ there is no contribution from the $b_i$ terms to $R_X[m]$:

$R_X[m] = \sum_{i=1}^{p} a_i R_X[m-i], \quad m \ge \max(p, q+1)$

From a set of $p$ such Yule-Walker equations, the $a_i$ parameters can be found. Then we can rewrite the equation as

$\tilde X[n] = X[n] - \sum_{i=1}^{p} a_i X[n-i] = \sum_{i=0}^{q} b_i v[n-i]$

which is a pure MA part, from which the $b_i$ can be found.

The ARMA(p,q) model is an economical model: an AR(p)-only or MA(q)-only model may require a large number of model parameters to represent the same process adequately. This concept in model building is known as the parsimony of parameters.

The difference equation of the ARMA(p,q) model, given by eq. (ARMA 1), can be reduced to $p$ first-order difference equations, giving a state-space representation of the random process as follows:

$\mathbf{z}[n] = \mathbf{A}\,\mathbf{z}[n-1] + \mathbf{B}\,u[n]$
$X[n] = \mathbf{C}\,\mathbf{z}[n]$

where
$\mathbf{z}[n] = [X[n]\;\; X[n-1]\; \ldots\; X[n-p]]'$

$\mathbf{A} = \begin{bmatrix} a_1 & a_2 & \cdots & a_p \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix}, \quad \mathbf{B} = [1\; 0\; \ldots\; 0]'$, and $\mathbf{C} = [b_0\; b_1\; \ldots\; b_q]$

Such a representation is convenient for analysis.

3.6 General ARMA(p,q) Model Building

Steps:
• Identify $p$ and $q$.
• Estimate the model parameters.
• Check the modelling error.
• If it is white noise, stop. Else select new values for $p$ and $q$ and repeat the process.

3.7 Other models: to model nonstationary random processes

• ARIMA model: here, after differencing, the data can be fed to an ARMA model.
• SARMA model: seasonal ARMA model. Here the signal contains a seasonal fluctuation term; after differencing by a step equal to the seasonal period the signal becomes stationary, and an ARMA model can be fitted to the resulting data.
CHAPTER – 4: ESTIMATION THEORY

4.1 Introduction

• For speech, we have the LPC (linear predictive coding) model; the LPC parameters are to be estimated from observed data.
• We may have to estimate the correct value of a signal from a noisy observation. In RADAR signal processing, we estimate the target and the target distance from the observed data. In sonar signal processing, an array of sensors picks up the signals generated by the mechanical movements of a submarine, and we estimate the location of the submarine.

Generally, estimation includes parameter estimation and signal estimation. We will discuss the problem of parameter estimation here. We have a sequence of observable random variables $X_1, X_2, \ldots, X_N$, represented by the vector

$\mathbf{X} = [X_1\;\; X_2\; \ldots\; X_N]'$

$\mathbf{X}$ is governed by a joint density function which depends on some unobservable parameter $\theta$, given by

$f_{\mathbf{X}}(x_1, x_2, \ldots, x_N \mid \theta) = f_{\mathbf{X}}(\mathbf{x} \mid \theta)$

where $\theta$ may be deterministic or random. Our aim is to make an inference about $\theta$ from an observed sample of $X_1, X_2, \ldots, X_N$. An estimator $\hat\theta(\mathbf{X})$ is a rule by which we guess the value of an unknown $\theta$ on the basis of $\mathbf{X}$.
$\hat\theta(\mathbf{X})$ is random, being a function of random variables. For a particular observation $x_1, x_2, \ldots, x_N$ we get what is known as an estimate (not an estimator).

Let $X_1, X_2, \ldots, X_N$ be a sequence of independent and identically distributed (iid) random variables with mean $\mu_X$ and variance $\sigma_X^2$. Then

$\hat\mu_X = \dfrac{1}{N}\sum_{i=1}^{N} X_i$ is an estimator for $\mu_X$, and
$\hat\sigma_X^2 = \dfrac{1}{N}\sum_{i=1}^{N} (X_i - \hat\mu_X)^2$ is an estimator for $\sigma_X^2$.

An estimator is a function of the random sequence $X_1, X_2, \ldots, X_N$ that does not involve any unknown parameters. Such a function is generally called a statistic.

4.2 Properties of the Estimator

A good estimator should satisfy some properties. These properties are described in terms of the mean and variance of the estimator.

4.3 Unbiased estimator

An estimator $\hat\theta$ of $\theta$ is said to be unbiased if and only if $E\hat\theta = \theta$. The quantity $E\hat\theta - \theta$ is called the bias of the estimator. Unbiasedness is necessary but not sufficient to make an estimator a good one.

Consider

$\hat\sigma_1^2 = \dfrac{1}{N}\sum_{i=1}^{N}(X_i - \hat\mu_X)^2$ and $\hat\sigma_2^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(X_i - \hat\mu_X)^2$

for an iid random sequence $X_1, X_2, \ldots, X_N$. We can show that $\hat\sigma_2^2$ is an unbiased estimator:

$E\sum_{i=1}^{N}(X_i - \hat\mu_X)^2 = E\sum_{i=1}^{N}(X_i - \mu_X + \mu_X - \hat\mu_X)^2 = E\sum_{i=1}^{N}\left\{(X_i - \mu_X)^2 + (\mu_X - \hat\mu_X)^2 + 2(X_i - \mu_X)(\mu_X - \hat\mu_X)\right\}$

Now $E(X_i - \mu_X)^2 = \sigma^2$, and

$E(\mu_X - \hat\mu_X)^2 = E\left(\mu_X - \dfrac{\sum X_i}{N}\right)^2$
  • 39. E = 2 ( N µ X − ∑ X i )2 N E = 2 (∑( X i − µ X )) 2 N E = 2 ∑( X i − µ X ) 2 + ∑ ∑ E ( X i − µ X )( X j − µ X ) N i j ≠i E = 2 ∑( X i − µ X ) 2 (because of independence) N σX 2 = N also E ( X i − µ X )( µ X − µ X ) = − E ( X i − µ X ) 2 ˆ N ∴ E ∑ ( X i − µ X ) 2 = Nσ 2 + σ 2 − 2σ 2 = ( N − 1)σ 2 ˆ i =1 1 So Eσ 2 = ˆ2 E ∑( X i − µ X ) 2 = σ 2 ˆ N −1 ∴σ 2 is an unbiased estimator of σ .2 ˆ2 Similarly sample mean is an unbiased estimator. N 1 µX = ˆ N ∑X i =1 i 1 N Nµ X Eµ X = ˆ N ∑ E{ X } = i =1 i N = µX 4.4 Variance of the estimator The variance of the estimator θˆ is given by var(θˆ) = E (θˆ − E (θˆ)) 2 For the unbiased case var(θˆ) = E (θˆ − θ ) 2 The variance of the estimator should be or low as possible. An unbiased estimator θˆ is called a minimum variance unbiased estimator (MVUE) if E (θˆ − θ ) 2 ≤ E (θˆ′ − θ ) 2 where θˆ′ is any other unbiased estimator.
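The bias of the $1/N$ variance estimator and the unbiasedness of the $1/(N-1)$ version can be seen numerically. The following sketch (illustrative, not from the notes; all variable names are mine) draws many small iid Gaussian samples and averages both estimators:

```python
# Monte Carlo check: dividing the sum of squared deviations by N-1 gives an
# unbiased variance estimate, while dividing by N is biased low by (N-1)/N.
import random

random.seed(0)
N = 5                 # small sample size makes the bias visible
trials = 200_000
true_var = 4.0        # X_i ~ Normal(mean=3, sigma=2), so sigma^2 = 4

sum_v1 = sum_v2 = 0.0
for _ in range(trials):
    x = [random.gauss(3.0, 2.0) for _ in range(N)]
    m = sum(x) / N                          # sample mean (mu-hat)
    ss = sum((xi - m) ** 2 for xi in x)     # sum of squared deviations
    sum_v1 += ss / N                        # biased estimator sigma1^2
    sum_v2 += ss / (N - 1)                  # unbiased estimator sigma2^2

mean_v1 = sum_v1 / trials
mean_v2 = sum_v2 / trials
print(mean_v1)   # close to (N-1)/N * 4 = 3.2
print(mean_v2)   # close to 4.0
```

The averaged $1/N$ estimator settles near $(N-1)\sigma^2/N$, matching the $(N-1)\sigma^2$ expectation derived above.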
4.5 Mean square error of the estimator

$$\mathrm{MSE} = E(\hat{\theta} - \theta)^2$$

The MSE should be as small as possible. Among all unbiased estimators, the MVUE has the minimum mean square error. The MSE is related to the bias and the variance as shown below:

$$\begin{aligned}
\mathrm{MSE} &= E(\hat{\theta} - \theta)^2 = E(\hat{\theta} - E\hat{\theta} + E\hat{\theta} - \theta)^2 \\
&= E(\hat{\theta} - E\hat{\theta})^2 + E(E\hat{\theta} - \theta)^2 + 2E(\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta) \\
&= E(\hat{\theta} - E\hat{\theta})^2 + E(E\hat{\theta} - \theta)^2 + 2(E\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta) \qquad (E\hat{\theta} - \theta\ \text{is a constant}) \\
&= \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta}) + 0
\end{aligned}$$

so

$$\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta}).$$

4.6 Consistent Estimators

As we have more data, the quality of the estimate should improve. This idea is used in defining the consistent estimator. An estimator $\hat{\theta}$ is called a consistent estimator of $\theta$ if $\hat{\theta}$ converges to $\theta$ in probability:

$$\lim_{N\to\infty} P\left(|\hat{\theta} - \theta| \ge \varepsilon\right) = 0 \quad\text{for any } \varepsilon > 0.$$

A less rigorous test is obtained by applying the Markov inequality:

$$P\left(|\hat{\theta} - \theta| \ge \varepsilon\right) \le \frac{E(\hat{\theta} - \theta)^2}{\varepsilon^2} = \frac{\mathrm{MSE}}{\varepsilon^2}.$$

Therefore, if $\lim_{N\to\infty} E(\hat{\theta} - \theta)^2 = 0$, then $\hat{\theta}$ is a consistent estimator. If $\hat{\theta}$ is an unbiased estimator ($b(\hat{\theta}) = 0$), then $\mathrm{MSE} = \mathrm{var}(\hat{\theta})$.

Also note that $\mathrm{MSE} = \mathrm{var}(\hat{\theta}) + b^2(\hat{\theta})$. Therefore, if the estimator is asymptotically unbiased (i.e. $b(\hat{\theta}) \to 0$ as $N \to \infty$) and $\mathrm{var}(\hat{\theta}) \to 0$ as $N \to \infty$, then $\mathrm{MSE} \to 0$. Hence, for an asymptotically unbiased estimator $\hat{\theta}$, if $\mathrm{var}(\hat{\theta}) \to 0$ as $N \to \infty$, then $\hat{\theta}$ is a consistent estimator.
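Consistency of the sample mean can be illustrated empirically: the probability that $\hat{\mu}_X$ misses $\mu_X$ by more than a fixed $\varepsilon$ shrinks as $N$ grows. A minimal sketch (illustrative values, not from the notes):

```python
# Empirical P(|mu_hat - mu| >= eps) for growing sample sizes N: the miss
# rate decreases toward 0, illustrating convergence in probability.
import random

random.seed(1)
mu, sigma, eps, trials = 3.0, 2.0, 0.5, 20_000

def miss_rate(N):
    """Empirical P(|mu_hat - mu| >= eps) for sample size N."""
    hits = 0
    for _ in range(trials):
        m = sum(random.gauss(mu, sigma) for _ in range(N)) / N
        if abs(m - mu) >= eps:
            hits += 1
    return hits / trials

rates = [miss_rate(N) for N in (5, 20, 80)]
print(rates)   # decreasing toward 0
```

The Markov-inequality bound $\mathrm{MSE}/\varepsilon^2 = \sigma^2/(N\varepsilon^2)$ predicts exactly this decay with $N$.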
Example 1: Let $X_1, X_2, \ldots, X_N$ be an iid random sequence with unknown mean $\mu_X$ and known variance $\sigma_X^2$, and let

$$\hat{\mu}_X = \frac{1}{N}\sum_{i=1}^{N} X_i$$

be an estimator for $\mu_X$. We have already shown that $\hat{\mu}_X$ is unbiased, and $\mathrm{var}(\hat{\mu}_X) = \sigma_X^2/N$. Is it a consistent estimator? Clearly

$$\lim_{N\to\infty} \mathrm{var}(\hat{\mu}_X) = \lim_{N\to\infty} \frac{\sigma_X^2}{N} = 0.$$

Therefore $\hat{\mu}_X$ is a consistent estimator of $\mu_X$.

4.7 Sufficient Statistic

The observations $X_1, X_2, \ldots, X_N$ contain information about the unknown parameter $\theta$. An estimator should carry the same information about $\theta$ as the observed data; the concept of a sufficient statistic is based on this idea.

A measurable function $\hat{\theta}(X_1, X_2, \ldots, X_N)$ is called a sufficient statistic for $\theta$ if it contains the same information about $\theta$ as is contained in the random sequence $X_1, X_2, \ldots, X_N$. In other words, the joint conditional density

$$f_{X_1, X_2, \ldots, X_N \mid \hat{\theta}(X_1, X_2, \ldots, X_N)}(x_1, x_2, \ldots, x_N)$$

does not involve $\theta$.

There are a large number of sufficient statistics for a particular parameter; one has to select a sufficient statistic which has good estimation properties. A way to check whether a statistic is sufficient or not is through the factorization theorem, which states: $\hat{\theta}(X_1, X_2, \ldots, X_N)$ is a sufficient statistic for $\theta$ if

$$f_{X_1, X_2, \ldots, X_N \mid \theta}(x_1, x_2, \ldots, x_N) = g(\theta, \hat{\theta})\, h(x_1, x_2, \ldots, x_N)$$

where $g(\theta, \hat{\theta})$ is a non-constant, nonnegative function of $\theta$ and $\hat{\theta}$, and $h(x_1, x_2, \ldots, x_N)$ does not involve $\theta$ and is a nonnegative function of $x_1, x_2, \ldots, x_N$.

Example 2: Suppose $X_1, X_2, \ldots, X_N$ is an iid Gaussian sequence with unknown mean $\mu_X$ and known variance 1. Then

$$\hat{\mu}_X = \frac{1}{N}\sum_{i=1}^{N} X_i$$

is a sufficient statistic for $\mu_X$.
$$f_{X_1,\ldots,X_N \mid \mu_X}(x_1, \ldots, x_N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_i - \mu_X)^2} = \frac{1}{(\sqrt{2\pi})^N}\, e^{-\frac{1}{2}\sum_{i=1}^{N}(x_i - \mu_X)^2}$$

$$= \frac{1}{(\sqrt{2\pi})^N}\, e^{-\frac{1}{2}\sum_{i=1}^{N}(x_i - \hat{\mu}_X + \hat{\mu}_X - \mu_X)^2} = \frac{1}{(\sqrt{2\pi})^N}\, e^{-\frac{1}{2}\sum_{i=1}^{N}\left\{(x_i - \hat{\mu}_X)^2 + (\hat{\mu}_X - \mu_X)^2 + 2(x_i - \hat{\mu}_X)(\hat{\mu}_X - \mu_X)\right\}}$$

$$= \frac{1}{(\sqrt{2\pi})^N}\, e^{-\frac{1}{2}\sum_{i=1}^{N}(x_i - \hat{\mu}_X)^2}\, e^{-\frac{N}{2}(\hat{\mu}_X - \mu_X)^2}\, e^{0}$$

where the cross term vanishes because $\sum_{i=1}^{N}(x_i - \hat{\mu}_X) = 0$. The first exponential factor is a function of $x_1, \ldots, x_N$ only, and the second is a function of $\mu_X$ and $\hat{\mu}_X$. Therefore, by the factorization theorem, $\hat{\mu}_X$ is a sufficient statistic for $\mu_X$.

4.8 Cramer-Rao theorem

We described the minimum variance unbiased estimator (MVUE), which is a very good estimator: $\hat{\theta}$ is an MVUE if $E(\hat{\theta}) = \theta$ and $\mathrm{var}(\hat{\theta}) \le \mathrm{var}(\hat{\theta}')$, where $\hat{\theta}'$ is any other unbiased estimator of $\theta$. Can we reduce the variance of an unbiased estimator indefinitely? The answer is given by the Cramer-Rao theorem.

Suppose $\hat{\theta}$ is an unbiased estimator based on the random sequence denoted by the vector

$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}$$

Let $f_{\mathbf{X}}(x_1, \ldots, x_N \mid \theta)$ be the joint density function which characterises $\mathbf{X}$. This function is also called the likelihood function. ($\theta$ may also be random; in that case the likelihood function is the conditional joint density.) The function

$$L(\mathbf{x} \mid \theta) = \ln f_{\mathbf{X}}(x_1, \ldots, x_N \mid \theta)$$

is called the log-likelihood function.
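The factorization above can be checked numerically. The sketch below (illustrative only; the data and names are mine) evaluates the unit-variance Gaussian likelihood directly and via the factored form $g(\mu, \hat{\mu})\, h(\mathbf{x})$, and confirms the two agree:

```python
# For iid N(mu, 1) samples, verify f(x|mu) = g(mu, mu_hat) * h(x), where
# g depends on the data only through the sample mean mu_hat and h is mu-free.
import math, random

random.seed(2)
mu = 1.7
x = [random.gauss(mu, 1.0) for _ in range(8)]
N = len(x)
mu_hat = sum(x) / N

# Direct likelihood: product of unit-variance Gaussian densities.
f = math.prod(math.exp(-0.5 * (xi - mu) ** 2) / math.sqrt(2 * math.pi)
              for xi in x)

# Factored form from the derivation above.
g = math.exp(-0.5 * N * (mu_hat - mu) ** 2)
h = (2 * math.pi) ** (-N / 2) * math.exp(-0.5 * sum((xi - mu_hat) ** 2
                                                    for xi in x))

print(abs(f - g * h) < 1e-9 * f)   # True: the two forms agree
```

This is exactly the statement that the data enter the $\mu$-dependent factor only through $\hat{\mu}$.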
4.9 Statement of the Cramer-Rao theorem

If $\hat{\theta}$ is an unbiased estimator of $\theta$, then

$$\mathrm{var}(\hat{\theta}) \ge \frac{1}{I(\theta)}$$

where

$$I(\theta) = E\left(\frac{\partial L}{\partial\theta}\right)^2$$

is a measure of the average information in the random sequence and is called the Fisher information. Equality in the CR bound holds if

$$\frac{\partial L}{\partial\theta} = c(\hat{\theta} - \theta)$$

where $c$ is a constant.

Proof: $\hat{\theta}$ is an unbiased estimator of $\theta$, so

$$E(\hat{\theta} - \theta) = 0 \;\Rightarrow\; \int_{-\infty}^{\infty} (\hat{\theta} - \theta)\, f_{\mathbf{X}}(\mathbf{x} \mid \theta)\, d\mathbf{x} = 0.$$

Differentiating with respect to $\theta$ (the limits of integration do not depend on $\theta$):

$$\int_{-\infty}^{\infty} \frac{\partial}{\partial\theta}\left\{(\hat{\theta} - \theta)\, f_{\mathbf{X}}(\mathbf{x} \mid \theta)\right\} d\mathbf{x} = 0$$

$$\Rightarrow \int_{-\infty}^{\infty} (\hat{\theta} - \theta)\, \frac{\partial}{\partial\theta} f_{\mathbf{X}}(\mathbf{x} \mid \theta)\, d\mathbf{x} - \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x} \mid \theta)\, d\mathbf{x} = 0$$

$$\therefore \int_{-\infty}^{\infty} (\hat{\theta} - \theta)\, \frac{\partial}{\partial\theta} f_{\mathbf{X}}(\mathbf{x} \mid \theta)\, d\mathbf{x} = \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x} \mid \theta)\, d\mathbf{x} = 1. \qquad (1)$$

Note that

$$\frac{\partial}{\partial\theta} f_{\mathbf{X}}(\mathbf{x} \mid \theta) = \left\{\frac{\partial}{\partial\theta} \ln f_{\mathbf{X}}(\mathbf{x} \mid \theta)\right\} f_{\mathbf{X}}(\mathbf{x} \mid \theta) = \frac{\partial L}{\partial\theta}\, f_{\mathbf{X}}(\mathbf{x} \mid \theta).$$

Therefore, from (1),

$$\int_{-\infty}^{\infty} (\hat{\theta} - \theta)\, \frac{\partial L(\mathbf{x} \mid \theta)}{\partial\theta}\, f_{\mathbf{X}}(\mathbf{x} \mid \theta)\, d\mathbf{x} = 1,$$

so that, since $f_{\mathbf{X}}(\mathbf{x} \mid \theta) \ge 0$ and may be split as $\sqrt{f_{\mathbf{X}}}\cdot\sqrt{f_{\mathbf{X}}}$,

$$\left\{\int_{-\infty}^{\infty} (\hat{\theta} - \theta)\sqrt{f_{\mathbf{X}}(\mathbf{x} \mid \theta)}\; \cdot\; \frac{\partial L(\mathbf{x} \mid \theta)}{\partial\theta}\sqrt{f_{\mathbf{X}}(\mathbf{x} \mid \theta)}\, d\mathbf{x}\right\}^2 = 1. \qquad (2)$$

Recall the Cauchy-Schwarz inequality

$$|\langle a, b\rangle|^2 \le \|a\|^2\, \|b\|^2$$
where the equality holds when $a = cb$ (with $c$ any scalar). Applying this inequality to the L.H.S. of equation (2), we get

$$\left(\int_{-\infty}^{\infty} (\hat{\theta} - \theta)\sqrt{f_{\mathbf{X}}(\mathbf{x}\mid\theta)}\; \cdot\; \frac{\partial L(\mathbf{x}\mid\theta)}{\partial\theta}\sqrt{f_{\mathbf{X}}(\mathbf{x}\mid\theta)}\, d\mathbf{x}\right)^2 \le \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 f_{\mathbf{X}}(\mathbf{x}\mid\theta)\, d\mathbf{x} \int_{-\infty}^{\infty} \left(\frac{\partial L(\mathbf{x}\mid\theta)}{\partial\theta}\right)^2 f_{\mathbf{X}}(\mathbf{x}\mid\theta)\, d\mathbf{x} = \mathrm{var}(\hat{\theta})\, I(\theta).$$

So the L.H.S. is at most $\mathrm{var}(\hat{\theta})\, I(\theta)$, but by (2) it equals 1. Hence

$$\mathrm{var}(\hat{\theta})\, I(\theta) \ge 1 \;\Rightarrow\; \mathrm{var}(\hat{\theta}) \ge \frac{1}{I(\theta)},$$

which is the Cramer-Rao inequality. The equality holds when

$$\frac{\partial L(\mathbf{x}\mid\theta)}{\partial\theta}\sqrt{f_{\mathbf{X}}(\mathbf{x}\mid\theta)} = c\,(\hat{\theta} - \theta)\sqrt{f_{\mathbf{X}}(\mathbf{x}\mid\theta)},$$

so that

$$\frac{\partial L(\mathbf{x}\mid\theta)}{\partial\theta} = c\,(\hat{\theta} - \theta).$$

An alternative expression for the Fisher information follows from $\int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x}\mid\theta)\, d\mathbf{x} = 1$:

$$\int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} f_{\mathbf{X}}(\mathbf{x}\mid\theta)\, d\mathbf{x} = 0 \;\Rightarrow\; \int_{-\infty}^{\infty} \frac{\partial L}{\partial\theta}\, f_{\mathbf{X}}(\mathbf{x}\mid\theta)\, d\mathbf{x} = 0.$$

Taking the partial derivative with respect to $\theta$ again:

$$\int_{-\infty}^{\infty} \left\{\frac{\partial^2 L}{\partial\theta^2}\, f_{\mathbf{X}}(\mathbf{x}\mid\theta) + \frac{\partial L}{\partial\theta}\, \frac{\partial}{\partial\theta} f_{\mathbf{X}}(\mathbf{x}\mid\theta)\right\} d\mathbf{x} = 0$$

$$\Rightarrow \int_{-\infty}^{\infty} \left\{\frac{\partial^2 L}{\partial\theta^2}\, f_{\mathbf{X}}(\mathbf{x}\mid\theta) + \left(\frac{\partial L}{\partial\theta}\right)^2 f_{\mathbf{X}}(\mathbf{x}\mid\theta)\right\} d\mathbf{x} = 0$$

$$\therefore E\left(\frac{\partial L}{\partial\theta}\right)^2 = -E\,\frac{\partial^2 L}{\partial\theta^2}.$$

If $\hat{\theta}$ satisfies the CR bound with equality, then $\hat{\theta}$ is called an efficient estimator.
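The identity $E(\partial L/\partial\theta)^2 = -E(\partial^2 L/\partial\theta^2)$ can be sanity-checked by Monte Carlo for a single Gaussian observation (an illustrative sketch, not from the notes; parameter values are arbitrary):

```python
# For one observation x ~ N(mu, sigma^2) with L = ln f and theta = mu:
#   score     dL/dmu   = (x - mu)/sigma^2
#   curvature d2L/dmu2 = -1/sigma^2   (constant)
# The identity says E[score^2] = 1/sigma^2; we estimate the left side.
import random

random.seed(3)
mu, sigma, trials = 1.0, 2.0, 400_000

score_sq = 0.0
for _ in range(trials):
    x = random.gauss(mu, sigma)
    score_sq += ((x - mu) / sigma**2) ** 2
lhs = score_sq / trials          # Monte Carlo E[(dL/dmu)^2]
rhs = 1.0 / sigma**2             # exact -E[d2L/dmu2] = 0.25

print(lhs, rhs)   # lhs ~ 0.25
```

Both sides give the per-sample Fisher information $I_1(\mu) = 1/\sigma^2$.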
Remarks:
(1) If the information $I(\theta)$ is larger, the variance bound on the estimator $\hat{\theta}$ is smaller.
(2) Suppose $X_1, \ldots, X_N$ are iid. Then, with the per-sample information

$$I_1(\theta) = E\left(\frac{\partial}{\partial\theta} \ln f_{X_1\mid\theta}(x)\right)^2,$$

we have

$$I_N(\theta) = E\left(\frac{\partial}{\partial\theta} \ln f_{X_1, X_2, \ldots, X_N\mid\theta}(x_1, x_2, \ldots, x_N)\right)^2 = N\, I_1(\theta).$$

Example 3: Let $X_1, \ldots, X_N$ be an iid Gaussian random sequence with known variance $\sigma^2$ and unknown mean $\mu$. Suppose

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i,$$

which is unbiased. Find the CR bound and hence show that $\hat{\mu}$ is an efficient estimator.

The likelihood function $f_{\mathbf{X}}(x_1, \ldots, x_N \mid \mu)$ is the product of the individual densities (since the sequence is iid):

$$f_{\mathbf{X}}(x_1, \ldots, x_N \mid \mu) = \frac{1}{(\sqrt{2\pi})^N \sigma^N}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2}$$

so that

$$L(\mathbf{x} \mid \mu) = -\ln\left((\sqrt{2\pi})^N \sigma^N\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2.$$

Now

$$\frac{\partial L}{\partial\mu} = 0 - \frac{1}{2\sigma^2}(-2)\sum_{i=1}^{N}(x_i - \mu) = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu)$$

$$\therefore \frac{\partial^2 L}{\partial\mu^2} = -\frac{N}{\sigma^2}, \quad\text{so that}\quad E\,\frac{\partial^2 L}{\partial\mu^2} = -\frac{N}{\sigma^2}$$

$$\therefore \text{CR bound} = \frac{1}{I(\mu)} = \frac{1}{-E\,\dfrac{\partial^2 L}{\partial\mu^2}} = \frac{\sigma^2}{N}.$$

Moreover,

$$\frac{\partial L}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu) = \frac{N}{\sigma^2}\left(\frac{1}{N}\sum_{i=1}^{N} x_i - \mu\right) = \frac{N}{\sigma^2}(\hat{\mu} - \mu).$$
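A quick numerical companion to Example 3 (an illustrative sketch with arbitrary parameter values, not part of the notes): the empirical variance of the sample mean matches the CR bound $\sigma^2/N$, confirming that no unbiased estimator can do better.

```python
# Monte Carlo: var(mu_hat) for the sample mean of iid N(mu, sigma^2) data
# attains the Cramer-Rao bound sigma^2/N, i.e. mu_hat is efficient.
import random

random.seed(4)
mu, sigma, N, trials = 5.0, 3.0, 10, 100_000
cr_bound = sigma**2 / N            # = 0.9

sum_sq = 0.0
for _ in range(trials):
    m = sum(random.gauss(mu, sigma) for _ in range(N)) / N
    sum_sq += (m - mu) ** 2        # mu_hat is unbiased, so this estimates var
var_hat = sum_sq / trials

print(var_hat, cr_bound)   # var_hat ~ 0.9
```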
Hence

$$\frac{\partial L}{\partial\mu} = c\,(\hat{\mu} - \mu) \quad\text{with } c = \frac{N}{\sigma^2},$$

and $\hat{\mu}$ is an efficient estimator.

4.10 Criteria for Estimation

The estimation of a parameter is based on several well-known criteria. Each of the criteria tries to optimize some function of the observed samples with respect to the unknown parameter to be estimated. Some of the most popular estimation criteria are:
• Maximum likelihood
• Minimum mean square error
• Bayes' method
• Maximum entropy method

4.11 Maximum Likelihood Estimator (MLE)

We are given a random sequence $X_1, \ldots, X_N$ and the joint density function $f_{X_1,\ldots,X_N\mid\theta}(x_1, \ldots, x_N)$, which depends on an unknown nonrandom parameter $\theta$. The function $f_{\mathbf{X}}(x_1, x_2, \ldots, x_N \mid \theta)$ is called the likelihood function (for continuous random variables; for discrete random variables it is the joint probability mass function), and

$$L(\mathbf{x} \mid \theta) = \ln f_{\mathbf{X}}(x_1, x_2, \ldots, x_N \mid \theta)$$

is called the log-likelihood function.

The maximum likelihood estimator $\hat{\theta}_{\mathrm{MLE}}$ is the estimator such that

$$f_{\mathbf{X}}(x_1, \ldots, x_N \mid \hat{\theta}_{\mathrm{MLE}}) \ge f_{\mathbf{X}}(x_1, \ldots, x_N \mid \theta) \quad \forall\theta.$$

If the likelihood function is differentiable with respect to $\theta$, then $\hat{\theta}_{\mathrm{MLE}}$ is given by

$$\left.\frac{\partial}{\partial\theta} f_{\mathbf{X}}(x_1, \ldots, x_N \mid \theta)\right|_{\theta = \hat{\theta}_{\mathrm{MLE}}} = 0 \quad\text{or}\quad \left.\frac{\partial L(\mathbf{x} \mid \theta)}{\partial\theta}\right|_{\theta = \hat{\theta}_{\mathrm{MLE}}} = 0.$$

Thus the MLE is given by the solution of the likelihood equation given above. If we have a number of unknown parameters given by

$$\boldsymbol{\theta} = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix}$$
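For the unit-variance Gaussian case, the likelihood equation $\partial L/\partial\mu = 0$ gives $\hat{\mu}_{\mathrm{MLE}}$ equal to the sample mean. The sketch below (illustrative; a crude grid search stands in for solving the equation analytically) maximizes the log-likelihood numerically and compares:

```python
# MLE of the mean of iid N(mu, 1) data: maximize L(x|mu) over a fine grid
# of candidate mu values and compare the maximizer with the sample mean.
import math, random

random.seed(5)
x = [random.gauss(2.0, 1.0) for _ in range(50)]

def log_likelihood(mu):
    """L(x|mu) for unit-variance Gaussian samples."""
    return sum(-0.5 * (xi - mu) ** 2 - 0.5 * math.log(2 * math.pi)
               for xi in x)

grid = [i / 1000 for i in range(0, 4000)]        # candidate mu in [0, 4)
mu_mle = max(grid, key=log_likelihood)
sample_mean = sum(x) / len(x)

print(mu_mle, sample_mean)   # agree to within the grid spacing
```

Because $L$ is concave in $\mu$ here, the grid maximizer is simply the grid point nearest the analytical solution $\hat{\mu}_{\mathrm{MLE}} = \frac{1}{N}\sum_i x_i$.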