Estimation Theory
Estimation Theory
We seek to determine, from a set of data, a set of parameters such that their values would yield the highest probability of obtaining the observed data.
The unknown parameters may be seen either as deterministic quantities or as random variables.
There are essentially two alternatives in the statistical case:
• when no a priori distribution is assumed, use Maximum Likelihood;
• when an a priori distribution is known, use Bayes.
Maximum Likelihood
Principle: estimate a parameter such that, for this value, the probability of obtaining the actually observed sample is as large as possible.
I.e., having obtained the observation, we “look back” and compute the probability that the given sample would be observed, as if the experiment were to be done again.
This probability depends on a parameter, which is adjusted to give it the maximum possible value.
Reminds you of politicians observing the movement of the crowd and then moving to the front to lead them?
Estimation Theory
Let a random variable X have a probability distribution dependent on a parameter θ.
The parameter θ lies in a space Θ of all possible parameters.
Let f(x; θ) be the probability density function of X.
Assume that the mathematical form of f is known, but not the value of θ.
Estimation Theory
The joint pdf of the N sample random variables, evaluated at the sample points x_1, …, x_N, is given as
L(\theta) = f(x_1, \dots, x_N; \theta) = \prod_{n=1}^{N} f(x_n; \theta)
The above is known as the likelihood of the sampled observation.
Estimation Theory
The likelihood function L(θ) is a function of the unknown parameter θ for a fixed set of observations.
The Maximum Likelihood Principle requires us to select the value of θ that maximises the likelihood function:
\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta)
The parameter θ may also be regarded as a vector of parameters.
Estimation Theory
It is often more convenient to use the log-likelihood \ln L(\theta).
The maximum is then at
\frac{\partial \ln L(\theta)}{\partial \theta} = 0
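A minimal numerical sketch of this idea (not from the slides): the data model, parameter values, and NumPy usage below are illustrative assumptions. The log-likelihood of an exponential sample is evaluated over a grid of candidate rates and its maximiser is compared with the closed-form ML estimate.

```python
# Minimal sketch: maximise a log-likelihood by direct search.
# Illustrative assumption: data are i.i.d. Exponential(theta) with pdf
# f(x; theta) = theta * exp(-theta * x); the slides do not specify this model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=500)   # true rate theta = 2.5
N = x.size

thetas = np.linspace(0.1, 10.0, 2000)          # candidate parameter values
# log L(theta) = N*ln(theta) - theta * sum(x)
log_lik = N * np.log(thetas) - thetas * x.sum()

theta_hat_grid = thetas[np.argmax(log_lik)]    # maximiser over the grid
theta_hat_closed = 1.0 / x.mean()              # analytic ML estimate for this model

print(theta_hat_grid, theta_hat_closed)        # the two agree to grid resolution
```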
An example
Let x_1, x_2, …, x_N be a random sample selected from a normal distribution with mean μ and variance σ².
The joint pdf is
L(\mu, \sigma^2) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_n - \mu)^2}{2\sigma^2}\right)
We wish to find the best μ and σ².
Estimation Theory
Form the log-likelihood function
\ln L(\mu, \sigma^2) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2
Hence, setting the partial derivatives with respect to μ and σ² to zero,
\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n
\hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^2
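A quick sketch to check these closed-form estimates against a generic numerical maximiser (assumptions not in the slides: synthetic data and SciPy's Nelder-Mead optimiser):

```python
# Sketch: compare the closed-form normal ML estimates with numerical maximisation.
# Assumptions (not from the slides): synthetic data, scipy.optimize.minimize.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=1000)   # true mu = 3, sigma^2 = 4
N = x.size

def neg_log_lik(params):
    mu, log_sigma2 = params                     # optimise log(sigma^2) to keep it positive
    sigma2 = np.exp(log_sigma2)
    return 0.5 * N * np.log(2 * np.pi * sigma2) + ((x - mu) ** 2).sum() / (2 * sigma2)

res = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_num, sigma2_num = res.x[0], np.exp(res.x[1])

mu_hat = x.mean()                               # closed-form ML estimate of mu
sigma2_hat = ((x - mu_hat) ** 2).mean()         # closed-form ML estimate of sigma^2

print(mu_hat, sigma2_hat)                       # closed form
print(mu_num, sigma2_num)                       # numerical maximiser agrees
```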
Fisher and Cramer-Rao
The Fisher Information I(θ) helps in placing a bound on estimators.
Cramer-Rao Lower Bound: if \hat{\theta} is any unbiased estimator of θ, then
\mathrm{Cov}(\hat{\theta}) \;\ge\; I^{-1}(\theta)
I.e., I^{-1}(θ) provides a lower bound on the covariance matrix of any unbiased estimator.
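The slides' own equations were not preserved; the following is a standard worked scalar instance of the definitions, stated here for reference.

```latex
% Fisher information of theta from N i.i.d. observations (standard definition):
I(\theta) \;=\; \mathbb{E}\!\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^{\!2}\right]
        \;=\; -\,\mathbb{E}\!\left[\frac{\partial^2 \ln L(\theta)}{\partial \theta^2}\right]

% Example: for x_n \sim \mathcal{N}(\mu, \sigma^2) with \sigma^2 known,
\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{n=1}^{N}(x_n - \mu)
\quad\Rightarrow\quad
I(\mu) = \frac{N}{\sigma^2},
\qquad
\operatorname{Var}(\hat{\mu}) \;\ge\; \frac{\sigma^2}{N}.
```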
Estimation Theory
It can be seen that if we model the observations as the output of an AR process driven by zero-mean Gaussian noise, then maximising the likelihood amounts to minimising the sum of squared prediction errors, so the Maximum Likelihood estimator is also the Least Squares estimator.
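A minimal sketch of this equivalence (the slides' AR model is not given, so an AR(2) process with assumed coefficients is simulated here; the least-squares fit is the conditional ML estimate under Gaussian driving noise):

```python
# Sketch: conditional ML / least-squares fit of AR coefficients under Gaussian noise.
# Assumptions (not from the slides): an AR(2) process with coefficients chosen below.
import numpy as np

rng = np.random.default_rng(2)
a1, a2 = 0.75, -0.5                     # assumed true AR(2) coefficients
T = 5000
x = np.zeros(T)
e = rng.normal(scale=1.0, size=T)       # zero-mean Gaussian driving noise
for t in range(2, T):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]

# Regression form: x[t] = a1*x[t-1] + a2*x[t-2] + e[t]
X = np.column_stack([x[1:-1], x[:-2]])  # regressors x[t-1], x[t-2]
y = x[2:]                               # targets x[t]

a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares = conditional ML here
sigma2_hat = ((y - X @ a_hat) ** 2).mean()      # ML estimate of the noise variance

print(a_hat, sigma2_hat)                # close to (0.75, -0.5) and 1.0
```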
The Cramer-Rao Lower BoundThis is an important theorem which establishes the superiority of the ML estimate over all others. The Cramer-Rao lower bound is the smallest theoretical variance which can be achieved. ML gives this so any other estimation technique can at best only equal it. this is the Cramer-Rao inequality.
CRB Definition: the inverse of the Fisher Matrix gives the lowest possible variance of any unbiased estimator:
\mathrm{Cov}(\hat{\theta}) \;\ge\; I^{-1}(\theta)
Purpose of CRB analysis:
• indicate the performance bounds of a particular problem;
• facilitate analysis of the factors that impact most on the performance of an algorithm.
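A numerical sketch of a Fisher matrix and its inverse (not from the slides; the two-parameter normal model, sample size, and parameter values are assumptions). The matrix is estimated as the average outer product of score vectors and compared with the closed form; inverting it gives the lowest possible variance for each parameter.

```python
# Sketch: Fisher matrix for (mu, sigma^2) of a normal model, and the CRB from its inverse.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 1.0, 2.0
r = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=200_000)  # many draws for the expectation

# Score (gradient of the per-observation log-density) at the true parameters.
score_mu = (r - mu) / sigma2
score_s2 = -0.5 / sigma2 + (r - mu) ** 2 / (2 * sigma2 ** 2)
scores = np.column_stack([score_mu, score_s2])

I_mc = scores.T @ scores / r.size                     # Monte-Carlo Fisher matrix (per observation)
I_exact = np.array([[1 / sigma2, 0.0],
                    [0.0, 1 / (2 * sigma2 ** 2)]])    # closed form, per observation

N = 100                                               # sample size of interest
crb = np.linalg.inv(N * I_exact)                      # lowest possible covariance for N samples
print(I_mc)                                           # close to I_exact
print(np.diag(crb))                                   # bounds: sigma^2/N and 2*sigma^4/N
```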

The Cramer-Rao Lower Bound
Fisher Matrix of our Energy Decay Model.
A different way of looking at information for continuous functions r(s):
Intuition: we can discriminate values of s better if r(s) is changing rapidly; if r(s) does not change much with s, then we don't learn much about s.
How much does r tell us about s? The Fisher information captures this through either of two equivalent averages:
• high negative curvature of ln p(r|s) at s, for all r (on average), or
• rapid change in p(r|s) at s, for all r (on average).
Either implies that it is easy to discriminate different values of s.
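A sketch checking numerically that the two "on average" views agree. The Gaussian observation model with a decaying mean curve below is an illustrative assumption, not the slides' energy-decay model.

```python
# Sketch: the curvature and squared-slope views of Fisher information coincide.
# Illustrative assumptions: observations r ~ N(f(s), sigma^2) with f(s) = A * exp(-s).
import numpy as np

rng = np.random.default_rng(4)
A, sigma = 2.0, 0.3
s = 1.2                                   # parameter value at which we evaluate the information

f = A * np.exp(-s)                        # mean response f(s)
fp = -A * np.exp(-s)                      # f'(s)
fpp = A * np.exp(-s)                      # f''(s)

r = rng.normal(loc=f, scale=sigma, size=1_000_000)

# For this Gaussian model:
#   d/ds     ln p(r|s) = f'(s) (r - f(s)) / sigma^2
#   d^2/ds^2 ln p(r|s) = f''(s)(r - f(s))/sigma^2 - f'(s)^2/sigma^2
score = fp * (r - f) / sigma**2
curv = fpp * (r - f) / sigma**2 - fp**2 / sigma**2

print((score**2).mean())                  # E[(d ln p / ds)^2]   ≈ f'(s)^2 / sigma^2
print((-curv).mean())                     # -E[d^2 ln p / ds^2]  ≈ f'(s)^2 / sigma^2
```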
The Cramer-Rao Lower Bound
• Applies to unbiased estimators, for which E[ŝ] = s (zero bias).
• The best unbiased estimator is only so good: its variance cannot fall below the bound.
• The best unbiased estimator is the ML estimate.
• It shows a relation between the variance of estimators and an information measure:
\operatorname{Var}(\hat{s}) \;\ge\; \frac{1}{I_F(s)}
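A closing sketch of that relation in the simplest setting (assumptions not in the slides: Gaussian observations of s with known noise variance, so the ML estimate is the sample mean and the bound is sigma^2/N):

```python
# Sketch: the variance of the ML estimate attains the Cramer-Rao bound in a simple case.
import numpy as np

rng = np.random.default_rng(5)
s_true, sigma, N = 0.7, 1.5, 50
trials = 20_000

r = rng.normal(loc=s_true, scale=sigma, size=(trials, N))
s_ml = r.mean(axis=1)                    # ML estimate of s in each trial

fisher_per_obs = 1.0 / sigma**2          # I_F(s) for one Gaussian observation
crb = 1.0 / (N * fisher_per_obs)         # = sigma^2 / N

print(s_ml.mean())                       # ≈ s_true (unbiased)
print(s_ml.var())                        # ≈ crb: the estimator attains the bound
print(crb)
```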