Slide 1: Computer vision: models, learning and inference. Chapter 4: Fitting probability models
Please send errata to s.prince@cs.ucl.ac.uk
Slides ©2011 Simon J.D. Prince
Slide 2: Structure
• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
Slide 3: Maximum likelihood
As the name suggests, we find the parameters θ under which the data x_1..I are most likely:
  θ̂ = argmax_θ Π_{i=1..I} Pr(x_i | θ)
We have assumed that the data are independent (hence the product).
Predictive density: evaluate a new data point x* under the probability distribution with the best parameters, Pr(x* | θ̂).
Slide 4: Maximum a posteriori (MAP)
Fitting: we find the parameters which maximize the posterior probability Pr(θ | x_1..I):
  θ̂ = argmax_θ Pr(θ | x_1..I) = argmax_θ [ Π_{i=1..I} Pr(x_i | θ) Pr(θ) / Pr(x_1..I) ]
Again we have assumed that the data are independent.

Slide 5: Maximum a posteriori (MAP)
Fitting: since the denominator does not depend on the parameters, we can instead maximize
  θ̂ = argmax_θ Π_{i=1..I} Pr(x_i | θ) Pr(θ)

Slide 6: Maximum a posteriori
Predictive density: evaluate a new data point x* under the probability distribution with the MAP parameters, Pr(x* | θ̂).
Slide 7: Bayesian approach
Fitting: compute the posterior distribution over possible parameter values using Bayes' rule:
  Pr(θ | x_1..I) = Π_{i=1..I} Pr(x_i | θ) Pr(θ) / Pr(x_1..I)
Principle: why pick one set of parameters? There are many values that could have explained the data. Try to capture all of the possibilities.

Slide 8: Bayesian approach
Predictive density
• Each possible parameter value makes a prediction
• Some parameter values are more probable than others
Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities:
  Pr(x* | x_1..I) = ∫ Pr(x* | θ) Pr(θ | x_1..I) dθ
Slide 9: Predictive densities for the 3 methods
Maximum likelihood: evaluate the new data point x* under the probability distribution with the ML parameters.
Maximum a posteriori: evaluate the new data point x* under the probability distribution with the MAP parameters.
Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.

Slide 10: Predictive densities for the 3 methods
How to rationalize the different forms? Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions); the Bayesian predictive integral then collapses to evaluating the density at that single parameter value.
Slide 11: Structure (recap)
Next: Worked example 1: Normal distribution
Slide 12: Univariate normal distribution
The univariate normal distribution describes a single continuous variable x. It takes 2 parameters, μ and σ² > 0. For short we write:
  Pr(x) = Norm_x[μ, σ²]

Slide 13: Normal inverse gamma distribution
Defined on 2 variables, μ and σ² > 0. It has four parameters, α, β, γ, δ, and for short we write:
  Pr(μ, σ²) = NormInvGam_{μ,σ²}[α, β, γ, δ]
Slide 14: Fitting the normal distribution: ML
As the name suggests, we find the parameters under which the data x_1..I are most likely. The likelihood is given by the pdf:
  μ̂, σ̂² = argmax_{μ,σ²} Π_{i=1..I} Norm_{x_i}[μ, σ²]
Slide 15: Fitting the normal distribution: ML (figure only)

Slide 16: Fitting a normal distribution: ML (figure: plotted surface of likelihoods as a function of possible parameter values; the ML solution is at the peak)
Slide 17: Fitting the normal distribution: ML
Algebraically, we maximize the product of the individual likelihoods:
  μ̂, σ̂² = argmax_{μ,σ²} Π_{i=1..I} Norm_{x_i}[μ, σ²]
or alternatively, we can maximize the logarithm of this expression:
  μ̂, σ̂² = argmax_{μ,σ²} Σ_{i=1..I} log Norm_{x_i}[μ, σ²]

Slide 18: Why the logarithm?
The logarithm is a monotonic transformation, so the position of the peak stays in the same place, but the log likelihood is easier to work with.
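A quick numerical check of this point, as a minimal Python/NumPy sketch (not from the slides; the synthetic data, the grid of candidate means, and the fixed σ = 1 are illustrative assumptions): the likelihood and the log likelihood peak at the same parameter value.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)        # synthetic data, sigma fixed to 1
mu_grid = np.linspace(-5.0, 5.0, 1001)             # candidate values of the mean

# Log likelihood of all data for each candidate mean (normal pdf with sigma = 1)
log_lik = np.array([np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2)
                    for mu in mu_grid])
lik = np.exp(log_lik)

# The peak is in the same place for both curves
assert np.argmax(lik) == np.argmax(log_lik)
print(mu_grid[np.argmax(log_lik)])                 # close to the sample mean of x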
Slide 19: Fitting the normal distribution: ML
How do we maximize a function? Take the derivative with respect to each parameter, set it to zero, and solve.

Slide 20: Fitting the normal distribution: ML
Maximum likelihood solution:
  μ̂ = (1/I) Σ_{i=1..I} x_i
  σ̂² = (1/I) Σ_{i=1..I} (x_i - μ̂)²
Should look familiar: these are the sample mean and the (biased) sample variance.
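A minimal sketch of the ML fit in Python/NumPy (the function name fit_normal_ml and the example data are illustrative, not from the slides):

import numpy as np

def fit_normal_ml(x):
    """ML estimates for a univariate normal: sample mean and (biased) sample variance."""
    mu_hat = np.mean(x)
    var_hat = np.mean((x - mu_hat) ** 2)   # divides by I, not I - 1, matching the ML solution
    return mu_hat, var_hat

x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])    # made-up example data
print(fit_normal_ml(x))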
Slide 21: Least squares
Maximum likelihood for the normal distribution gives the 'least squares' fitting criterion: maximizing the log likelihood with respect to μ is equivalent to minimizing the sum of squared deviations Σ_{i}(x_i - μ)².
Slide 22: Fitting the normal distribution: MAP
Fitting: we find the parameters which maximize the posterior probability Pr(μ, σ² | x_1..I). The likelihood is the normal pdf.

Slide 23: Fitting the normal distribution: MAP
Prior: use the conjugate prior, the normal-scaled inverse gamma (normal inverse gamma) distribution.

Slide 24: Fitting the normal distribution: MAP (figure: likelihood, prior, and resulting posterior)

Slide 25: Fitting the normal distribution: MAP
Again maximize the logarithm; this does not change the position of the maximum.
Slide 26: Fitting the normal distribution: MAP
MAP solution:
  μ̂ = (Σ_{i} x_i + γδ) / (I + γ)
  σ̂² = (Σ_{i} (x_i - μ̂)² + 2β + γ(δ - μ̂)²) / (I + 3 + 2α)
The mean can be rewritten as a weighted sum of the data mean x̄ and the prior mean δ:
  μ̂ = (I x̄ + γδ) / (I + γ)
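A sketch of the MAP estimate in Python/NumPy under a normal inverse gamma prior (function name, example data, and hyperparameter values are illustrative; the exact constants in the variance expression should be checked against the text):

import numpy as np

def fit_normal_map(x, alpha, beta, gamma, delta):
    """MAP estimates of a univariate normal under a NormInvGam(alpha, beta, gamma, delta) prior."""
    I = len(x)
    mu_hat = (np.sum(x) + gamma * delta) / (I + gamma)   # weighted sum of data mean and prior mean
    # Closed-form MAP variance (assumed form; verify the denominator against the book)
    var_hat = (np.sum((x - mu_hat) ** 2) + 2 * beta + gamma * (delta - mu_hat) ** 2) / (I + 3 + 2 * alpha)
    return mu_hat, var_hat

x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])                  # made-up example data
print(fit_normal_map(x, alpha=1.0, beta=1.0, gamma=1.0, delta=0.0))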
Slide 27: Fitting the normal distribution: MAP (figure: MAP fits with 50 data points, 5 data points, and 1 data point)
Slide 28: Fitting the normal distribution: Bayesian approach
Fitting: compute the posterior distribution using Bayes' rule.

Slide 29: Fitting the normal distribution: Bayesian approach
Fitting: compute the posterior distribution using Bayes' rule. The two normalizing constants MUST cancel out, or the left-hand side would not be a valid pdf.

Slide 30: Fitting the normal distribution: Bayesian approach
Fitting: compute the posterior distribution using Bayes' rule. Because the normal inverse gamma prior is conjugate to the normal likelihood, the posterior is again a normal inverse gamma distribution with updated parameters α̃, β̃, γ̃, δ̃.
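A sketch of the conjugate posterior update in Python/NumPy (the function name is illustrative; the update equations are the standard normal inverse gamma conjugate result and are worth checking against the book's notation):

import numpy as np

def posterior_nig(x, alpha, beta, gamma, delta):
    """Posterior NormInvGam parameters after observing data x (conjugate update)."""
    I = len(x)
    s, s2 = np.sum(x), np.sum(x ** 2)
    alpha_post = alpha + I / 2.0
    gamma_post = gamma + I
    delta_post = (gamma * delta + s) / (gamma + I)
    beta_post = beta + s2 / 2.0 + gamma * delta ** 2 / 2.0 - gamma_post * delta_post ** 2 / 2.0
    return alpha_post, beta_post, gamma_post, delta_post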
Slide 31: Fitting the normal distribution: Bayesian approach
Predictive density: take a weighted sum (integral) of the predictions from the different parameter values:
  Pr(x* | x_1..I) = ∫∫ Pr(x* | μ, σ²) Pr(μ, σ² | x_1..I) dμ dσ²

Slide 32: Fitting the normal distribution: Bayesian approach
Predictive density: take a weighted sum of the predictions from the different parameter values (intermediate algebra shown on the slide).

Slide 33: Fitting the normal distribution: Bayesian approach
Predictive density: because the prior is conjugate, the integral can be evaluated in closed form in terms of the normal inverse gamma hyperparameters.
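A minimal Monte Carlo sketch of the predictive density in Python/NumPy (not the slides' closed-form solution): it approximates the integral by averaging normal densities at parameter values sampled from the posterior, which makes the "weighted sum of predictions" idea concrete. The posterior_nig helper from the previous sketch is assumed.

import numpy as np

def predictive_mc(x_star, alpha, beta, gamma, delta, n_samples=100000, seed=0):
    """Approximate Pr(x*|data) by averaging Norm(x*|mu, sigma^2) over posterior samples.

    Pass the *posterior* hyperparameters, e.g. the output of posterior_nig."""
    rng = np.random.default_rng(seed)
    # sigma^2 ~ InvGamma(alpha, beta), sampled as 1 / Gamma(shape=alpha, scale=1/beta)
    var = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=n_samples)
    # mu | sigma^2 ~ Norm(delta, sigma^2 / gamma)
    mu = rng.normal(loc=delta, scale=np.sqrt(var / gamma))
    dens = np.exp(-0.5 * (x_star - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return dens.mean()

# Usage (made-up data and prior):
# a, b, g, d = posterior_nig(np.array([1.2, 0.7, 2.3]), alpha=1.0, beta=1.0, gamma=1.0, delta=0.0)
# print(predictive_mc(1.5, a, b, g, d))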
Slide 34: Fitting the normal distribution: Bayesian approach (figure: predictive densities with 50 data points, 5 data points, and 1 data point)
Slide 35: Structure (recap)
Next: Worked example 2: Categorical distribution
Slide 36: Categorical distribution
The categorical distribution describes the situation where there are K possible outcomes, x ∈ {1, ..., K}. It takes K parameters λ_1..λ_K, where λ_k ≥ 0 and Σ_k λ_k = 1:
  Pr(x = k) = λ_k, or for short Pr(x) = Cat_x[λ_1..K]
Alternatively, we can think of each data point as a vector with all elements zero except the kth, e.g. [0, 0, 0, 1, 0].

Slide 37: Dirichlet distribution
Defined over K values λ_1..λ_K, where λ_k ≥ 0 and Σ_k λ_k = 1. It has K parameters α_k > 0. For short we write:
  Pr(λ_1..K) = Dir_{λ_1..K}[α_1..K]
Slide 38: Categorical distribution: ML
Maximize the product of the individual likelihoods:
  λ̂_1..K = argmax_{λ_1..K} Π_{i=1..I} Cat_{x_i}[λ_1..K] = argmax_{λ_1..K} Π_k λ_k^{N_k}
where N_k is the number of times category k was observed.

Slide 39: Categorical distribution: ML
Instead maximize the log probability: the log likelihood plus a Lagrange multiplier ν to ensure that the parameters sum to one,
  L = Σ_k N_k log λ_k + ν (Σ_k λ_k - 1)
Take the derivative, set it to zero, and re-arrange:
  λ̂_k = N_k / Σ_m N_m
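A minimal sketch of the ML fit for the categorical distribution in Python/NumPy (the function name and example data are illustrative; categories are assumed to be labelled 1..K):

import numpy as np

def fit_categorical_ml(x, K):
    """ML estimate: lambda_k = N_k / sum_m N_m, where N_k counts observations of category k."""
    counts = np.bincount(np.asarray(x) - 1, minlength=K)   # categories 1..K -> indices 0..K-1
    return counts / counts.sum()

x = [1, 3, 3, 2, 5, 3, 1]            # made-up observations, K = 5
print(fit_categorical_ml(x, K=5))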
Slide 40: Categorical distribution: MAP
MAP criterion:
  λ̂_1..K = argmax_{λ_1..K} Π_{i=1..I} Cat_{x_i}[λ_1..K] Dir_{λ_1..K}[α_1..K]

Slide 41: Categorical distribution: MAP
Take the derivative, set it to zero, and re-arrange:
  λ̂_k = (N_k + α_k - 1) / Σ_m (N_m + α_m - 1)
With a uniform prior (α_1..K = 1), this gives the same result as maximum likelihood.
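A sketch of the MAP estimate with a Dirichlet(α_1..K) prior in Python/NumPy, following the formula above (function name, data, and prior values are illustrative; with all α_k = 1 it reduces to the ML estimate):

import numpy as np

def fit_categorical_map(x, alpha):
    """MAP estimate: lambda_k = (N_k + alpha_k - 1) / sum_m (N_m + alpha_m - 1)."""
    alpha = np.asarray(alpha, dtype=float)
    counts = np.bincount(np.asarray(x) - 1, minlength=len(alpha))
    unnorm = counts + alpha - 1.0       # assumes alpha_k + N_k >= 1 for every category
    return unnorm / unnorm.sum()

x = [1, 3, 3, 2, 5, 3, 1]
print(fit_categorical_map(x, alpha=[2, 2, 2, 2, 2]))   # mild smoothing prior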
Slide 42: Categorical distribution (figure: five samples from the prior, the observed data, and five samples from the posterior)
Slide 43: Categorical distribution: Bayesian approach
Compute the posterior distribution over the parameters. By conjugacy, the posterior is again a Dirichlet distribution:
  Pr(λ_1..K | x_1..I) = Dir_{λ_1..K}[α_1..K + N_1..K]
The two normalizing constants MUST cancel out, or the left-hand side would not be a valid pdf.

Slide 44: Categorical distribution: Bayesian approach
Compute the predictive distribution:
  Pr(x* = k | x_1..I) = (α_k + N_k) / Σ_m (α_m + N_m)
Again the two normalizing constants MUST cancel out, or the left-hand side would not be a valid pdf.
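A sketch of the Bayesian treatment in Python/NumPy (function name and example data are illustrative): the Dirichlet prior is conjugate, so the posterior parameters are α_k + N_k and the predictive probability of each category follows the formula above.

import numpy as np

def categorical_bayesian(x, alpha):
    """Posterior Dirichlet parameters and predictive probabilities Pr(x* = k | data)."""
    alpha = np.asarray(alpha, dtype=float)
    counts = np.bincount(np.asarray(x) - 1, minlength=len(alpha))
    alpha_post = alpha + counts                      # posterior Dirichlet parameters
    predictive = alpha_post / alpha_post.sum()       # (alpha_k + N_k) / sum_m (alpha_m + N_m)
    return alpha_post, predictive

x = [1, 3, 3, 2, 5, 3, 1]
print(categorical_bayesian(x, alpha=np.ones(5)))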
Slide 45: ML / MAP vs. Bayesian (figure: predictive densities from the MAP/ML and Bayesian approaches compared)
Slide 46: Conclusion
• Three ways to fit probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Two worked examples
  – Normal distribution (ML gives least squares)
  – Categorical distribution
