- 1. Computer vision: models, learning and inference. Chapter 4: Fitting probability models. ©2011 Simon J.D. Prince. Please send errata to s.prince@cs.ucl.ac.uk
- 2. Structure
  • Fitting probability distributions
    – Maximum likelihood
    – Maximum a posteriori
    – Bayesian approach
  • Worked example 1: Normal distribution
  • Worked example 2: Categorical distribution
- 3. Maximum likelihood. As the name suggests, we find the parameters under which the data are most likely. We assume the data are independent (hence the product over data points). Predictive density: evaluate a new data point under the probability distribution with the best-fitting parameters.
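The equations on this slide did not survive the export; a minimal statement of the maximum likelihood criterion and its predictive density, in standard notation (θ for the parameters, x_{1...I} for the training data, x* for a new point), is:

```latex
\hat{\boldsymbol{\theta}} = \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ \prod_{i=1}^{I} Pr(x_i \mid \boldsymbol{\theta}) \right],
\qquad
Pr(x^* \mid x_{1\ldots I}) = Pr(x^* \mid \hat{\boldsymbol{\theta}}).
```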
- 4. Maximum a posteriori (MAP). Fitting: as the name suggests, we find the parameters which maximize the posterior probability of the parameters given the data. Again we assume the data are independent.
- 5. Maximum a posteriori (MAP). Fitting: we find the parameters which maximize the posterior probability. Since the denominator of Bayes' rule does not depend on the parameters, we can instead maximize the product of the likelihood and the prior.
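Again the equation is missing; assuming the same notation, the MAP criterion described on these two slides can be written as:

```latex
\hat{\boldsymbol{\theta}}
= \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ Pr(\boldsymbol{\theta} \mid x_{1\ldots I}) \right]
= \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ \frac{\prod_{i=1}^{I} Pr(x_i \mid \boldsymbol{\theta})\, Pr(\boldsymbol{\theta})}{Pr(x_{1\ldots I})} \right]
= \operatorname*{argmax}_{\boldsymbol{\theta}} \left[ \prod_{i=1}^{I} Pr(x_i \mid \boldsymbol{\theta})\, Pr(\boldsymbol{\theta}) \right].
```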
- 6. Maximum a posteriori. Predictive density: evaluate a new data point under the probability distribution with the MAP parameters.
- 7. Bayesian approach. Fitting: compute the posterior distribution over possible parameter values using Bayes' rule. Principle: why pick one set of parameters? There are many values that could have explained the data; try to capture all of the possibilities.
- 8. Bayesian approach. Predictive density: each possible parameter value makes a prediction, and some parameter values are more probable than others. Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities.
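A minimal statement of the Bayesian posterior and predictive density described on these two slides, in the same notation, is:

```latex
Pr(\boldsymbol{\theta} \mid x_{1\ldots I}) = \frac{\prod_{i=1}^{I} Pr(x_i \mid \boldsymbol{\theta})\, Pr(\boldsymbol{\theta})}{Pr(x_{1\ldots I})},
\qquad
Pr(x^* \mid x_{1\ldots I}) = \int Pr(x^* \mid \boldsymbol{\theta})\, Pr(\boldsymbol{\theta} \mid x_{1\ldots I})\, d\boldsymbol{\theta}.
```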
- 9. Predictive densities for the three methods. Maximum likelihood: evaluate the new data point under the probability distribution with the ML parameters. Maximum a posteriori: evaluate the new data point under the probability distribution with the MAP parameters. Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.
- 10. Predictive densities for the three methods. How do we reconcile the different forms? Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions).
- 11. Structure
  • Fitting probability distributions
    – Maximum likelihood
    – Maximum a posteriori
    – Bayesian approach
  • Worked example 1: Normal distribution
  • Worked example 2: Categorical distribution
- 12. Univariate normal distribution. The univariate normal distribution describes a single continuous variable. It takes two parameters, the mean μ and the variance σ² > 0; for short we write Norm[μ, σ²].
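The density itself is missing from the export; the standard univariate normal pdf is:

```latex
Pr(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right].
```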
- 13. Normal inverse gamma distribution. Defined over two variables, μ and σ² > 0. It takes four parameters: α, β, γ and δ.
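The density is also missing; one common parameterization of the normal-inverse-gamma distribution, which appears to be the one the slides' α, β, γ, δ refer to, is:

```latex
Pr(\mu, \sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}}\, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{\sigma^2}\right)^{\alpha+1} \exp\!\left[-\frac{2\beta + \gamma(\delta-\mu)^2}{2\sigma^2}\right].
```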
- 14. Fitting the normal distribution: ML. As the name suggests, we find the parameters under which the data are most likely. The likelihood is given by the normal pdf evaluated at each data point.
- 15. Fitting the normal distribution: ML. [Figure slide.]
- 16. Fitting the normal distribution: ML. Plot the surface of likelihoods as a function of the possible parameter values; the ML solution is at the peak.
- 17. Fitting the normal distribution: ML. Algebraically, we maximize the product of the individual normal likelihoods, or alternatively we can maximize its logarithm.
- 18. Why the logarithm? The logarithm is a monotonic transformation, so the position of the peak stays in the same place, but the log likelihood is easier to work with.
- 19. Fitting the normal distribution: ML. How do we maximize a function? Take the derivative with respect to each parameter, set it to zero, and solve.
- 20. Fitting the normal distribution: ML. The maximum likelihood solution is the sample mean and the (biased) sample variance. It should look familiar!
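As a concrete illustration (not part of the original slides), a minimal NumPy sketch of this ML fit is:

```python
import numpy as np

def fit_normal_ml(x):
    """ML fit of a univariate normal: sample mean and (biased) sample variance."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                    # mu_hat = (1/I) * sum_i x_i
    var = np.mean((x - mu) ** 2)     # sigma^2_hat = (1/I) * sum_i (x_i - mu_hat)^2
    return mu, var

# Example usage with synthetic data
x = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=50)
print(fit_normal_ml(x))
```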
- 21. Least squares. Maximum likelihood for the normal distribution gives the `least squares' fitting criterion: maximizing the log likelihood with respect to μ is equivalent to minimizing the sum of squared deviations from μ.
- 22. Fitting the normal distribution: MAP. Fitting: as the name suggests, we find the parameters which maximize the posterior probability. The likelihood is the normal pdf.
- 23. Fitting the normal distribution: MAP. Prior: use the conjugate prior, the normal-scaled inverse gamma.
- 24. Fitting the normal distribution: MAP. [Figure: likelihood, prior and posterior.]
- 25. Fitting the normal distribution: MAP. Again we maximize the log, which does not change the position of the maximum.
- 26. Fitting the normal distribution: MAP. The MAP solution for the mean can be rewritten as a weighted sum of the data mean and the prior mean.
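A minimal sketch of the MAP fit, assuming the normal-inverse-gamma prior parameterization given above; the closed-form expressions are the standard result for this conjugate prior, not copied from the slides:

```python
import numpy as np

def fit_normal_map(x, alpha=1.0, beta=1.0, gamma=1.0, delta=0.0):
    """MAP fit of a univariate normal under an assumed normal-inverse-gamma prior
    with hyperparameters alpha, beta, gamma, delta (illustrative default values)."""
    x = np.asarray(x, dtype=float)
    I = x.size
    # Mean: weighted combination of the data sum and the prior mean delta
    mu = (x.sum() + gamma * delta) / (I + gamma)
    # Variance: data term plus prior terms in the numerator, modified count below
    var = (np.sum((x - mu) ** 2) + 2 * beta + gamma * (delta - mu) ** 2) / (I + 3 + 2 * alpha)
    return mu, var
```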
- 27. Fitting the normal distribution: MAP. [Figure: MAP fits with 50 data points, 5 data points and 1 data point.]
- 28. Fitting the normal distribution: Bayesian approach. Fitting: compute the posterior distribution over the parameters using Bayes' rule.
- 29. Fitting the normal distribution: Bayesian approach. Fitting: compute the posterior distribution using Bayes' rule. The two constants must cancel out, or the left-hand side would not be a valid pdf.
- 30. Fitting the normal distribution: Bayesian approach. Fitting: because the prior is conjugate, the posterior is again a normal-inverse-gamma distribution, with hyperparameters updated from the data.
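A sketch of the conjugate hyperparameter update, again assuming the α, β, γ, δ parameterization used above (standard conjugate result, not taken verbatim from the slides):

```python
import numpy as np

def normal_posterior_hyperparams(x, alpha=1.0, beta=1.0, gamma=1.0, delta=0.0):
    """Conjugate update: normal-inverse-gamma prior (alpha, beta, gamma, delta)
    -> normal-inverse-gamma posterior over (mu, sigma^2) after observing x."""
    x = np.asarray(x, dtype=float)
    I = x.size
    alpha_post = alpha + I / 2.0
    gamma_post = gamma + I
    delta_post = (gamma * delta + x.sum()) / gamma_post
    beta_post = (beta + 0.5 * np.sum(x ** 2)
                 + 0.5 * gamma * delta ** 2
                 - 0.5 * gamma_post * delta_post ** 2)
    return alpha_post, beta_post, gamma_post, delta_post
```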
- 31. Fitting the normal distribution: Bayesian approach. Predictive density: take a weighted sum (integral) of the predictions from the different parameter values.
- 32. Fitting the normal distribution: Bayesian approach. Predictive density (continued).
- 33. Fitting the normal distribution: Bayesian approach. Predictive density (continued), with the intermediate terms defined from the posterior hyperparameters.
- 34. Fitting the normal distribution: Bayesian approach. [Figure: Bayesian fits with 50 data points, 5 data points and 1 data point.]
- 35. Structure
  • Fitting probability distributions
    – Maximum likelihood
    – Maximum a posteriori
    – Bayesian approach
  • Worked example 1: Normal distribution
  • Worked example 2: Categorical distribution
- 36. Categorical distribution. The categorical distribution describes a situation with K possible outcomes, y = 1 ... K; alternatively, we can think of each data point as a vector with all elements zero except the k-th, e.g. [0, 0, 0, 1, 0]. It takes K parameters λ_1 ... λ_K, where λ_k ≥ 0 and the parameters sum to one.
- 37. Dirichlet distribution. Defined over K values λ_1 ... λ_K, where λ_k ≥ 0 and Σ_k λ_k = 1. It has K parameters α_k > 0.
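The density itself is missing; the standard Dirichlet pdf over λ_1 ... λ_K is:

```latex
Pr(\lambda_{1\ldots K}) = \frac{\Gamma\!\left(\sum_{k=1}^{K}\alpha_k\right)}{\prod_{k=1}^{K}\Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}.
```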
- 38. Categorical distribution: ML. Maximize the product of the individual likelihoods, which depends on the data only through the counts N_k of each outcome.
- 39. Categorical distribution: ML. Instead maximize the log probability, adding a Lagrange multiplier to ensure the parameters sum to one. Take the derivative, set it to zero and rearrange.
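A minimal NumPy sketch (not from the slides) of the resulting ML estimate, λ_k = N_k / Σ_m N_m, where N_k counts observations of category k:

```python
import numpy as np

def fit_categorical_ml(x, K):
    """ML fit of a categorical distribution from outcomes x in {1, ..., K}:
    lambda_k = N_k / sum_m N_m."""
    counts = np.bincount(np.asarray(x) - 1, minlength=K).astype(float)
    return counts / counts.sum()
```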
- 40. Categorical distribution: MAP. The MAP criterion: maximize the product of the categorical likelihood and the Dirichlet prior.
- 41. Categorical distribution: MAP. Take the derivative, set it to zero and rearrange. With a uniform prior (α_{1...K} = 1), this gives the same result as maximum likelihood.
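A sketch of the corresponding MAP estimate with a Dirichlet prior; the formula λ_k = (N_k + α_k - 1) / Σ_m (N_m + α_m - 1) is the standard result for this conjugate pair and reduces to ML when all α_k = 1:

```python
import numpy as np

def fit_categorical_map(x, alpha, K):
    """MAP fit of a categorical distribution with an assumed Dirichlet(alpha) prior.
    Assumes alpha_k >= 1 so the MAP estimate is well defined."""
    counts = np.bincount(np.asarray(x) - 1, minlength=K).astype(float)
    alpha = np.asarray(alpha, dtype=float)
    num = counts + alpha - 1.0
    return num / num.sum()
```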
- 42. Categorical distribution. [Figure: five samples from the prior, the observed data, and five samples from the posterior.]
- 43. Categorical distribution: Bayesian approach. Compute the posterior distribution over the parameters. The two constants must cancel out, or the left-hand side would not be a valid pdf.
- 44. Categorical distribution: Bayesian approach. Compute the predictive distribution. Again the two constants must cancel out, or the left-hand side would not be a valid pdf.
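A sketch of the Bayesian treatment: the Dirichlet posterior parameters are α_k + N_k, and the predictive probability of category k is their normalized value (standard conjugate result, not taken verbatim from the slides):

```python
import numpy as np

def categorical_bayes(x, alpha, K):
    """Bayesian treatment of the categorical distribution with a Dirichlet prior:
    posterior is Dirichlet(alpha + counts); predictive Pr(x* = k) is the
    normalized posterior parameter."""
    counts = np.bincount(np.asarray(x) - 1, minlength=K).astype(float)
    alpha_post = np.asarray(alpha, dtype=float) + counts   # posterior Dirichlet parameters
    predictive = alpha_post / alpha_post.sum()             # Pr(x* = k | x_1..I)
    return alpha_post, predictive
```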
- 45. ML / MAP vs. Bayesian. [Figure comparing the MAP/ML and Bayesian predictive densities.]
- 46. Conclusion
  • Three ways to fit probability distributions
    – Maximum likelihood
    – Maximum a posteriori
    – Bayesian approach
  • Two worked examples
    – Normal distribution (ML gives least squares)
    – Categorical distribution
