# Fitting Probability Models (Computer Vision: Models, Learning and Inference, Chapter 4)



**Slide 1.** Computer vision: models, learning and inference. Chapter 4: Fitting probability models. Please send errata to s.prince@cs.ucl.ac.uk
**Slide 2: Structure.**
- Fitting probability distributions
  - Maximum likelihood
  - Maximum a posteriori
  - Bayesian approach
- Worked example 1: Normal distribution
- Worked example 2: Categorical distribution

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
**Slide 3: Maximum likelihood.** As the name suggests, we find the parameters under which the data is most likely. We assume the data points are independent (hence the product of individual likelihoods). Predictive density: evaluate a new data point under the probability distribution with the best-fitting parameters.
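The equations dropped in extraction presumably had the standard form, writing $x_{1\ldots I}$ for the training data and $x^{*}$ for a new point:

```latex
\hat{\boldsymbol\theta}
  = \operatorname*{argmax}_{\boldsymbol\theta} \Pr(x_{1\ldots I}\,|\,\boldsymbol\theta)
  = \operatorname*{argmax}_{\boldsymbol\theta} \prod_{i=1}^{I} \Pr(x_i\,|\,\boldsymbol\theta),
\qquad
\text{predictive density: } \Pr(x^{*}\,|\,\hat{\boldsymbol\theta})
```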
**Slide 4: Maximum a posteriori (MAP).** Fitting: as the name suggests, we find the parameters which maximize the posterior probability. Again we assume the data points are independent.
**Slide 5: Maximum a posteriori (MAP).** Since the denominator of Bayes' rule does not depend on the parameters, we can instead maximize the product of the likelihood and the prior.
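The missing MAP criterion can be reconstructed as follows; the last step uses the fact that the denominator is constant in the parameters:

```latex
\hat{\boldsymbol\theta}
  = \operatorname*{argmax}_{\boldsymbol\theta} \Pr(\boldsymbol\theta\,|\,x_{1\ldots I})
  = \operatorname*{argmax}_{\boldsymbol\theta}
    \frac{\prod_{i=1}^{I}\Pr(x_i\,|\,\boldsymbol\theta)\,\Pr(\boldsymbol\theta)}
         {\Pr(x_{1\ldots I})}
  = \operatorname*{argmax}_{\boldsymbol\theta}
    \prod_{i=1}^{I}\Pr(x_i\,|\,\boldsymbol\theta)\,\Pr(\boldsymbol\theta)
```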
**Slide 6: Maximum a posteriori.** Predictive density: evaluate a new data point under the probability distribution with the MAP parameters.
**Slide 7: Bayesian approach.** Fitting: compute the posterior distribution over possible parameter values using Bayes' rule. Principle: why pick one set of parameters? There are many values that could have explained the data, so we try to capture all of the possibilities.
**Slide 8: Bayesian approach.** Predictive density: each possible parameter value makes a prediction, and some parameter values are more probable than others. Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities.
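In equations, the Bayesian fit and the predictive integral described above are:

```latex
\Pr(\boldsymbol\theta\,|\,x_{1\ldots I})
  = \frac{\prod_{i=1}^{I}\Pr(x_i\,|\,\boldsymbol\theta)\,\Pr(\boldsymbol\theta)}
         {\Pr(x_{1\ldots I})},
\qquad
\Pr(x^{*}\,|\,x_{1\ldots I})
  = \int \Pr(x^{*}\,|\,\boldsymbol\theta)\,
         \Pr(\boldsymbol\theta\,|\,x_{1\ldots I})\,d\boldsymbol\theta
```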
**Slide 9: Predictive densities for the 3 methods.** Maximum likelihood: evaluate a new data point under the probability distribution with the ML parameters. Maximum a posteriori: evaluate a new data point under the probability distribution with the MAP parameters. Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.
**Slide 10: Predictive densities for the 3 methods.** How to rationalize the different forms? Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions).
**Slide 11: Structure.**
- Fitting probability distributions
  - Maximum likelihood
  - Maximum a posteriori
  - Bayesian approach
- Worked example 1: Normal distribution
- Worked example 2: Categorical distribution
**Slide 12: Univariate normal distribution.** The univariate normal distribution describes a single continuous variable. It takes two parameters, the mean μ and the variance σ² > 0. For short we write Pr(x) = Norm_x[μ, σ²].
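The density itself, lost in extraction, is the familiar normal pdf:

```latex
\Pr(x) = \text{Norm}_x[\mu,\sigma^2]
       = \frac{1}{\sqrt{2\pi\sigma^2}}
         \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```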
**Slide 13: Normal inverse gamma distribution.** Defined over two variables, μ and σ² > 0. It has four parameters α, β, γ and δ. For short we write Pr(μ, σ²) = NormInvGam_{μ,σ²}[α, β, γ, δ].
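One common parameterization of this density, matching the four parameters named above (treat the exact constants here as my reconstruction rather than a quotation from the slides):

```latex
\Pr(\mu,\sigma^2)
 = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}}
   \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)}
   \left(\frac{1}{\sigma^2}\right)^{\alpha+1}
   \exp\!\left(-\frac{2\beta+\gamma(\delta-\mu)^2}{2\sigma^2}\right)
```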
**Slide 14: Fitting the normal distribution: ML.** As the name suggests, we find the parameters under which the data is most likely. The likelihood of each data point is given by the normal pdf.
**Slide 15: Fitting the normal distribution: ML.** (Figure slide.)
**Slide 16: Fitting a normal distribution: ML.** Plotted surface of likelihoods as a function of the possible parameter values; the ML solution is at the peak.
**Slide 17: Fitting the normal distribution: ML.** Algebraically, we maximize the product of the individual normal likelihoods, or alternatively we can maximize its logarithm.
**Slide 18: Why the logarithm?** The logarithm is a monotonic transformation, so the position of the peak stays in the same place, but the log likelihood is easier to work with.
**Slide 19: Fitting the normal distribution: ML.** How to maximize a function? Take the derivative, set it to zero, and solve.
**Slide 20: Fitting the normal distribution: ML.** Maximum likelihood solution: the sample mean and the sample variance (with divisor I, not I−1). Should look familiar!
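The closed-form ML solution can be sketched in a few lines of code (a minimal illustration; the function name is my own):

```python
def fit_normal_ml(x):
    """ML fit of a univariate normal: the sample mean, and the
    (biased, divide-by-I) sample variance, maximize the likelihood."""
    I = len(x)
    mu = sum(x) / I                              # ML estimate of the mean
    var = sum((xi - mu) ** 2 for xi in x) / I    # ML estimate of sigma^2
    return mu, var

mu, var = fit_normal_ml([1.0, 2.0, 3.0, 4.0])
# mu = 2.5, var = 1.25
```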
**Slide 21: Least squares.** Maximum likelihood for the normal distribution gives the 'least squares' fitting criterion.
**Slide 22: Fitting the normal distribution: MAP.** Fitting: as the name suggests, we find the parameters which maximize the posterior probability. The likelihood is the normal pdf.
**Slide 23: Fitting the normal distribution: MAP.** Prior: use the conjugate prior, the normal-scaled inverse gamma.
**Slide 24: Fitting the normal distribution: MAP.** (Figure: likelihood, prior, and posterior.)
**Slide 25: Fitting the normal distribution: MAP.** Again we maximize the logarithm, which does not change the position of the maximum.
**Slide 26: Fitting the normal distribution: MAP.** MAP solution: the mean can be rewritten as a weighted sum of the data mean and the prior mean.
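Assuming the normal inverse gamma prior of slide 13, with γ acting as the prior's concentration and δ as its mean, the MAP mean has the weighted-sum form described above:

```latex
\hat{\mu} = \frac{\sum_{i=1}^{I} x_i + \gamma\delta}{I + \gamma}
          = \frac{I}{I+\gamma}\,\bar{x} + \frac{\gamma}{I+\gamma}\,\delta
```

As the number of data points I grows, the data mean dominates; with little data, the estimate is pulled toward the prior mean δ.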
**Slide 27: Fitting the normal distribution: MAP.** (Figure panels: 50 data points, 5 data points, 1 data point.)
**Slide 28: Fitting the normal: Bayesian approach.** Fitting: compute the posterior distribution using Bayes' rule.
**Slide 29: Fitting the normal: Bayesian approach.** The two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.
**Slide 30: Fitting the normal: Bayesian approach.** Because the prior is conjugate, the posterior is again a normal inverse gamma distribution, with updated parameters.
**Slides 31-33: Fitting the normal: Bayesian approach.** Predictive density: take a weighted sum of the predictions from the different parameter values, i.e. integrate the prediction for each parameter value against the posterior.
**Slide 34: Fitting the normal: Bayesian approach.** (Figure panels: 50 data points, 5 data points, 1 data point.)
**Slide 35: Structure.**
- Fitting probability distributions
  - Maximum likelihood
  - Maximum a posteriori
  - Bayesian approach
- Worked example 1: Normal distribution
- Worked example 2: Categorical distribution
**Slide 36: Categorical distribution.** The categorical distribution describes a situation with K possible outcomes x = 1 ... K. It takes K parameters λ_1 ... λ_K, where each λ_k ≥ 0 and the λ_k sum to one. Alternatively, we can think of each data point as a vector with all elements zero except the kth, e.g. [0, 0, 0, 1, 0].
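In symbols, the distribution on the slide is simply:

```latex
\Pr(x = k) = \lambda_k, \qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K}\lambda_k = 1
```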
**Slide 37: Dirichlet distribution.** Defined over K values λ_1 ... λ_K where λ_k ≥ 0 and the λ_k sum to one. It has K parameters α_k > 0.
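The standard Dirichlet density, which the slide's dropped equation would have shown:

```latex
\Pr(\lambda_{1\ldots K})
 = \frac{\Gamma\!\left(\sum_{k=1}^{K}\alpha_k\right)}
        {\prod_{k=1}^{K}\Gamma(\alpha_k)}
   \prod_{k=1}^{K}\lambda_k^{\alpha_k-1}
```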
**Slide 38: Categorical distribution: ML.** Maximize the product of the individual likelihoods.
**Slide 39: Categorical distribution: ML.** Instead maximize the log probability, adding a Lagrange multiplier to ensure that the parameters sum to one. Take the derivative, set it to zero, and rearrange.
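Carrying out the Lagrange-multiplier algebra gives λ̂_k = N_k / Σ_m N_m, the fraction of observations in each category. A minimal sketch (the function name is my own):

```python
from collections import Counter

def fit_categorical_ml(data, K):
    """ML estimate: lambda_k is the fraction of observations equal to k."""
    counts = Counter(data)   # N_k for each observed category
    I = len(data)
    return [counts.get(k, 0) / I for k in range(1, K + 1)]

lam = fit_categorical_ml([1, 2, 2, 3, 3, 3], K=3)
# lam = [1/6, 2/6, 3/6]
```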
**Slide 40: Categorical distribution: MAP.** MAP criterion: maximize the product of the categorical likelihood and the Dirichlet prior.
**Slide 41: Categorical distribution: MAP.** Take the derivative, set it to zero, and rearrange. With a uniform prior (α_1 = ... = α_K = 1), this gives the same result as maximum likelihood.
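Writing N_k for the number of observations in category k, the rearranged MAP solution is the mode of the Dirichlet posterior, which is consistent with the uniform-prior remark above (set all α_k = 1 to recover N_k / Σ_m N_m):

```latex
\hat{\lambda}_k = \frac{N_k + \alpha_k - 1}{\sum_{m=1}^{K}\left(N_m + \alpha_m - 1\right)}
```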
**Slide 42: Categorical distribution.** (Figure: five samples from the prior, the observed data, and five samples from the posterior.)
**Slide 43: Categorical distribution: Bayesian approach.** Compute the posterior distribution over the parameters. The two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.
**Slide 44: Categorical distribution: Bayesian approach.** Compute the predictive distribution. Again the two constants must cancel out, or the left-hand side would not be a valid pdf.
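For the Dirichlet-categorical model the predictive integral has a closed form, Pr(x* = k) = (N_k + α_k) / Σ_m (N_m + α_m). A sketch (names are my own):

```python
from collections import Counter

def categorical_predictive(data, alpha):
    """Bayesian predictive for the Dirichlet-categorical model.

    alpha: Dirichlet prior parameters alpha_1..alpha_K.
    Returns Pr(x* = k) = (N_k + alpha_k) / sum_m (N_m + alpha_m).
    """
    K = len(alpha)
    counts = Counter(data)
    post = [counts.get(k + 1, 0) + alpha[k] for k in range(K)]  # posterior params
    total = sum(post)
    return [p / total for p in post]

p = categorical_predictive([1, 2, 2, 3, 3, 3], alpha=[1.0, 1.0, 1.0])
# p = [2/9, 3/9, 4/9]
```

Unlike the ML estimate, the predictive never assigns zero probability to an unobserved category as long as α_k > 0.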
**Slide 45: ML / MAP vs. Bayesian.** (Figure: predictive densities, MAP/ML vs. the Bayesian approach.)
**Slide 46: Conclusion.**
- Three ways to fit probability distributions: maximum likelihood, maximum a posteriori, the Bayesian approach
- Two worked examples: the normal distribution (ML gives least squares) and the categorical distribution