Computer vision: models, learning and inference
Chapter 4: Fitting probability models


  Please send errata to s.prince@cs.ucl.ac.uk
Structure
• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution



Maximum Likelihood

As the name suggests, we find the parameters under which the data x_1...I are most likely:

θ̂ = argmax_θ [ Pr(x_1...I | θ) ] = argmax_θ [ ∏_{i=1}^{I} Pr(x_i | θ) ]

We have assumed that the data are independent (hence the product).

Predictive density:

Evaluate a new data point x* under the probability distribution with the best parameters: Pr(x* | θ̂)

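A minimal numerical sketch of this idea (not from the slides): minimize the negative log-likelihood with a general-purpose optimizer, here for an assumed univariate normal model. All data values and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.2, 0.7, 1.9, 1.4, 0.9])      # observed data x_1...I

def neg_log_likelihood(params):
    mu, log_sigma = params                    # optimize log(sigma) so sigma > 0
    sigma = np.exp(log_sigma)
    # negative sum of log Pr(x_i | theta) for a normal pdf
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                  + (x - mu)**2 / (2 * sigma**2))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# Predictive density: evaluate a new point x* under Pr(x* | theta_hat)
x_star = 1.0
density = np.exp(-0.5 * np.log(2 * np.pi * sigma_hat**2)
                 - (x_star - mu_hat)**2 / (2 * sigma_hat**2))
```
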
Maximum a posteriori (MAP)
Fitting

As the name suggests, we find the parameters which maximize the posterior probability Pr(θ | x_1...I):

θ̂ = argmax_θ [ Pr(θ | x_1...I) ] = argmax_θ [ ∏_{i=1}^{I} Pr(x_i | θ) · Pr(θ) / Pr(x_1...I) ]

Again we have assumed that the data are independent.

Maximum a posteriori (MAP)
Fitting

As the name suggests, we find the parameters which maximize the posterior probability Pr(θ | x_1...I):

θ̂ = argmax_θ [ ∏_{i=1}^{I} Pr(x_i | θ) · Pr(θ) / Pr(x_1...I) ]

Since the denominator doesn't depend on the parameters, we can instead maximize

θ̂ = argmax_θ [ ∏_{i=1}^{I} Pr(x_i | θ) · Pr(θ) ]

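In code, MAP fitting only changes the objective: the same negative log-likelihood as before, plus a negative log prior. A sketch under assumed choices (the slides use a conjugate normal inverse gamma prior over both μ and σ²; here a simple normal prior on μ alone stands in, purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.2, 0.7, 1.9, 1.4, 0.9])

def neg_log_posterior(params, prior_mu=0.0, prior_sigma=10.0):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    nll = np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                 + (x - mu)**2 / (2 * sigma**2))
    # Illustrative normal prior on mu only (an assumption, not the slides' prior)
    neg_log_prior = 0.5 * (mu - prior_mu)**2 / prior_sigma**2
    return nll + neg_log_prior

theta_map = minimize(neg_log_posterior, x0=[0.0, 0.0]).x
```
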
Maximum a posteriori

Predictive density:

Evaluate a new data point x* under the probability distribution with the MAP parameters: Pr(x* | θ̂)

Bayesian Approach
Fitting

Compute the posterior distribution over possible parameter values using Bayes' rule:

Pr(θ | x_1...I) = ∏_{i=1}^{I} Pr(x_i | θ) · Pr(θ) / Pr(x_1...I)

Principle: why pick one set of parameters? There are many values that could have explained the data. Try to capture all of the possibilities.

Bayesian Approach
Predictive density

• Each possible parameter value makes a prediction
• Some parameter values are more probable than others

Make a prediction that is an infinite weighted sum (integral) of the predictions for each parameter value, where the weights are the probabilities:

Pr(x* | x_1...I) = ∫ Pr(x* | θ) Pr(θ | x_1...I) dθ

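A numerical sketch of this integral (an assumed example, not from the slides): approximate the weighted sum over a finite grid of parameter values for a univariate normal model with a flat prior.

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.2, 0.7, 1.9, 1.4, 0.9])

# Grid over the parameters of an assumed univariate normal model
mus = np.linspace(-2, 4, 200)
sigmas = np.linspace(0.1, 3, 200)
M, S = np.meshgrid(mus, sigmas)

# Unnormalized posterior at each grid point: likelihood x (flat) prior
log_lik = np.sum(norm.logpdf(x[:, None, None], loc=M, scale=S), axis=0)
post = np.exp(log_lik - log_lik.max())
post /= post.sum()                            # normalize the weights

# Predictive density at x*: weighted sum of each parameter's prediction
x_star = 1.0
pred = np.sum(norm.pdf(x_star, loc=M, scale=S) * post)
```
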
Predictive densities for 3 methods

Maximum likelihood:
Evaluate the new data point under the probability distribution with the ML parameters: Pr(x* | θ̂)

Maximum a posteriori:
Evaluate the new data point under the probability distribution with the MAP parameters: Pr(x* | θ̂)

Bayesian:
Calculate a weighted sum of the predictions from all possible values of the parameters:
Pr(x* | x_1...I) = ∫ Pr(x* | θ) Pr(θ | x_1...I) dθ

Predictive densities for 3 methods

How to rationalize the different forms?
Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions):

Pr(x* | x_1...I) = ∫ Pr(x* | θ) δ(θ − θ̂) dθ = Pr(x* | θ̂)

Structure
• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution



Univariate Normal Distribution

The univariate normal distribution describes a single continuous variable:

Pr(x) = (1/√(2πσ²)) · exp( −(x − μ)² / (2σ²) )

For short we write: Pr(x) = Norm_x[μ, σ²]

Takes 2 parameters: the mean μ and the variance σ² > 0.

Normal Inverse Gamma Distribution

Defined on 2 variables, μ and σ² > 0:

Pr(μ, σ²) = ( √γ / (σ√(2π)) ) · ( β^α / Γ(α) ) · (1/σ²)^(α+1) · exp( −(2β + γ(δ − μ)²) / (2σ²) )

or for short: Pr(μ, σ²) = NormInvGam_{μ,σ²}[α, β, γ, δ]

Four parameters: α, β, γ > 0 and δ.

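A direct transcription of the pdf above into code (parameter values are illustrative):

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def norm_inv_gam_pdf(mu, sigma_sq, alpha, beta, gam, delta):
    sigma = np.sqrt(sigma_sq)
    const = (np.sqrt(gam) / (sigma * np.sqrt(2 * np.pi))) \
            * (beta**alpha / gamma_fn(alpha))
    return const * (1 / sigma_sq)**(alpha + 1) \
           * np.exp(-(2 * beta + gam * (delta - mu)**2) / (2 * sigma_sq))

p = norm_inv_gam_pdf(mu=0.0, sigma_sq=1.0, alpha=1.0, beta=1.0, gam=1.0, delta=0.0)
```
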
Fitting normal distribution: ML

As the name suggests, we find the parameters under which the data x_1...I are most likely:

θ̂ = argmax_θ [ ∏_{i=1}^{I} Pr(x_i | θ) ]

The likelihood is given by the pdf:

Pr(x_i | μ, σ²) = Norm_{x_i}[μ, σ²]

Fitting a normal distribution: ML

[Figure: plotted surface of likelihoods as a function of the possible parameter values; the ML solution is at the peak]

Fitting normal distribution: ML

Algebraically:

μ̂, σ̂² = argmax_{μ,σ²} [ ∏_{i=1}^{I} Pr(x_i | μ, σ²) ]

where:

Pr(x_i | μ, σ²) = (1/√(2πσ²)) · exp( −(x_i − μ)² / (2σ²) )

or alternatively, we can maximize the logarithm:

μ̂, σ̂² = argmax_{μ,σ²} [ Σ_{i=1}^{I} log Pr(x_i | μ, σ²) ]

Why the logarithm?

The logarithm is a monotonic transformation.

Hence, the position of the peak stays in the same place.

But the log likelihood is easier to work with.

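A quick illustration of "easier to work with" (an assumed example): the product of many small likelihoods underflows floating point, while the sum of logs stays finite and peaks at the same parameters.

```python
import numpy as np
from scipy.stats import norm

x = np.random.default_rng(0).normal(size=2000)   # 2000 data points
lik = norm.pdf(x)

print(np.prod(lik))          # 0.0 -- the product underflows
print(np.sum(np.log(lik)))   # finite, with the peak in the same place
```
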
Fitting normal distribution: ML

The log likelihood is

L = −(I/2)·log(2π) − (I/2)·log σ² − (1/(2σ²)) Σ_{i=1}^{I} (x_i − μ)²

How to maximize a function? Take the derivative and set it to zero:

∂L/∂μ = (1/σ²) Σ_{i=1}^{I} (x_i − μ) = 0

Solution:

μ̂ = (1/I) Σ_{i=1}^{I} x_i

Fitting normal distribution: ML

Maximum likelihood solution:

μ̂ = (1/I) Σ_{i=1}^{I} x_i

σ̂² = (1/I) Σ_{i=1}^{I} (x_i − μ̂)²

Should look familiar: these are the sample mean and the sample variance.

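The closed-form solution in code: the sample mean and the (biased, divide-by-I) sample variance.

```python
import numpy as np

x = np.array([1.2, 0.7, 1.9, 1.4, 0.9])
mu_hat = x.mean()
sigma_sq_hat = np.mean((x - mu_hat)**2)   # equivalently x.var(ddof=0)
```
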
Least Squares

Maximum likelihood for the normal distribution...

μ̂ = argmax_μ [ Σ_{i=1}^{I} log Norm_{x_i}[μ, σ²] ] = argmin_μ [ Σ_{i=1}^{I} (x_i − μ)² ]

...gives the `least squares' fitting criterion.

Fitting normal distribution: MAP
Fitting

As the name suggests, we find the parameters which maximize the posterior probability Pr(μ, σ² | x_1...I):

μ̂, σ̂² = argmax_{μ,σ²} [ ∏_{i=1}^{I} Pr(x_i | μ, σ²) · Pr(μ, σ²) ]

The likelihood is the normal pdf:

Pr(x_i | μ, σ²) = Norm_{x_i}[μ, σ²]

Fitting normal distribution: MAP
Prior

Use the conjugate prior, the normal scaled inverse gamma:

Pr(μ, σ²) = NormInvGam_{μ,σ²}[α, β, γ, δ]

Fitting normal distribution: MAP

[Figure: the likelihood, the prior, and their product, the posterior, plotted over μ and σ²]

Fitting normal distribution: MAP

μ̂, σ̂² = argmax_{μ,σ²} [ ∏_{i=1}^{I} Norm_{x_i}[μ, σ²] · NormInvGam_{μ,σ²}[α, β, γ, δ] ]

Again we maximize the log, which does not change the position of the maximum:

μ̂, σ̂² = argmax_{μ,σ²} [ Σ_{i=1}^{I} log Norm_{x_i}[μ, σ²] + log NormInvGam_{μ,σ²}[α, β, γ, δ] ]

Fitting normal distribution: MAP

MAP solution:

μ̂ = ( Σ_{i=1}^{I} x_i + γδ ) / ( I + γ )

σ̂² = ( Σ_{i=1}^{I} (x_i − μ̂)² + 2β + γ(δ − μ̂)² ) / ( I + 3 + 2α )

The mean can be rewritten as a weighted sum of the data mean x̄ and the prior mean δ:

μ̂ = I·x̄ / (I + γ) + γ·δ / (I + γ)

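The MAP solution transcribed into code, under the normal inverse gamma prior with the parameterization used above (the prior parameter values are illustrative):

```python
import numpy as np

def map_normal(x, alpha, beta, gam, delta):
    I = len(x)
    mu_hat = (x.sum() + gam * delta) / (I + gam)
    sigma_sq_hat = (np.sum((x - mu_hat)**2) + 2 * beta
                    + gam * (delta - mu_hat)**2) / (I + 3 + 2 * alpha)
    return mu_hat, sigma_sq_hat

mu_hat, sigma_sq_hat = map_normal(np.array([1.2, 0.7, 1.9, 1.4, 0.9]),
                                  alpha=1.0, beta=1.0, gam=1.0, delta=0.0)
```
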
Fitting normal distribution: MAP

[Figure: MAP fits for 50 data points, 5 data points, and 1 data point]

Fitting normal: Bayesian approach
Fitting

Compute the posterior distribution using Bayes' rule:

Pr(μ, σ² | x_1...I) = ∏_{i=1}^{I} Norm_{x_i}[μ, σ²] · NormInvGam_{μ,σ²}[α, β, γ, δ] / Pr(x_1...I)

The two constants MUST cancel out, or the LHS is not a valid pdf.

Fitting normal: Bayesian approach
Fitting

Because the prior is conjugate, the posterior is another normal inverse gamma:

Pr(μ, σ² | x_1...I) = NormInvGam_{μ,σ²}[α̃, β̃, γ̃, δ̃]

where

α̃ = α + I/2
γ̃ = γ + I
δ̃ = ( γδ + Σ_{i=1}^{I} x_i ) / ( γ + I )
β̃ = β + ( Σ_{i=1}^{I} x_i² )/2 + γδ²/2 − γ̃δ̃²/2

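The conjugate update transcribed into code: data plus a NormInvGam(α, β, γ, δ) prior yield a new NormInvGam(α̃, β̃, γ̃, δ̃) posterior.

```python
import numpy as np

def posterior_params(x, alpha, beta, gam, delta):
    I = len(x)
    alpha_t = alpha + I / 2
    gam_t = gam + I
    delta_t = (gam * delta + x.sum()) / (gam + I)
    beta_t = beta + np.sum(x**2) / 2 + gam * delta**2 / 2 \
             - gam_t * delta_t**2 / 2
    return alpha_t, beta_t, gam_t, delta_t
```
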
Fitting normal: Bayesian approach
Predictive density

Take a weighted sum of the predictions from the different parameter values:

Pr(x* | x_1...I) = ∫∫ Pr(x* | μ, σ²) · Pr(μ, σ² | x_1...I) dμ dσ²

Fitting normal: Bayesian approach
Predictive density

Take a weighted sum of the predictions from the different parameter values:

Pr(x* | x_1...I) = ∫∫ Pr(x* | μ, σ²) · Pr(μ, σ² | x_1...I) dμ dσ²
                 = (1/√(2π)) · ( √γ̃ · β̃^α̃ · Γ(ᾰ) ) / ( √γ̆ · β̆^ᾰ · Γ(α̃) )

where ᾰ, β̆, γ̆, δ̆ are the normal inverse gamma parameters after also incorporating the new point x*:

ᾰ = α̃ + 1/2
γ̆ = γ̃ + 1
δ̆ = ( γ̃δ̃ + x* ) / ( γ̃ + 1 )
β̆ = β̃ + x*²/2 + γ̃δ̃²/2 − γ̆δ̆²/2

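A sketch of evaluating this predictive density. Under this parameterization it is equivalently a Student-t distribution; that equivalence is a standard result for the normal inverse gamma posterior, stated here as an assumption rather than taken from the slides.

```python
import numpy as np
from scipy.stats import t as student_t

def predictive_pdf(x_star, alpha_t, beta_t, gam_t, delta_t):
    # Standard result (assumed): df = 2*alpha, centered on delta,
    # scale^2 = beta * (gamma + 1) / (alpha * gamma)
    scale = np.sqrt(beta_t * (gam_t + 1) / (alpha_t * gam_t))
    return student_t.pdf(x_star, df=2 * alpha_t, loc=delta_t, scale=scale)
```
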
Fitting normal: Bayesian approach

[Figure: Bayesian predictive densities for 50 data points, 5 data points, and 1 data point]

Structure
• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution



Categorical Distribution

The categorical distribution describes the situation where there are K possible outcomes, x = 1, ..., x = K:

Pr(x = k) = λ_k

or we can think of the data as a vector with all elements zero except the kth, e.g. x = [0, 0, 0, 1, 0], giving Pr(x) = ∏_{k=1}^{K} λ_k^{x_k}

For short we write: Pr(x) = Cat_x[λ_1...K]

Takes K parameters λ_k ≥ 0, where Σ_{k=1}^{K} λ_k = 1.

Dirichlet Distribution

Defined over K values λ_1...λ_K, where λ_k ≥ 0 and Σ_{k=1}^{K} λ_k = 1:

Pr(λ_1...K) = ( Γ( Σ_{k=1}^{K} α_k ) / ∏_{k=1}^{K} Γ(α_k) ) · ∏_{k=1}^{K} λ_k^{α_k − 1}

Or for short: Pr(λ_1...K) = Dir_{λ_1...K}[α_1...K]

Has K parameters α_k > 0.

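A quick sketch of sampling from a Dirichlet (concentration parameters are illustrative); each draw is itself a valid set of categorical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.full(5, 2.0)                       # K = 5 concentration parameters
samples = rng.dirichlet(alpha, size=5)        # five samples from the prior
assert np.allclose(samples.sum(axis=1), 1.0)  # each sample sums to one
```
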
Categorical distribution: ML

Maximize the product of the individual likelihoods:

λ̂_1...K = argmax_{λ_1...K} [ ∏_{i=1}^{I} Pr(x_i | λ_1...K) ] = argmax_{λ_1...K} [ ∏_{k=1}^{K} λ_k^{N_k} ]

where N_k is the number of times that category k was observed.

Categorical distribution: ML

Instead maximize the log probability:

L = Σ_{k=1}^{K} N_k log λ_k + ν( Σ_{k=1}^{K} λ_k − 1 )

The first term is the log likelihood; the second is a Lagrange multiplier term that ensures the parameters sum to one.

Take the derivative, set it to zero, and re-arrange:

λ̂_k = N_k / Σ_{m=1}^{K} N_m

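The categorical ML solution in code: normalized counts (data values are illustrative).

```python
import numpy as np

x = np.array([1, 3, 3, 0, 2, 3, 1, 1])   # observations, categories 0..K-1
K = 4
N = np.bincount(x, minlength=K)           # N_k: count of each category
lambda_ml = N / N.sum()
```
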
Categorical distribution: MAP

MAP criterion:

λ̂_1...K = argmax_{λ_1...K} [ ∏_{i=1}^{I} Pr(x_i | λ_1...K) · Pr(λ_1...K) ]
         = argmax_{λ_1...K} [ ∏_{k=1}^{K} λ_k^{N_k + α_k − 1} ]

(substituting the categorical likelihood and the Dirichlet prior, and dropping the constant).

Categorical distribution: MAP

Take the derivative, set it to zero, and re-arrange:

λ̂_k = ( N_k + α_k − 1 ) / Σ_{m=1}^{K} ( N_m + α_m − 1 )

With a uniform prior (α_1...K = 1), this gives the same result as maximum likelihood.

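The closed form above in code (Dirichlet prior parameters are illustrative):

```python
import numpy as np

def categorical_map(N, alpha):
    return (N + alpha - 1) / np.sum(N + alpha - 1)

N = np.array([1, 3, 1, 3])       # counts per category
alpha = np.ones(4)               # uniform prior: reduces to the ML answer
lambda_map = categorical_map(N, alpha)
```
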
Categorical Distribution

[Figure: five samples from the Dirichlet prior; the observed data; five samples from the Dirichlet posterior]

Categorical Distribution: Bayesian approach

Compute the posterior distribution over the parameters:

Pr(λ_1...K | x_1...I) = ∏_{i=1}^{I} Pr(x_i | λ_1...K) · Pr(λ_1...K) / Pr(x_1...I)
                      = Dir_{λ_1...K}[α̃_1...K],  where α̃_k = α_k + N_k

The two constants MUST cancel out, or the LHS is not a valid pdf.

Categorical Distribution: Bayesian approach

Compute the predictive distribution:

Pr(x* = k | x_1...I) = ∫ Pr(x* = k | λ_1...K) · Pr(λ_1...K | x_1...I) dλ_1...K
                     = α̃_k / Σ_{m=1}^{K} α̃_m = ( N_k + α_k ) / Σ_{m=1}^{K} ( N_m + α_m )

Again, the two constants MUST cancel out, or the LHS is not a valid pdf.

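The full Bayesian categorical pipeline in code: the Dirichlet posterior update and the resulting predictive probabilities (the standard conjugate result; counts and prior are illustrative).

```python
import numpy as np

N = np.array([1, 3, 1, 3])                   # observed counts per category
alpha = np.ones(4)                           # Dirichlet prior parameters
alpha_post = alpha + N                       # posterior: Dir[alpha_k + N_k]
predictive = alpha_post / alpha_post.sum()   # Pr(x* = k | x_1...I)
```
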
ML / MAP vs. Bayesian

[Figure: predictive densities compared — MAP/ML (left) vs. Bayesian (right)]

Conclusion

• Three ways to fit probability distributions
   • Maximum likelihood
   • Maximum a posteriori
   • Bayesian approach

• Two worked examples
   • Normal distribution (where ML gives least squares)
   • Categorical distribution
