1. Computer vision: models, learning and inference
Chapter 4
Fitting Probability Models
2. Structure
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
• Fitting probability distributions
– Maximum likelihood
– Maximum a posteriori
– Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
3. Maximum Likelihood
Fitting: As the name suggests, find the parameters under which the data are most likely:
θ̂ = argmax_θ [∏_{i=1}^{I} Pr(x_i|θ)]
Predictive Density:
Evaluate a new data point x* under the probability distribution with the best parameters: Pr(x*|θ̂)
We have assumed that the data were independent (hence the product).
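As an illustration of the fitting principle (not from the original slides), here is a minimal sketch that finds the maximum-likelihood parameter of a Bernoulli distribution by grid search over the log-likelihood; the data and grid are invented for the example:

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails.
x = np.array([1, 1, 0, 1, 1, 0, 1, 1])

# Candidate parameter values for Pr(x = 1).
grid = np.linspace(0.01, 0.99, 99)

# Independence assumption: the log-likelihood is a sum of per-point log
# probabilities, log Pr(x_1..I | theta) = sum_i log Pr(x_i | theta).
log_lik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t))
                    for t in grid])

theta_ml = grid[np.argmax(log_lik)]  # parameter under which the data are most likely
print(theta_ml)
```

For a Bernoulli the closed-form answer is the sample mean (here 0.75), and the grid search lands on exactly that value.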
4. Maximum a posteriori (MAP)
Fitting
As the name suggests, we find the parameters which maximize the posterior probability:
θ̂ = argmax_θ Pr(θ|x_{1…I}) = argmax_θ [∏_{i=1}^{I} Pr(x_i|θ) Pr(θ)]
Again we have assumed that the data were independent.
5. Maximum a posteriori (MAP)
Fitting
As the name suggests, we find the parameters which maximize the posterior probability.
Since the denominator doesn't depend on the parameters, we can instead maximize the numerator:
θ̂ = argmax_θ [∏_{i=1}^{I} Pr(x_i|θ) Pr(θ)]
6. Maximum a posteriori (MAP)
Predictive Density:
Evaluate a new data point under the probability distribution with the MAP parameters: Pr(x*|θ̂)
7. Bayesian Approach
Fitting
Compute the posterior distribution over possible parameter values using Bayes' rule:
Pr(θ|x_{1…I}) = ∏_{i=1}^{I} Pr(x_i|θ) Pr(θ) / Pr(x_{1…I})
Principle: why pick one set of parameters? There are many values that could have explained the data. Try to capture all of the possibilities.
8. Bayesian Approach
Predictive Density
• Each possible parameter value makes a prediction
• Some parameter values are more probable than others
Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the probabilities:
Pr(x*|x_{1…I}) = ∫ Pr(x*|θ) Pr(θ|x_{1…I}) dθ
9. Predictive densities for 3 methods
Maximum a posteriori:
Evaluate new data point under probability distribution
with MAP parameters
Maximum likelihood:
Evaluate new data point under probability distribution
with ML parameters
Bayesian:
Calculate weighted sum of predictions from all possible values of
parameters
10. How to rationalize different forms?
Consider ML and MAP estimates as probability
distributions with zero probability everywhere except at
estimate (i.e. delta functions)
Predictive densities for 3 methods
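To make the delta-function view concrete, a small numeric sketch (not from the slides; data and grid invented): the Bayesian predictive is a weighted sum of per-parameter predictions, while concentrating all the weight at the single ML estimate (a delta function) gives the plug-in predictive:

```python
import numpy as np

# Hypothetical binary data and a grid over the Bernoulli parameter.
x = np.array([1, 1, 1, 0])
grid = np.linspace(0.001, 0.999, 999)

# Posterior over the grid (flat prior): normalized likelihood weights.
lik = grid ** x.sum() * (1 - grid) ** (len(x) - x.sum())
post = lik / lik.sum()

# Bayesian predictive: weighted sum of each parameter's prediction Pr(x*=1|theta).
bayes_pred = np.sum(grid * post)

# ML plug-in: a delta function at the estimate -- all weight on one theta.
theta_ml = x.mean()
plugin_pred = theta_ml  # Pr(x* = 1 | theta_ml)

print(bayes_pred, plugin_pred)
```

With only four data points the two answers differ noticeably (about 0.667 vs 0.75), which is exactly the distinction the three predictive densities draw.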
11. Structure
• Fitting probability distributions
– Maximum likelihood
– Maximum a posteriori
– Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
12. Univariate Normal Distribution
The univariate normal distribution describes a single continuous variable. It takes 2 parameters, μ and σ² > 0:
Pr(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
For short we write: Pr(x) = Norm_x[μ, σ²]
13. Normal Inverse Gamma Distribution
Defined on 2 variables μ and σ² > 0, or for short: Pr(μ, σ²) = NormInvGam_{μ,σ²}[α, β, γ, δ]
Four parameters α, β, γ > 0 and δ.
14. Ready?
• Approach the same problem 3 different ways:
– Learn ML parameters
– Learn MAP parameters
– Learn Bayesian distribution of parameters
• Will we get the same results?
15. Fitting normal distribution: ML
As the name suggests, we find the parameters under which the data are most likely.
The likelihood is given by the pdf:
Pr(x_{1…I}|μ, σ²) = ∏_{i=1}^{I} Norm_{x_i}[μ, σ²]
16. Fitting normal distribution: ML
17. Fitting a normal distribution: ML
Plotted surface of likelihoods
as a function of possible
parameter values
ML Solution is at peak
18. Fitting normal distribution: ML
Algebraically:
μ̂, σ̂² = argmax_{μ,σ²} [∏_{i=1}^{I} Norm_{x_i}[μ, σ²]]
or alternatively, we can maximize the logarithm:
μ̂, σ̂² = argmax_{μ,σ²} [Σ_{i=1}^{I} log Norm_{x_i}[μ, σ²]]
where: Norm_{x_i}[μ, σ²] = (1/√(2πσ²)) exp(−(x_i − μ)²/(2σ²))
19. Why the logarithm?
The logarithm is a monotonic transformation.
Hence, the position of the peak stays in the same place
But the log likelihood is easier to work with
20. Fitting normal distribution: ML
How to maximize a function? Take the derivative and equate to zero:
∂/∂μ [Σ_{i=1}^{I} log Norm_{x_i}[μ, σ²]] = 0
Solution:
μ̂ = (1/I) Σ_{i=1}^{I} x_i
21. Fitting normal distribution: ML
Maximum likelihood solution:
μ̂ = (1/I) Σ_{i=1}^{I} x_i
σ̂² = (1/I) Σ_{i=1}^{I} (x_i − μ̂)²
Should look familiar!
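The maximum likelihood solution is just the sample mean and the (biased) sample variance; a quick sketch with invented data:

```python
import numpy as np

# Hypothetical observations assumed drawn i.i.d. from a normal distribution.
x = np.array([2.0, 4.0, 4.0, 6.0])

mu_ml = x.mean()                      # ML mean: the sample average
var_ml = np.mean((x - mu_ml) ** 2)    # ML variance: divides by I, not I - 1

print(mu_ml, var_ml)  # → 4.0 2.0
```

Note the division by I (equivalently `np.var(x, ddof=0)`); the familiar unbiased estimator would divide by I − 1 instead.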
22. Least Squares
Maximum likelihood for the normal distribution...
...gives the `least squares' fitting criterion:
μ̂ = argmax_μ [−Σ_{i=1}^{I} (x_i − μ)²] = argmin_μ [Σ_{i=1}^{I} (x_i − μ)²]
23. Fitting normal distribution: MAP
Fitting
As the name suggests, we find the parameters which maximize the posterior probability:
μ̂, σ̂² = argmax_{μ,σ²} [∏_{i=1}^{I} Pr(x_i|μ, σ²) Pr(μ, σ²)]
The likelihood is the normal PDF.
24. Fitting normal distribution: MAP
Prior
Use conjugate prior, normal scaled inverse gamma.
25. Fitting normal distribution: MAP
(Figure panels: Likelihood, Prior, Posterior)
26. Fitting normal distribution: MAP
Again maximize the log – does not change position of
maximum
27. Fitting normal distribution: MAP
MAP solution:
μ̂ = (Σ_{i=1}^{I} x_i + γδ) / (I + γ)
σ̂² = (Σ_{i=1}^{I} (x_i − μ̂)² + 2β + γ(δ − μ̂)²) / (I + 3 + 2α)
The mean can be rewritten as a weighted sum of the data mean and the prior mean:
μ̂ = (I/(I + γ)) x̄ + (γ/(I + γ)) δ
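A small sketch (data and hyperparameter values invented) of the MAP mean as a weighted blend of the data mean and the prior mean, assuming the normal-inverse-gamma parameterization of these slides, where γ acts like a count of pseudo-observations and δ is the prior mean:

```python
import numpy as np

# Hypothetical data and normal-inverse-gamma hyperparameters.
x = np.array([3.0, 5.0, 4.0, 4.0])
gamma, delta = 2.0, 0.0   # prior strength and prior mean (invented values)

I = len(x)
w_data = I / (I + gamma)                       # weight on the data mean
mu_map = w_data * x.mean() + (1 - w_data) * delta

print(mu_map)
```

With more data (larger I) the weight shifts toward the data mean; with no data at all the MAP mean falls back to the prior mean δ.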
28. Fitting normal distribution: MAP
(Figure panels: 50 data points, 5 data points, 1 data point)
29. Fitting normal: Bayesian approach
Fitting
Compute the posterior distribution using Bayes’ rule:
30. Fitting normal: Bayesian approach
Fitting
Compute the posterior distribution using Bayes’ rule:
Two constants MUST cancel out or LHS not a valid pdf
31. Fitting normal: Bayesian approach
Fitting
Compute the posterior distribution using Bayes' rule. By conjugacy, the posterior is again normal-inverse-gamma:
Pr(μ, σ²|x_{1…I}) = NormInvGam_{μ,σ²}[α̃, β̃, γ̃, δ̃]
where
α̃ = α + I/2,  γ̃ = γ + I,  δ̃ = (γδ + Σ_i x_i)/(γ + I),  β̃ = β + (Σ_i x_i² + γδ² − γ̃δ̃²)/2
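Assuming the normal-inverse-gamma parameterization (α, β, γ, δ) of these slides, the conjugate update can be sketched in a few lines of arithmetic: the posterior stays in the same family, so fitting reduces to updating four hyperparameters (data and prior values invented for illustration):

```python
import numpy as np

# Hypothetical data and normal-inverse-gamma prior hyperparameters.
x = np.array([1.0, 2.0, 3.0])
alpha, beta, gamma, delta = 1.0, 1.0, 1.0, 0.0

# Conjugate posterior hyperparameters: the posterior is again
# normal-inverse-gamma, so no integration is needed at fitting time.
I = len(x)
alpha_post = alpha + I / 2
gamma_post = gamma + I
delta_post = (gamma * delta + x.sum()) / (gamma + I)
beta_post = beta + (np.sum(x ** 2) + gamma * delta ** 2
                    - gamma_post * delta_post ** 2) / 2

print(alpha_post, gamma_post, delta_post, beta_post)
```

Each data point adds 1 to γ and 1/2 to α, and δ moves from the prior mean toward the data mean, mirroring the MAP weighted-sum behaviour.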
32. Fitting normal: Bayesian approach
Predictive density
Take weighted sum of predictions from different parameter values:
(Figure: posterior distribution, and samples from the posterior)
33. Fitting normal: Bayesian approach
Predictive density
Take weighted sum of predictions from different parameter values:
34. Fitting normal: Bayesian approach
Predictive density
Take a weighted sum (integral) of the predictions from different parameter values:
Pr(x*|x_{1…I}) = ∫∫ Pr(x*|μ, σ²) Pr(μ, σ²|x_{1…I}) dμ dσ²
This integral can be evaluated in closed form as a ratio of normal-inverse-gamma normalizing constants.
35. Fitting normal: Bayesian Approach
(Figure panels: 50 data points, 5 data points, 1 data point)
36. Structure
• Fitting probability distributions
– Maximum likelihood
– Maximum a posteriori
– Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
37. Categorical Distribution
The categorical distribution describes a situation with K possible outcomes y = 1 … y = K.
It takes K parameters λ_1 … λ_K, where λ_k ≥ 0 and Σ_k λ_k = 1:
Pr(y = k) = λ_k
For short we write: Pr(y) = Cat_y[λ]
Alternatively, we can think of the data as a vector with all elements zero except the kth, e.g. [0,0,0,1,0].
39. Categorical distribution: ML
Maximize the product of the individual likelihoods:
λ̂ = argmax_λ [∏_{i=1}^{I} Pr(x_i|λ)] = argmax_λ [∏_{k=1}^{K} λ_k^{N_k}]
(remember, Pr(x = k) = λ_k)
N_k = # times we observed bin k
40. Categorical distribution: ML
Instead maximize the log probability, with a Lagrange multiplier ν to ensure that the parameters sum to one:
L = Σ_{k=1}^{K} N_k log λ_k + ν (Σ_{k=1}^{K} λ_k − 1)
Take the derivative, set to zero and re-arrange:
λ̂_k = N_k / Σ_{m=1}^{K} N_m
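The ML solution for the categorical distribution is just normalized counts; a minimal sketch with invented draws from K = 3 bins:

```python
import numpy as np

# Hypothetical categorical observations with K = 3 outcomes (labels 0..2).
x = np.array([0, 2, 1, 0, 0, 2])
K = 3

N_k = np.bincount(x, minlength=K)   # N_k = # times we observed bin k
lam_ml = N_k / N_k.sum()            # ML parameters: normalized counts

print(lam_ml)
```

The estimates automatically sum to one, which is exactly what the Lagrange-multiplier constraint enforces in the derivation.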
41. Categorical distribution: MAP
MAP criterion (with a Dirichlet prior Pr(λ) = Dir_λ[α_1 … α_K]):
λ̂ = argmax_λ [∏_{i=1}^{I} Pr(x_i|λ) Pr(λ)]
42. Categorical distribution: MAP
Take the derivative, set to zero and re-arrange:
λ̂_k = (N_k + α_k − 1) / Σ_{m=1}^{K} (N_m + α_m − 1)
With a uniform prior (α_{1…K} = 1), this gives the same result as maximum likelihood.
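A sketch of the categorical MAP estimate with Dirichlet pseudo-counts (data invented; `alpha` is the hyperparameter vector), showing that a uniform prior with every α_k = 1 recovers the ML answer:

```python
import numpy as np

x = np.array([0, 2, 1, 0, 0, 2])       # hypothetical draws from K = 3 bins
K = 3
N_k = np.bincount(x, minlength=K)

def cat_map(N_k, alpha):
    """MAP estimate for categorical parameters under a Dirichlet prior."""
    num = N_k + alpha - 1
    return num / num.sum()

lam_uniform = cat_map(N_k, np.ones(K))        # alpha_k = 1: identical to ML
lam_smoothed = cat_map(N_k, np.full(K, 2.0))  # alpha_k = 2: add-one smoothing

print(lam_uniform, lam_smoothed)
```

With α_k = 2 the estimate becomes (N_k + 1)/(I + K), the familiar add-one (Laplace) smoothing, so no bin ever gets probability zero.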
44. Categorical Distribution: Bayesian approach
Compute the posterior distribution over parameters:
Pr(λ|x_{1…I}) = Dir_λ[α_1 + N_1, …, α_K + N_K]
The two normalizing constants MUST cancel out, or the LHS would not be a valid pdf.
45. Categorical Distribution: Bayesian approach
Compute the predictive distribution:
Pr(x* = k|x_{1…I}) = ∫ Pr(x* = k|λ) Pr(λ|x_{1…I}) dλ = (N_k + α_k) / Σ_{m=1}^{K} (N_m + α_m)
Again, the two normalizing constants MUST cancel out, or the LHS would not be a valid pdf.
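The Bayesian predictive distribution for the categorical case also reduces to normalized pseudo-counts; a hedged sketch on invented data, where `alpha` holds the Dirichlet hyperparameters α_k:

```python
import numpy as np

x = np.array([0, 2, 1, 0, 0, 2])    # hypothetical draws from K = 3 bins
K = 3
alpha = np.ones(K)                  # uniform Dirichlet prior
N_k = np.bincount(x, minlength=K)

# Predictive probability of each outcome for a new point x*:
# the integral over lambda collapses to normalized pseudo-counts.
pred = (N_k + alpha) / (N_k + alpha).sum()

print(pred)
```

Even with a uniform prior this differs from the plug-in ML predictive: the Bayesian prediction never assigns exactly zero probability to a bin that happens not to appear in the data.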
46. ML / MAP vs. Bayesian
(Figure: MAP/ML plug-in predictive vs. Bayesian predictive)
47. Conclusion
• Three ways to fit probability distributions
• Maximum likelihood
• Maximum a posteriori
• Bayesian approach
• Two worked examples
• Normal distribution (ML = least squares)
• Categorical distribution