1. Computer vision: models, learning and inference
Chapter 4
Fitting Probability Models
2. Structure
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
• Fitting probability distributions
– Maximum likelihood
– Maximum a posteriori
– Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
3. Maximum Likelihood
Fitting: As the name suggests, find the parameters under which the data are most likely:
θ̂ = argmax_θ [∏_{i=1}^{I} Pr(x_i|θ)]
Predictive Density:
Evaluate a new data point x* under the probability distribution with the best parameters: Pr(x*|θ̂)
We have assumed that the data were independent (hence the product).
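As an illustration of the fitting principle (not from the original slides), here is a minimal sketch that finds the maximum-likelihood parameter of a Bernoulli distribution by grid search over the log-likelihood; the data and grid are invented for the example:

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails.
x = np.array([1, 1, 0, 1, 1, 0, 1, 1])

# Candidate parameter values for Pr(x = 1).
grid = np.linspace(0.01, 0.99, 99)

# Independence assumption: the log-likelihood is a sum of per-point log
# probabilities, log Pr(x_1..I | theta) = sum_i log Pr(x_i | theta).
log_lik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t))
                    for t in grid])

theta_ml = grid[np.argmax(log_lik)]  # parameter under which the data are most likely
print(theta_ml)
```

For a Bernoulli the closed-form answer is the sample mean (here 0.75), and the grid search lands on exactly that value.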
4. Maximum a posteriori (MAP)
Fitting
As the name suggests, we find the parameters which maximize the posterior probability:
θ̂ = argmax_θ Pr(θ|x_{1…I}) = argmax_θ [∏_{i=1}^{I} Pr(x_i|θ) Pr(θ)]
Again we have assumed that the data were independent.
5. Maximum a posteriori (MAP)
Fitting
As the name suggests, we find the parameters which maximize the posterior probability.
Since the denominator doesn't depend on the parameters, we can instead maximize the numerator:
θ̂ = argmax_θ [∏_{i=1}^{I} Pr(x_i|θ) Pr(θ)]
6. Maximum a posteriori (MAP)
Predictive Density:
Evaluate a new data point under the probability distribution with the MAP parameters: Pr(x*|θ̂)
7. Bayesian Approach
Fitting
Compute the posterior distribution over possible parameter values using Bayes' rule:
Pr(θ|x_{1…I}) = ∏_{i=1}^{I} Pr(x_i|θ) Pr(θ) / Pr(x_{1…I})
Principle: why pick one set of parameters? There are many values that could have explained the data. Try to capture all of the possibilities.
8. Bayesian Approach
Predictive Density
• Each possible parameter value makes a prediction
• Some parameter values are more probable than others
Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the probabilities:
Pr(x*|x_{1…I}) = ∫ Pr(x*|θ) Pr(θ|x_{1…I}) dθ
9. Predictive densities for 3 methods
Maximum a posteriori:
Evaluate new data point under probability distribution
with MAP parameters
Maximum likelihood:
Evaluate new data point under probability distribution
with ML parameters
Bayesian:
Calculate weighted sum of predictions from all possible values of
parameters
10. How to rationalize different forms?
Consider ML and MAP estimates as probability
distributions with zero probability everywhere except at
estimate (i.e. delta functions)
Predictive densities for 3 methods
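To make the delta-function view concrete, a small numeric sketch (not from the slides; data and grid invented): the Bayesian predictive is a weighted sum of per-parameter predictions, while concentrating all the weight at the single ML estimate (a delta function) gives the plug-in predictive:

```python
import numpy as np

# Hypothetical binary data and a grid over the Bernoulli parameter.
x = np.array([1, 1, 1, 0])
grid = np.linspace(0.001, 0.999, 999)

# Posterior over the grid (flat prior): normalized likelihood weights.
lik = grid ** x.sum() * (1 - grid) ** (len(x) - x.sum())
post = lik / lik.sum()

# Bayesian predictive: weighted sum of each parameter's prediction Pr(x*=1|theta).
bayes_pred = np.sum(grid * post)

# ML plug-in: a delta function at the estimate -- all weight on one theta.
theta_ml = x.mean()
plugin_pred = theta_ml  # Pr(x* = 1 | theta_ml)

print(bayes_pred, plugin_pred)
```

With only four data points the two answers differ noticeably (about 0.667 vs 0.75), which is exactly the distinction the three predictive densities draw.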
11. Structure
• Fitting probability distributions
– Maximum likelihood
– Maximum a posteriori
– Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
12. Univariate Normal Distribution
The univariate normal distribution describes a single continuous variable. It takes 2 parameters, μ and σ² > 0:
Pr(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
For short we write: Pr(x) = Norm_x[μ, σ²]
13. Normal Inverse Gamma Distribution
Defined on 2 variables μ and σ² > 0, or for short: Pr(μ, σ²) = NormInvGam_{μ,σ²}[α, β, γ, δ]
Four parameters α, β, γ > 0 and δ.
14. Ready?
• Approach the same problem 3 different ways:
– Learn ML parameters
– Learn MAP parameters
– Learn Bayesian distribution of parameters
• Will we get the same results?
15. Fitting normal distribution: ML
As the name suggests, we find the parameters under which the data are most likely.
The likelihood is given by the pdf:
Pr(x_{1…I}|μ, σ²) = ∏_{i=1}^{I} Norm_{x_i}[μ, σ²]
16. Fitting normal distribution: ML
17. Fitting a normal distribution: ML
Plotted surface of likelihoods
as a function of possible
parameter values
ML Solution is at peak
18. Fitting normal distribution: ML
Algebraically:
μ̂, σ̂² = argmax_{μ,σ²} [∏_{i=1}^{I} Norm_{x_i}[μ, σ²]]
or alternatively, we can maximize the logarithm:
μ̂, σ̂² = argmax_{μ,σ²} [Σ_{i=1}^{I} log Norm_{x_i}[μ, σ²]]
where: Norm_{x_i}[μ, σ²] = (1/√(2πσ²)) exp(−(x_i − μ)²/(2σ²))
19. Why the logarithm?
The logarithm is a monotonic transformation.
Hence, the position of the peak stays in the same place
But the log likelihood is easier to work with
20. Fitting normal distribution: ML
How to maximize a function? Take the derivative and equate to zero:
∂/∂μ [Σ_{i=1}^{I} log Norm_{x_i}[μ, σ²]] = 0
Solution:
μ̂ = (1/I) Σ_{i=1}^{I} x_i
21. Fitting normal distribution: ML
Maximum likelihood solution:
μ̂ = (1/I) Σ_{i=1}^{I} x_i
σ̂² = (1/I) Σ_{i=1}^{I} (x_i − μ̂)²
Should look familiar!
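The maximum likelihood solution is just the sample mean and the (biased) sample variance; a quick sketch with invented data:

```python
import numpy as np

# Hypothetical observations assumed drawn i.i.d. from a normal distribution.
x = np.array([2.0, 4.0, 4.0, 6.0])

mu_ml = x.mean()                      # ML mean: the sample average
var_ml = np.mean((x - mu_ml) ** 2)    # ML variance: divides by I, not I - 1

print(mu_ml, var_ml)  # → 4.0 2.0
```

Note the division by I (equivalently `np.var(x, ddof=0)`); the familiar unbiased estimator would divide by I − 1 instead.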
22. Least Squares
Maximum likelihood for the normal distribution...
...gives the `least squares' fitting criterion:
μ̂ = argmax_μ [−Σ_{i=1}^{I} (x_i − μ)²] = argmin_μ [Σ_{i=1}^{I} (x_i − μ)²]
23. Fitting normal distribution: MAP
Fitting
As the name suggests, we find the parameters which maximize the posterior probability:
μ̂, σ̂² = argmax_{μ,σ²} [∏_{i=1}^{I} Pr(x_i|μ, σ²) Pr(μ, σ²)]
The likelihood is the normal PDF.
24. Fitting normal distribution: MAP
Prior
Use conjugate prior, normal scaled inverse gamma.
25. Fitting normal distribution: MAP
(Figure panels: Likelihood, Prior, Posterior)
26. Fitting normal distribution: MAP
Again maximize the log – does not change position of
maximum
27. Fitting normal distribution: MAP
MAP solution:
μ̂ = (Σ_{i=1}^{I} x_i + γδ) / (I + γ)
σ̂² = (Σ_{i=1}^{I} (x_i − μ̂)² + 2β + γ(δ − μ̂)²) / (I + 3 + 2α)
The mean can be rewritten as a weighted sum of the data mean and the prior mean:
μ̂ = (I/(I + γ)) x̄ + (γ/(I + γ)) δ
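A small sketch (data and hyperparameter values invented) of the MAP mean as a weighted blend of the data mean and the prior mean, assuming the normal-inverse-gamma parameterization of these slides, where γ acts like a count of pseudo-observations and δ is the prior mean:

```python
import numpy as np

# Hypothetical data and normal-inverse-gamma hyperparameters.
x = np.array([3.0, 5.0, 4.0, 4.0])
gamma, delta = 2.0, 0.0   # prior strength and prior mean (invented values)

I = len(x)
w_data = I / (I + gamma)                       # weight on the data mean
mu_map = w_data * x.mean() + (1 - w_data) * delta

print(mu_map)
```

With more data (larger I) the weight shifts toward the data mean; with no data at all the MAP mean falls back to the prior mean δ.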
28. Fitting normal distribution: MAP
(Figure panels: 50 data points, 5 data points, 1 data point)
29. Fitting normal: Bayesian approach
Fitting
Compute the posterior distribution using Bayes’ rule:
30. Fitting normal: Bayesian approach
Fitting
Compute the posterior distribution using Bayes’ rule:
Two constants MUST cancel out or LHS not a valid pdf
31. Fitting normal: Bayesian approach
Fitting
Compute the posterior distribution using Bayes' rule. By conjugacy, the posterior is again normal-inverse-gamma:
Pr(μ, σ²|x_{1…I}) = NormInvGam_{μ,σ²}[α̃, β̃, γ̃, δ̃]
where
α̃ = α + I/2,  γ̃ = γ + I,  δ̃ = (γδ + Σ_i x_i)/(γ + I),  β̃ = β + (Σ_i x_i² + γδ² − γ̃δ̃²)/2
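Assuming the normal-inverse-gamma parameterization (α, β, γ, δ) of these slides, the conjugate update can be sketched in a few lines of arithmetic: the posterior stays in the same family, so fitting reduces to updating four hyperparameters (data and prior values invented for illustration):

```python
import numpy as np

# Hypothetical data and normal-inverse-gamma prior hyperparameters.
x = np.array([1.0, 2.0, 3.0])
alpha, beta, gamma, delta = 1.0, 1.0, 1.0, 0.0

# Conjugate posterior hyperparameters: the posterior is again
# normal-inverse-gamma, so no integration is needed at fitting time.
I = len(x)
alpha_post = alpha + I / 2
gamma_post = gamma + I
delta_post = (gamma * delta + x.sum()) / (gamma + I)
beta_post = beta + (np.sum(x ** 2) + gamma * delta ** 2
                    - gamma_post * delta_post ** 2) / 2

print(alpha_post, gamma_post, delta_post, beta_post)
```

Each data point adds 1 to γ and 1/2 to α, and δ moves from the prior mean toward the data mean, mirroring the MAP weighted-sum behaviour.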
32. Fitting normal: Bayesian approach
Predictive density
Take weighted sum of predictions from different parameter values:
(Figure: posterior distribution, and samples from the posterior)
33. Fitting normal: Bayesian approach
Predictive density
Take weighted sum of predictions from different parameter values:
34. Fitting normal: Bayesian approach
Predictive density
Take a weighted sum (integral) of the predictions from different parameter values:
Pr(x*|x_{1…I}) = ∫∫ Pr(x*|μ, σ²) Pr(μ, σ²|x_{1…I}) dμ dσ²
This integral can be evaluated in closed form as a ratio of normal-inverse-gamma normalizing constants.
35. Fitting normal: Bayesian Approach
(Figure panels: 50 data points, 5 data points, 1 data point)
36. Structure
• Fitting probability distributions
– Maximum likelihood
– Maximum a posteriori
– Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution
37. Categorical Distribution
The categorical distribution describes a situation with K possible outcomes y = 1 … y = K.
It takes K parameters λ_1 … λ_K, where λ_k ≥ 0 and Σ_k λ_k = 1:
Pr(y = k) = λ_k
For short we write: Pr(y) = Cat_y[λ]
Alternatively, we can think of the data as a vector with all elements zero except the kth, e.g. [0,0,0,1,0].
39. Categorical distribution: ML
Maximize the product of the individual likelihoods:
λ̂ = argmax_λ [∏_{i=1}^{I} Pr(x_i|λ)] = argmax_λ [∏_{k=1}^{K} λ_k^{N_k}]
(remember, Pr(x = k) = λ_k)
N_k = # times we observed bin k
40. Categorical distribution: ML
Instead maximize the log probability, with a Lagrange multiplier ν to ensure that the parameters sum to one:
L = Σ_{k=1}^{K} N_k log λ_k + ν (Σ_{k=1}^{K} λ_k − 1)
Take the derivative, set to zero and re-arrange:
λ̂_k = N_k / Σ_{m=1}^{K} N_m
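The ML solution for the categorical distribution is just normalized counts; a minimal sketch with invented draws from K = 3 bins:

```python
import numpy as np

# Hypothetical categorical observations with K = 3 outcomes (labels 0..2).
x = np.array([0, 2, 1, 0, 0, 2])
K = 3

N_k = np.bincount(x, minlength=K)   # N_k = # times we observed bin k
lam_ml = N_k / N_k.sum()            # ML parameters: normalized counts

print(lam_ml)
```

The estimates automatically sum to one, which is exactly what the Lagrange-multiplier constraint enforces in the derivation.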
41. Categorical distribution: MAP
MAP criterion (with a Dirichlet prior Pr(λ) = Dir_λ[α_1 … α_K]):
λ̂ = argmax_λ [∏_{i=1}^{I} Pr(x_i|λ) Pr(λ)]
42. Categorical distribution: MAP
Take the derivative, set to zero and re-arrange:
λ̂_k = (N_k + α_k − 1) / Σ_{m=1}^{K} (N_m + α_m − 1)
With a uniform prior (α_{1…K} = 1), this gives the same result as maximum likelihood.
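A sketch of the categorical MAP estimate with Dirichlet pseudo-counts (data invented; `alpha` is the hyperparameter vector), showing that a uniform prior with every α_k = 1 recovers the ML answer:

```python
import numpy as np

x = np.array([0, 2, 1, 0, 0, 2])       # hypothetical draws from K = 3 bins
K = 3
N_k = np.bincount(x, minlength=K)

def cat_map(N_k, alpha):
    """MAP estimate for categorical parameters under a Dirichlet prior."""
    num = N_k + alpha - 1
    return num / num.sum()

lam_uniform = cat_map(N_k, np.ones(K))        # alpha_k = 1: identical to ML
lam_smoothed = cat_map(N_k, np.full(K, 2.0))  # alpha_k = 2: add-one smoothing

print(lam_uniform, lam_smoothed)
```

With α_k = 2 the estimate becomes (N_k + 1)/(I + K), the familiar add-one (Laplace) smoothing, so no bin ever gets probability zero.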
44. Categorical Distribution: Bayesian approach
Compute the posterior distribution over parameters:
Pr(λ|x_{1…I}) = Dir_λ[α_1 + N_1, …, α_K + N_K]
The two normalizing constants MUST cancel out, or the LHS would not be a valid pdf.
45. Categorical Distribution: Bayesian approach
Compute the predictive distribution:
Pr(x* = k|x_{1…I}) = ∫ Pr(x* = k|λ) Pr(λ|x_{1…I}) dλ = (N_k + α_k) / Σ_{m=1}^{K} (N_m + α_m)
Again, the two normalizing constants MUST cancel out, or the LHS would not be a valid pdf.
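The Bayesian predictive distribution for the categorical case also reduces to normalized pseudo-counts; a hedged sketch on invented data, where `alpha` holds the Dirichlet hyperparameters α_k:

```python
import numpy as np

x = np.array([0, 2, 1, 0, 0, 2])    # hypothetical draws from K = 3 bins
K = 3
alpha = np.ones(K)                  # uniform Dirichlet prior
N_k = np.bincount(x, minlength=K)

# Predictive probability of each outcome for a new point x*:
# the integral over lambda collapses to normalized pseudo-counts.
pred = (N_k + alpha) / (N_k + alpha).sum()

print(pred)
```

Even with a uniform prior this differs from the plug-in ML predictive: the Bayesian prediction never assigns exactly zero probability to a bin that happens not to appear in the data.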
46. ML / MAP vs. Bayesian
(Figure: MAP/ML plug-in predictive vs. Bayesian predictive)
47. Conclusion
• Three ways to fit probability distributions
• Maximum likelihood
• Maximum a posteriori
• Bayesian approach
• Two worked examples
• Normal distribution (ML = least squares)
• Categorical distribution