Chapter 3:
Likelihood function and inference

Likelihood function and inference
The likelihood
Information and curvature
Sufficiency and ancillarity
Maximum likelihood estimation
Non-regular models
EM algorithm
The likelihood
Given a (usually parametric) family of distributions
F ∈ {F_θ, θ ∈ Θ}
with densities f(·|θ) [wrt a fixed dominating measure], the density of the iid sample x1, . . . , xn is
∏_{i=1}^n f(x_i|θ)
Note In the special case where the dominating measure is a counting measure,
∏_{i=1}^n f(x_i|θ)
is the probability of observing the sample x1, . . . , xn among all possible realisations of X1, . . . , Xn
The likelihood
Definition (likelihood function)
The likelihood function associated with a sample x1, . . . , xn is the function
L : Θ → R⁺
θ ↦ ∏_{i=1}^n f(x_i|θ)
same formula as the density but a different space of variation
Example: density function versus likelihood function
Take the case of a Poisson density [against the counting measure]
f(x; λ) = λ^x/x! e^{-λ} I_N(x)
which varies in N as a function of x
versus
L(λ; x) = λ^x/x! e^{-λ}
which varies in R⁺ as a function of λ
[Figures: density plotted with λ = 3 fixed, likelihood plotted with x = 3 fixed]
Example: density function versus likelihood function
Take the case of a Normal N(0, σ) density [against the Lebesgue measure]
f(x; σ) = 1/√(2πσ) e^{-x²/2σ} I_R(x)
which varies in R as a function of x
versus
L(σ; x) = 1/√(2πσ) e^{-x²/2σ}
which varies in R⁺ as a function of σ
[Figures: density plotted with σ = 2 fixed, likelihood plotted with x = 2 fixed]
Example: density function versus likelihood function
Take the case of a Normal N(0, 1/θ) density [against the Lebesgue measure]
f(x; θ) = √θ/√(2π) e^{-θx²/2} I_R(x)
which varies in R as a function of x
versus
L(θ; x) = √θ/√(2π) e^{-θx²/2}
which varies in R⁺ as a function of θ
[Figures: density plotted with θ = 1/2 fixed, likelihood plotted with x = 1/2 fixed]
Example: Hardy-Weinberg equilibrium
Population genetics:
Genotypes of biallelic genes AA, Aa, and aa
sample frequencies nAA, nAa and naa
multinomial model M(n; pAA, pAa, paa)
related to population proportion of A alleles, pA:
pAA = pA², pAa = 2pA(1 - pA), paa = (1 - pA)²
likelihood
L(pA | nAA, nAa, naa) ∝ pA^{2nAA} [2pA(1 - pA)]^{nAa} (1 - pA)^{2naa}
[Boos & Stefanski, 2013]
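As an illustration, here is a minimal R sketch of this Hardy-Weinberg likelihood; the genotype counts and the helper name loglik are made up for the example, and the closed-form MLE (2nAA + nAa)/2n serves as a check.

## Hardy-Weinberg likelihood in pA, maximised numerically (hypothetical counts)
n_AA <- 25; n_Aa <- 50; n_aa <- 25
loglik <- function(pA)
  2 * n_AA * log(pA) + n_Aa * log(2 * pA * (1 - pA)) + 2 * n_aa * log(1 - pA)
optimise(loglik, c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum  # numerical MLE
(2 * n_AA + n_Aa) / (2 * (n_AA + n_Aa + n_aa))               # closed-form MLE of pA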
mixed distributions and their likelihood
Special case when a random variable X may take specific values a1, . . . , ak and a continuum of values A
Example: Rainfall at a given spot on a given day may be zero with positive probability p0 [it did not rain!] or an arbitrary number between 0 and 100 [capacity of measurement container] or 100 with positive probability p100 [container full]
mixed distributions and their likelihood
Special case when a random variable X may take specific values a1, . . . , ak and a continuum of values A
Example: Tobit model where y* ∼ N(X^T β, σ²) but
y = y* × I{y* ≥ 0} is observed
mixed distributions and their likelihood
Special case when a random variable X may take specific values a1, . . . , ak and a continuum of values A
Density of X against the composition of two measures, counting and Lebesgue:
fX(a) = P_θ(X = a)  if a ∈ {a1, . . . , ak}
        f(a|θ)      otherwise
Results in the likelihood
L(θ|x1, . . . , xn) = ∏_{j=1}^k P_θ(X = a_j)^{n_j} × ∏_{x_i ∉ {a1,...,ak}} f(x_i|θ)
where n_j = # observations equal to a_j
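A minimal R sketch of such a mixed likelihood, for a Tobit-type model without covariates, y = max(y*, 0) with y* ∼ N(μ, σ²); the simulated data, the log-σ parameterisation and the function names are illustrative assumptions only.

## mixed likelihood: point mass at 0 (counting part) + normal density (Lebesgue part)
loglik <- function(theta, y) {
  mu <- theta[1]; sigma <- exp(theta[2])             # sd on the log scale
  zero <- y == 0
  sum(zero) * pnorm(0, mu, sigma, log.p = TRUE) +    # P(y* <= 0) for censored points
    sum(dnorm(y[!zero], mu, sigma, log = TRUE))      # density for uncensored points
}
set.seed(1)
y <- pmax(rnorm(200, mean = 0.5, sd = 1), 0)         # simulated censored sample
optim(c(0, 0), function(th) -loglik(th, y))$par      # rough MLE of (mu, log sigma)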
Enters Fisher, Ronald Fisher!
Fisher's intuition in the 1920s:
the likelihood function contains the relevant information about the parameter θ
the higher the likelihood the more likely the parameter
the curvature of the likelihood determines the precision of the estimation
Concentration of likelihood mode around true parameter
[Figures: likelihood functions for x1, . . . , xn ∼ P(3) as n increases, n = 38, . . . , 240]
[Figures: likelihood functions for x1, . . . , xn ∼ N(0, 1) as n increases and as the sample varies]
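A minimal R sketch reproducing this concentration effect for Poisson(3) samples; the grid, seed and sample sizes are arbitrary choices for the illustration.

## normalised Poisson(3) log-likelihoods concentrate around 3 as n grows
set.seed(42)
lambda <- seq(1, 6, length.out = 200)
x <- rpois(240, lambda = 3)
plot(NULL, xlim = range(lambda), ylim = c(-2, 0),
     xlab = expression(lambda), ylab = "log-likelihood / n (shifted)")
for (n in seq(40, 240, by = 40)) {
  ll <- sapply(lambda, function(l) sum(dpois(x[1:n], l, log = TRUE))) / n
  lines(lambda, ll - max(ll))      # shift each curve so its mode sits at 0
}
abline(v = 3, lty = 2)             # true parameter value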
why concentration takes place
Consider
x1, . . . , xn iid ∼ F
Then
log ∏_{i=1}^n f(x_i|θ) = Σ_{i=1}^n log f(x_i|θ)
and by the LLN
(1/n) Σ_{i=1}^n log f(x_i|θ) −→ ∫_X log f(x|θ) dF(x)
Lemma
Maximising the likelihood is asymptotically equivalent to minimising the Kullback-Leibler divergence
∫_X log {f(x)/f(x|θ)} dF(x)
Member of the family closest to the true distribution
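A small R sketch of this law-of-large-numbers argument for the N(θ, 1) model (the true mean 0.5, the sample sizes and the grid are illustrative): the average log-likelihood approaches its expectation, whose maximiser is the member of the family closest to the data-generating distribution.

## average log-likelihood vs its expectation E[log f(X|theta)] when X ~ N(0.5, 1)
set.seed(7)
theta <- seq(-1, 2, length.out = 101)
avg_ll <- function(x) sapply(theta, function(t) mean(dnorm(x, t, 1, log = TRUE)))
x_small <- rnorm(50, mean = 0.5)
x_big   <- rnorm(5000, mean = 0.5)
limit <- -0.5 * log(2 * pi) - 0.5 * (1 + (0.5 - theta)^2)   # exact expectation
matplot(theta, cbind(avg_ll(x_small), avg_ll(x_big), limit), type = "l",
        lty = c(2, 2, 1), xlab = expression(theta), ylab = "average log-likelihood")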
Score function
Score function defined by
∇ log L(θ|x) = (∂/∂θ1 L(θ|x), . . . , ∂/∂θp L(θ|x)) / L(θ|x)
Gradient (slope) of the log-likelihood function at point θ
lemma
When X ∼ F_θ,
E_θ[∇ log L(θ|X)] = 0
Reason:
∫_X ∇ log L(θ|x) dF_θ(x) = ∫_X ∇L(θ|x) dx = ∇ ∫_X dF_θ(x) = ∇ 1 = 0
Connected with the concentration theorem: the gradient is null on average at the true value of the parameter
Warning: Not defined for non-differentiable likelihoods, e.g. when the support depends on θ
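A quick R check (location model N(θ, 1), where the score is x − θ; the values below are arbitrary) that the score has mean zero at, and only at, the true parameter.

## Monte Carlo check that E[score] = 0 at the true parameter
set.seed(3)
theta0 <- 1.5
x <- rnorm(1e5, mean = theta0, sd = 1)
mean(x - theta0)    # score averaged at the true value: close to 0
mean(x - 2)         # averaged at a wrong value: clearly non-zero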
Fisher's information matrix
Another notion attributed to Fisher [more likely due to Edgeworth]
Information: covariance matrix of the score vector
I(θ) = E_θ[ ∇ log f(X|θ) {∇ log f(X|θ)}^T ]
Often called Fisher information
Measures the curvature of the likelihood surface, which translates as information brought by the data
Sometimes denoted I_X(θ) to stress the dependence on the distribution of X
Fisher's information matrix
Second derivative of the log-likelihood as well
lemma
If L(θ|x) is twice differentiable [as a function of θ]
I(θ) = -E_θ[ ∇^T ∇ log f(X|θ) ]
Hence
I_ij(θ) = -E_θ[ ∂²/∂θ_i ∂θ_j log f(X|θ) ]
Illustrations
Binomial B(n, p) distribution
f(x|p) = (n choose x) p^x (1 - p)^{n-x}
∂/∂p log f(x|p) = x/p - (n - x)/(1 - p)
∂²/∂p² log f(x|p) = -x/p² - (n - x)/(1 - p)²
Hence
I(p) = np/p² + (n - np)/(1 - p)² = n / p(1 - p)
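A small R check of this formula by simulation (n, p and the number of replications are arbitrary): the empirical variance of the score should match n/p(1 − p).

## variance of the binomial score vs the Fisher information n / (p(1-p))
set.seed(11)
n <- 20; p <- 0.3
x <- rbinom(1e5, size = n, prob = p)
score <- x / p - (n - x) / (1 - p)
var(score)            # Monte Carlo estimate
n / (p * (1 - p))     # theoretical value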
Illustrations
Multinomial M(n; p1, . . . , pk) distribution
f(x|p) = (n choose x1 · · · xk) p1^{x1} · · · pk^{xk}
∂/∂p_i log f(x|p) = x_i/p_i - x_k/p_k
∂²/∂p_i∂p_j log f(x|p) = -x_k/p_k²   (i ≠ j)
∂²/∂p_i² log f(x|p) = -x_i/p_i² - x_k/p_k²
Hence, in terms of the free parameters (p1, . . . , p_{k-1}) with p_k = 1 - p_1 - · · · - p_{k-1},
I(p) = n ( diag(1/p_1, . . . , 1/p_{k-1}) + (1/p_k) 11^T )
i.e. the (k-1)×(k-1) matrix with diagonal entries 1/p_i + 1/p_k and off-diagonal entries 1/p_k
Illustrations
Multinomial M(n; p1, . . . , pk) distribution
f(x|p) = (n choose x1 · · · xk) p1^{x1} · · · pk^{xk}
∂/∂p_i log f(x|p) = x_i/p_i - x_k/p_k
∂²/∂p_i∂p_j log f(x|p) = -x_k/p_k²   (i ≠ j)
∂²/∂p_i² log f(x|p) = -x_i/p_i² - x_k/p_k²
and
I(p)^{-1} = (1/n) ( diag(p_1, . . . , p_{k-1}) - p p^T )   [with p = (p_1, . . . , p_{k-1})]
i.e. the (k-1)×(k-1) matrix with diagonal entries p_i(1 - p_i) and off-diagonal entries -p_i p_j
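A minimal R check (k = 3 categories, arbitrary probabilities and n) that the two displayed matrices are indeed inverses of one another.

## numerical check that I(p) and the stated inverse multiply to the identity
p <- c(0.2, 0.5); pk <- 1 - sum(p); n <- 10      # free parameters p1, p2; p3 = 0.3
I    <- n * (diag(1 / p) + 1 / pk)               # diagonal 1/pi + 1/pk, off-diagonal 1/pk
Iinv <- (diag(p) - outer(p, p)) / n              # diagonal pi(1-pi), off-diagonal -pi*pj
round(I %*% Iinv, 10)                            # 2 x 2 identity matrix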
Illustrations
Normal N(μ, σ²) distribution
f(x|θ) = 1/√(2π) · 1/σ · exp{-(x - μ)²/2σ²}
∂/∂μ log f(x|θ) = (x - μ)/σ²
∂/∂σ log f(x|θ) = -1/σ + (x - μ)²/σ³
∂²/∂μ² log f(x|θ) = -1/σ²
∂²/∂μ∂σ log f(x|θ) = -2(x - μ)/σ³
∂²/∂σ² log f(x|θ) = 1/σ² - 3(x - μ)²/σ⁴
Hence
I(θ) = (1/σ²) diag(1, 2)
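A Monte Carlo check of this 2 × 2 information matrix via the average outer product of the score (the values of μ, σ and the simulation size are arbitrary).

## E[ score score^T ] should approach diag(1, 2) / sigma^2
set.seed(5)
mu <- 1; sigma <- 2
x <- rnorm(1e5, mu, sigma)
score <- cbind((x - mu) / sigma^2, -1 / sigma + (x - mu)^2 / sigma^3)
crossprod(score) / length(x)    # compare with diag(c(1, 2)) / sigma^2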
Properties
Additive features translating as accumulation of information:
if X and Y are independent, I_X(θ) + I_Y(θ) = I_{(X,Y)}(θ)
I_{X1,...,Xn}(θ) = n I_{X1}(θ)
if X = T(Y) and Y = S(X), I_X(θ) = I_Y(θ)
if X = T(Y), I_X(θ) ≤ I_Y(θ)
If η = Ψ(θ) is a bijective transform, change of parameterisation:
I_η(η) = (∂θ/∂η)^T I_θ(θ) (∂θ/∂η)
"In information geometry, this is seen as a change of coordinates on a Riemannian manifold, and the intrinsic properties of curvature are unchanged under different parametrization. In general, the Fisher information matrix provides a Riemannian metric (more precisely, the Fisher-Rao metric)." [Wikipedia]
Approximations
Back to the Kullback-Leibler divergence
D(θ0, θ) = ∫_X f(x|θ0) log {f(x|θ0)/f(x|θ)} dx
Using a second degree Taylor expansion
log f(x|θ) = log f(x|θ0) + (θ - θ0)^T ∇ log f(x|θ0)
             + (1/2) (θ - θ0)^T ∇∇^T log f(x|θ0) (θ - θ0) + o(||θ - θ0||²)
approximation of the divergence:
D(θ0, θ) ≈ (1/2) (θ - θ0)^T I(θ0) (θ - θ0)
[Exercise: show this is exact in the normal case]
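A one-line R check of the exercise in the simplest normal case, the N(θ, 1) location model, where I(θ0) = 1 and the quadratic form is exact (θ0 and θ chosen arbitrarily).

## KL divergence between N(theta0, 1) and N(theta, 1) vs (theta - theta0)^2 / 2
theta0 <- 0; theta <- 0.7
kl <- integrate(function(x) dnorm(x, theta0) *
                  (dnorm(x, theta0, log = TRUE) - dnorm(x, theta, log = TRUE)),
                -Inf, Inf)$value
c(kl, (theta - theta0)^2 / 2)   # agree up to numerical integration error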
Approximations
Central limit law of the score vector
(1/√n) ∇ log L(θ|X1, . . . , Xn) ≈ N(0, I_{X1}(θ))
Notation I1(θ) stands for I_{X1}(θ) and indicates the information associated with a single observation
Sufficiency
What if a transform of the sample
S(X1, . . . , Xn)
contains all the information, i.e.
I_{(X1,...,Xn)}(θ) = I_{S(X1,...,Xn)}(θ)
uniformly in θ?
In this case S(·) is called a sufficient statistic [because it is sufficient to know the value of S(x1, . . . , xn) to get complete information]
A statistic is an arbitrary transform of the data X1, . . . , Xn
Sufficiency (bis)
Alternative definition:
If (X1, . . . , Xn) ∼ f(x1, . . . , xn|θ) and if T = S(X1, . . . , Xn) is such that the distribution of (X1, . . . , Xn) conditional on T does not depend on θ, then S(·) is a sufficient statistic
Factorisation theorem
S(·) is a sufficient statistic if and only if
f(x1, . . . , xn|θ) = g(S(x1, . . . , xn)|θ) × h(x1, . . . , xn)
another notion due to Fisher
Illustrations
Uniform U(0, θ) distribution
L(θ|x1, . . . , xn) = θ^{-n} ∏_{i=1}^n I_{(0,θ)}(x_i) = θ^{-n} I_{θ ≥ max_i x_i}
Hence
S(X1, . . . , Xn) = max_i X_i = X_(n)
is sufficient
Illustrations
Bernoulli B(p) distribution
L(p|x1, . . . , xn) = ∏_{i=1}^n p^{x_i}(1 - p)^{1-x_i} = {p/(1-p)}^{Σ_i x_i} (1 - p)^n
Hence
S(X1, . . . , Xn) = X̄_n
is sufficient
Illustrations
Normal N(μ, σ²) distribution
L(μ, σ|x1, . . . , xn) = ∏_{i=1}^n 1/(√(2π) σ) exp{-(x_i - μ)²/2σ²}
  = {2πσ²}^{-n/2} exp{ -(1/2σ²) Σ_{i=1}^n (x_i - x̄_n + x̄_n - μ)² }
  = {2πσ²}^{-n/2} exp{ -(1/2σ²) Σ_{i=1}^n (x_i - x̄_n)² - (1/2σ²) Σ_{i=1}^n (x̄_n - μ)² }
Hence
S(X1, . . . , Xn) = ( X̄_n, Σ_{i=1}^n (X_i - X̄_n)² )
is sufficient
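A small R illustration (made-up sample) that the likelihood only sees the data through these two statistics: a second sample constructed to share the same mean and sum of squared deviations yields exactly the same likelihood for every (μ, σ).

## two different samples, identical sufficient statistics, identical likelihoods
loglik <- function(mu, sigma, x) sum(dnorm(x, mu, sigma, log = TRUE))
x <- c(1.2, -0.3, 0.8, 2.1)
y <- 2 * mean(x) - x                            # same mean, same sum of squares
c(mean(x), mean(y), sum((x - mean(x))^2), sum((y - mean(y))^2))
c(loglik(0.5, 1.3, x), loglik(0.5, 1.3, y))     # equal, for this and any (mu, sigma)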
Sufficiency and exponential families
Both previous examples belong to exponential families
f(x|θ) = h(x) exp{ T(θ)^T S(x) - ψ(θ) }
Generic property of exponential families:
f(x1, . . . , xn|θ) = ∏_{i=1}^n h(x_i) exp{ T(θ)^T Σ_{i=1}^n S(x_i) - nψ(θ) }
lemma
For an exponential family with summary statistic S(·), the statistic
S(X1, . . . , Xn) = Σ_{i=1}^n S(X_i)
is sufficient
Sufficiency as a rare feature
Nice property reducing the data to a low dimension transform but...
How frequent is it within the collection of probability distributions?
Very rare, as essentially restricted to exponential families
[Pitman-Koopman-Darmois theorem]
with the exception of parameter-dependent families like U(0, θ)
Pitman-Koopman-Darmois characterisation
If X1, . . . , Xn are iid random variables from a density f(·|θ) whose support does not depend on θ and verifying the property that there exists an integer n0 such that, for n ≥ n0, there is a sufficient statistic S(X1, . . . , Xn) with fixed [in n] dimension, then f(·|θ) belongs to an exponential family
[Factorisation theorem]
Note: Darmois published this result in 1935 [in French] and Koopman and Pitman in 1936 [in English] but Darmois is generally omitted from the theorem... Fisher proved it for one-D sufficient statistics in 1934
Minimal sufficiency
Multiplicity of sufficient statistics, e.g., S'(x) = (S(x), U(x)) remains sufficient when S(·) is sufficient
Search for a most concentrated summary:
Minimal sufficiency
A sufficient statistic S(·) is minimal sufficient if it is a function of any other sufficient statistic
Lemma
For a minimal exponential family representation
f(x|θ) = h(x) exp{ T(θ)^T S(x) - ψ(θ) }
S(X1) + . . . + S(Xn) is minimal sufficient
Ancillarity
Opposite of sufficiency:
Ancillarity
When X1, . . . , Xn are iid random variables from a density f(·|θ), a statistic A(·) is ancillary if A(X1, . . . , Xn) has a distribution that does not depend on θ
Useless?! Not necessarily, as conditioning upon A(X1, . . . , Xn) leads to more precision and efficiency:
Use of F(x1, . . . , xn|A(x1, . . . , xn)) instead of F(x1, . . . , xn)
Notion of maximal ancillary statistic
Illustrations
1. If X1, . . . , Xn iid ∼ U(0, θ), A(X1, . . . , Xn) = (X1, . . . , Xn)/X_(n) is ancillary
2. If X1, . . . , Xn iid ∼ N(μ, σ²),
   A(X1, . . . , Xn) = (X1 - X̄_n, . . . , Xn - X̄_n) / {Σ_{i=1}^n (X_i - X̄_n)²}^{1/2}
   is ancillary
3. If X1, . . . , Xn iid ∼ f(x|θ), rank(X1, . . . , Xn) is ancillary
> x = rnorm(10)
> rank(x)
 [1]  7  4  1  5  2  6  8  9 10  3
[see, e.g., rank tests]
Point estimation, estimators and estimates
When given a parametric family f(·|θ) and a sample supposedly drawn from this family
(X1, . . . , Xn) iid ∼ f(x|θ)
an estimator of θ is a statistic T(X1, . . . , Xn) or θ̂_n providing a [reasonable] substitute for the unknown value θ.
an estimate of θ is the value of the estimator for a given [realised] sample, T(x1, . . . , xn)
Example: For a Normal N(μ, σ²) sample X1, . . . , Xn,
T(X1, . . . , Xn) = μ̂_n = (1/n) Σ_{i=1}^n X_i
is an estimator of μ and μ̂_n = 2.014 is an estimate
Maximum likelihood principle
Given the concentration property of the likelihood function, reasonable choice of the estimator as the mode:
MLE
A maximum likelihood estimator (MLE) θ̂_n satisfies
L(θ̂_n|X1, . . . , Xn) ≥ L(θ|X1, . . . , Xn) for all θ ∈ Θ
Under regularity of L(θ|X1, . . . , Xn), the MLE is also a solution of the likelihood equations
∇ log L(θ̂_n|X1, . . . , Xn) = 0
Warning: θ̂_n is not the most likely value of θ but makes the observation (x1, . . . , xn) most likely...
Maximum likelihood invariance
Principle independent of parameterisation:
If ξ = h(θ) is a one-to-one transform of θ, then
ξ̂_n^MLE = h(θ̂_n^MLE)
[estimator of transform = transform of estimator]
By extension, if ξ = h(θ) is any transform of θ, then
ξ̂_n^MLE = h(θ̂_n^MLE)
Unicity of maximum likelihood estimate
Depending on the regularity of L(θ|x1, . . . , xn), there may be
1. an a.s. unique MLE θ̂_n^MLE
   [case of x1, . . . , xn ∼ N(μ, 1)]
2. several or an infinity of MLE's [or of solutions to the likelihood equations]
   [case of x1, . . . , xn ∼ N(θ1 + θ2, 1), and mixtures of normals]
3. no MLE at all
   [case of x1, . . . , xn ∼ N(μ_i, σ^{-2})]
Unicity of maximum likelihood estimate
Consequence of standard differential calculus results on ℓ(θ) = L(θ|x1, . . . , xn):
lemma
If Θ is connected and open, and if ℓ(θ) is twice-differentiable with
lim_{θ→∂Θ} ℓ(θ) < +∞
and if H(θ) = ∇∇^T ℓ(θ) is negative definite at all solutions of the likelihood equations, then ℓ(θ) has a unique global maximum
Limited appeal because excluding local maxima
Unicity of MLE for exponential families
lemma
If f(·|θ) is a minimal exponential family
f(x|θ) = h(x) exp{ T(θ)^T S(x) - ψ(θ) }
with T(·) one-to-one and twice differentiable over Θ, if Θ is open, and if there is at least one solution to the likelihood equations, then it is the unique MLE
Likelihood equation is equivalent to S(x) = E_θ[S(X)]
Illustrations
Uniform U(0, θ) likelihood
L(θ|x1, . . . , xn) = θ^{-n} I_{θ ≥ max_i x_i}
not differentiable at X_(n) but
θ̂_n^MLE = X_(n)
Illustrations
Bernoulli B(p) likelihood
L(p|x1, . . . , xn) = {p/(1-p)}^{Σ_i x_i} (1 - p)^n
differentiable over (0, 1) and
p̂_n^MLE = X̄_n
Illustrations
Normal N(μ, σ²) likelihood
L(μ, σ|x1, . . . , xn) ∝ σ^{-n} exp{ -(1/2σ²) Σ_{i=1}^n (x_i - x̄_n)² - (1/2σ²) Σ_{i=1}^n (x̄_n - μ)² }
differentiable with
(μ̂_n^MLE, σ̂²_n^MLE) = ( X̄_n, (1/n) Σ_{i=1}^n (X_i - X̄_n)² )
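A minimal R sketch (simulated data, log-σ parameterisation chosen for convenience) checking this closed-form normal MLE against a numerical maximisation of the likelihood.

## numerical vs closed-form MLE for the N(mu, sigma^2) model
set.seed(9)
x <- rnorm(100, mean = 2, sd = 1.5)
negll <- function(theta) -sum(dnorm(x, theta[1], exp(theta[2]), log = TRUE))
fit <- optim(c(0, 0), negll)
c(fit$par[1], exp(fit$par[2])^2)            # numerical (mu, sigma^2)
c(mean(x), mean((x - mean(x))^2))           # closed-form MLE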
The fundamental theorem of Statistics
fundamental theorem
Under appropriate conditions, if (X1, . . . , Xn) iid ∼ f(x|θ), if θ̂_n is a solution of ∇ log f(X1, . . . , Xn|θ) = 0, then
√n (θ̂_n - θ) →_L N_p(0, I(θ)^{-1})
Equivalent of the CLT for estimation purposes
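A Monte Carlo illustration in R of this asymptotic normality for the Poisson(λ) model, where the MLE is the sample mean and I(λ) = 1/λ; the seed, λ = 3, n and the number of replications are arbitrary choices.

## distribution of sqrt(n) (MLE - lambda) vs its N(0, lambda) limit
set.seed(13)
lambda <- 3; n <- 200
mle <- replicate(5000, mean(rpois(n, lambda)))
z <- sqrt(n) * (mle - lambda)
c(var(z), lambda)                        # empirical variance vs I(lambda)^{-1} = lambda
hist(z, breaks = 50, freq = FALSE, main = "", xlab = "sqrt(n) (MLE - lambda)")
curve(dnorm(x, 0, sqrt(lambda)), add = TRUE)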
Assumptions
θ identifiable
More Related Content

What's hot

Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Simple Linear Regression (simplified)
Simple Linear Regression (simplified)
Haoran Zhang
 
Multinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisMultinomial Logistic Regression Analysis
Multinomial Logistic Regression Analysis
HARISH Kumar H R
 
Presentation On Regression
Presentation On RegressionPresentation On Regression
Presentation On Regression
alok tiwari
 
Linear regression
Linear regressionLinear regression
Linear regression
vermaumeshverma
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
Christian Robert
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regression
Akhilesh Joshi
 
Least Squares Regression Method | Edureka
Least Squares Regression Method | EdurekaLeast Squares Regression Method | Edureka
Least Squares Regression Method | Edureka
Edureka!
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
Seth Anandaram Jaipuria College
 
Simple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepSimple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-Step
Dan Wellisch
 
Scatter plots
Scatter plotsScatter plots
Scatter plotsswartzje
 
kinds of distribution
 kinds of distribution kinds of distribution
kinds of distribution
Unsa Shakir
 
Statistical inference
Statistical inferenceStatistical inference
Statistical inferenceJags Jagdish
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)
Guy Lebanon
 
Logistic Regression.ppt
Logistic Regression.pptLogistic Regression.ppt
Logistic Regression.ppt
habtamu biazin
 
Statistical Estimation
Statistical Estimation Statistical Estimation
Statistical Estimation
Remyagharishs
 
Laplace Distribution | Statistics
Laplace Distribution | StatisticsLaplace Distribution | Statistics
Laplace Distribution | Statistics
Transweb Global Inc
 
Estimation Theory, PhD Course, Ghent University, Belgium
Estimation Theory, PhD Course, Ghent University, BelgiumEstimation Theory, PhD Course, Ghent University, Belgium
Estimation Theory, PhD Course, Ghent University, Belgium
Stijn De Vuyst
 

What's hot (20)

Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Simple Linear Regression (simplified)
Simple Linear Regression (simplified)
 
Multinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisMultinomial Logistic Regression Analysis
Multinomial Logistic Regression Analysis
 
Presentation On Regression
Presentation On RegressionPresentation On Regression
Presentation On Regression
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Survival analysis
Survival  analysisSurvival  analysis
Survival analysis
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regression
 
Least Squares Regression Method | Edureka
Least Squares Regression Method | EdurekaLeast Squares Regression Method | Edureka
Least Squares Regression Method | Edureka
 
Correlation analysis
Correlation analysis Correlation analysis
Correlation analysis
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
 
Simple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepSimple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-Step
 
Scatter plots
Scatter plotsScatter plots
Scatter plots
 
kinds of distribution
 kinds of distribution kinds of distribution
kinds of distribution
 
Statistical inference
Statistical inferenceStatistical inference
Statistical inference
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)
 
Logistic Regression.ppt
Logistic Regression.pptLogistic Regression.ppt
Logistic Regression.ppt
 
Statistical Estimation
Statistical Estimation Statistical Estimation
Statistical Estimation
 
Laplace Distribution | Statistics
Laplace Distribution | StatisticsLaplace Distribution | Statistics
Laplace Distribution | Statistics
 
Confidence interval
Confidence intervalConfidence interval
Confidence interval
 
Estimation Theory, PhD Course, Ghent University, Belgium
Estimation Theory, PhD Course, Ghent University, BelgiumEstimation Theory, PhD Course, Ghent University, Belgium
Estimation Theory, PhD Course, Ghent University, Belgium
 

Viewers also liked

ABC in Roma
ABC in RomaABC in Roma
ABC in Roma
Christian Robert
 
Bayes250: Thomas Bayes Memorial Lecture at EMS 2013
Bayes250: Thomas Bayes Memorial Lecture at EMS 2013Bayes250: Thomas Bayes Memorial Lecture at EMS 2013
Bayes250: Thomas Bayes Memorial Lecture at EMS 2013
Christian Robert
 
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrapStatistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
Christian Robert
 
Big model, big data
Big model, big dataBig model, big data
Big model, big data
Christian Robert
 
Chapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysisChapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysis
Christian Robert
 
Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...
Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...
Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...
Nuruul Aini Willywili
 
Introduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorIntroduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood Estimator
Amir Al-Ansary
 
Maximum Likelihood Estimation
Maximum Likelihood EstimationMaximum Likelihood Estimation
Maximum Likelihood Estimationguestfee8698
 
Metode maximum likelihood
Metode maximum likelihoodMetode maximum likelihood
Metode maximum likelihoodririn12
 
Chapter 3 maximum likelihood and bayesian estimation-fix
Chapter 3   maximum likelihood and bayesian estimation-fixChapter 3   maximum likelihood and bayesian estimation-fix
Chapter 3 maximum likelihood and bayesian estimation-fix
jelli123
 
Chapter 4 likelihood
Chapter 4 likelihoodChapter 4 likelihood
Chapter 4 likelihoodNBER
 
Sensitivity, specificity and likelihood ratios
Sensitivity, specificity and likelihood ratiosSensitivity, specificity and likelihood ratios
Sensitivity, specificity and likelihood ratios
Chew Keng Sheng
 
The Method Of Maximum Likelihood
The Method Of Maximum LikelihoodThe Method Of Maximum Likelihood
The Method Of Maximum LikelihoodMax Chipulu
 

Viewers also liked (13)

ABC in Roma
ABC in RomaABC in Roma
ABC in Roma
 
Bayes250: Thomas Bayes Memorial Lecture at EMS 2013
Bayes250: Thomas Bayes Memorial Lecture at EMS 2013Bayes250: Thomas Bayes Memorial Lecture at EMS 2013
Bayes250: Thomas Bayes Memorial Lecture at EMS 2013
 
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrapStatistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
 
Big model, big data
Big model, big dataBig model, big data
Big model, big data
 
Chapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysisChapter 4: Decision theory and Bayesian analysis
Chapter 4: Decision theory and Bayesian analysis
 
Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...
Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...
Tabel Normal, MLE Distribusi Normal & Weibull, Grafik Mean & Standar Deviasi ...
 
Introduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorIntroduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood Estimator
 
Maximum Likelihood Estimation
Maximum Likelihood EstimationMaximum Likelihood Estimation
Maximum Likelihood Estimation
 
Metode maximum likelihood
Metode maximum likelihoodMetode maximum likelihood
Metode maximum likelihood
 
Chapter 3 maximum likelihood and bayesian estimation-fix
Chapter 3   maximum likelihood and bayesian estimation-fixChapter 3   maximum likelihood and bayesian estimation-fix
Chapter 3 maximum likelihood and bayesian estimation-fix
 
Chapter 4 likelihood
Chapter 4 likelihoodChapter 4 likelihood
Chapter 4 likelihood
 
Sensitivity, specificity and likelihood ratios
Sensitivity, specificity and likelihood ratiosSensitivity, specificity and likelihood ratios
Sensitivity, specificity and likelihood ratios
 
The Method Of Maximum Likelihood
The Method Of Maximum LikelihoodThe Method Of Maximum Likelihood
The Method Of Maximum Likelihood
 

Similar to Statistics (1): estimation Chapter 3: likelihood function and likelihood estimation

Actuarial Science Reference Sheet
Actuarial Science Reference SheetActuarial Science Reference Sheet
Actuarial Science Reference SheetDaniel Nolan
 
13 graphs of factorable polynomials x
13 graphs of factorable polynomials x13 graphs of factorable polynomials x
13 graphs of factorable polynomials x
math260
 
Multivriada ppt ms
Multivriada   ppt msMultivriada   ppt ms
Multivriada ppt ms
Faeco Bot
 
Maths 12
Maths 12Maths 12
Maths 12
Mehtab Rai
 
2nd-year-Math-full-Book-PB.pdf
2nd-year-Math-full-Book-PB.pdf2nd-year-Math-full-Book-PB.pdf
2nd-year-Math-full-Book-PB.pdf
proacademyhub
 
2018-G12-Math-E.pdf
2018-G12-Math-E.pdf2018-G12-Math-E.pdf
2018-G12-Math-E.pdf
ZainMehmood21
 
Refresher probabilities-statistics
Refresher probabilities-statisticsRefresher probabilities-statistics
Refresher probabilities-statistics
Steve Nouri
 
Unit 2.6
Unit 2.6Unit 2.6
Unit 2.6
Mark Ryder
 
237654933 mathematics-t-form-6
237654933 mathematics-t-form-6237654933 mathematics-t-form-6
237654933 mathematics-t-form-6
homeworkping3
 
Quantitative Techniques random variables
Quantitative Techniques random variablesQuantitative Techniques random variables
Quantitative Techniques random variables
Rohan Bhatkar
 
Seismic data processing lecture 3
Seismic data processing lecture 3Seismic data processing lecture 3
Seismic data processing lecture 3
Amin khalil
 
this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...
BhojRajAdhikari5
 
Lecture notes
Lecture notes Lecture notes
Lecture notes
graciasabineKAGLAN
 
2.9 graphs of factorable polynomials
2.9 graphs of factorable polynomials2.9 graphs of factorable polynomials
2.9 graphs of factorable polynomialsmath260
 
Random Variable
Random Variable Random Variable
Random Variable
Abhishek652999
 
1 review on derivatives
1 review on derivatives1 review on derivatives
1 review on derivatives
math266
 
Polynomials
PolynomialsPolynomials
Polynomials
Ankit Goel
 
Chapter 5 interpolation
Chapter 5 interpolationChapter 5 interpolation
Chapter 5 interpolation
ssuser53ee01
 
Finance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdfFinance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdf
CarlosLazo45
 

Similar to Statistics (1): estimation Chapter 3: likelihood function and likelihood estimation (20)

Actuarial Science Reference Sheet
Actuarial Science Reference SheetActuarial Science Reference Sheet
Actuarial Science Reference Sheet
 
13 graphs of factorable polynomials x
13 graphs of factorable polynomials x13 graphs of factorable polynomials x
13 graphs of factorable polynomials x
 
Multivriada ppt ms
Multivriada   ppt msMultivriada   ppt ms
Multivriada ppt ms
 
Maths 12
Maths 12Maths 12
Maths 12
 
2nd-year-Math-full-Book-PB.pdf
2nd-year-Math-full-Book-PB.pdf2nd-year-Math-full-Book-PB.pdf
2nd-year-Math-full-Book-PB.pdf
 
2018-G12-Math-E.pdf
2018-G12-Math-E.pdf2018-G12-Math-E.pdf
2018-G12-Math-E.pdf
 
Refresher probabilities-statistics
Refresher probabilities-statisticsRefresher probabilities-statistics
Refresher probabilities-statistics
 
Unit 2.6
Unit 2.6Unit 2.6
Unit 2.6
 
237654933 mathematics-t-form-6
237654933 mathematics-t-form-6237654933 mathematics-t-form-6
237654933 mathematics-t-form-6
 
Quantitative Techniques random variables
Quantitative Techniques random variablesQuantitative Techniques random variables
Quantitative Techniques random variables
 
Seismic data processing lecture 3
Seismic data processing lecture 3Seismic data processing lecture 3
Seismic data processing lecture 3
 
this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...
 
Lecture notes
Lecture notes Lecture notes
Lecture notes
 
2.9 graphs of factorable polynomials
2.9 graphs of factorable polynomials2.9 graphs of factorable polynomials
2.9 graphs of factorable polynomials
 
Functions (Theory)
Functions (Theory)Functions (Theory)
Functions (Theory)
 
Random Variable
Random Variable Random Variable
Random Variable
 
1 review on derivatives
1 review on derivatives1 review on derivatives
1 review on derivatives
 
Polynomials
PolynomialsPolynomials
Polynomials
 
Chapter 5 interpolation
Chapter 5 interpolationChapter 5 interpolation
Chapter 5 interpolation
 
Finance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdfFinance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdf
 

More from Christian Robert

Adaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte CarloAdaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte Carlo
Christian Robert
 
Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
Christian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
Christian Robert
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
Christian Robert
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
Christian Robert
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
Christian Robert
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
Christian Robert
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
Christian Robert
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
Christian Robert
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
Christian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
Christian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
Christian Robert
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
Christian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
Christian Robert
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
Christian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
Christian Robert
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
Christian Robert
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
Christian Robert
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
Christian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
Christian Robert
 

More from Christian Robert (20)

Adaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte CarloAdaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte Carlo
 
Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 

Recently uploaded

Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
CarlosHernanMontoyab2
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 

Recently uploaded (20)

Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 

Statistics (1): estimation Chapter 3: likelihood function and likelihood estimation

  • 1. Chapter 3 : Likelihood function and inference [0] 4 Likelihood function and inference The likelihood Information and curvature Suciency and ancilarity Maximum likelihood estimation Non-regular models EM algorithm
  • 2. The likelihood Given an usually parametric family of distributions F 2 fF, 2 g with densities f [wrt a
  • 3. xed measure ], the density of the iid sample x1, . . . , xn is Yn i=1 f(xi) Note In the special case is a counting measure, Yn i=1 f(xi) is the probability of observing the sample x1, . . . , xn among all possible realisations of X1, . . . ,Xn
  • 5. nition (likelihood function) The likelihood function associated with a sample x1, . . . , xn is the function L : ! R+ ! Yn i=1 f(xi) same formula as density but dierent space of variation
  • 6. Example: density function versus likelihood function Take the case of a Poisson density [against the counting measure] f(x; ) = x x! e- IN(x) which varies in N as a function of x versus L(; x) = x x! e- which varies in R+ as a function of = 3
  • 7. Example: density function versus likelihood function Take the case of a Poisson density [against the counting measure] f(x; ) = x x! e- IN(x) which varies in N as a function of x versus L(; x) = x x! e- which varies in R+ as a function of x = 3
  • 8. Example: density function versus likelihood function Take the case of a Normal N(0, ) density [against the Lebesgue measure] f(x; ) = 1 p 2 e-x2=2 IR(x) which varies in R as a function of x versus L(; x) = 1 p 2 e-x2=2 which varies in R+ as a function of = 2
  • 9. Example: density function versus likelihood function Take the case of a Normal N(0, ) density [against the Lebesgue measure] f(x; ) = 1 p 2 e-x2=2 IR(x) which varies in R as a function of x versus L(; x) = 1 p 2 e-x2=2 which varies in R+ as a function of x = 2
  • 10. Example: density function versus likelihood function Take the case of a Normal N(0, 1=) density [against the Lebesgue measure] f(x; ) = p p 2 e-x2=2 IR(x) which varies in R as a function of x versus L(; x) = p p 2 e-x2=2 IR(x) which varies in R+ as a function of = 1=2
  • 11. Example: density function versus likelihood function Take the case of a Normal N(0, 1=) density [against the Lebesgue measure] f(x; ) = p p 2 e-x2=2 IR(x) which varies in R as a function of x versus L(; x) = p p 2 e-x2=2 IR(x) which varies in R+ as a function of x = 1=2
  • 12. Example: Hardy-Weinberg equilibrium Population genetics: Genotypes of biallelic genes AA, Aa, and aa sample frequencies nAA, nAa and naa multinomial model M(n; pAA, pAa, paa) related to population proportion of A alleles, pA: pAA = p2 A , pAa = 2pA(1 - pA) , paa = (1 - pA)2 likelihood L(pAjnAA, nAa, naa) / p2nAA A [2pA(1 - pA)]nAa(1 - pA)2naa [Boos Stefanski, 2013]
  • 13. mixed distributions and their likelihood Special case when a random variable X may take speci
  • 14. c values a1, . . . , ak and a continum of values A Example: Rainfall at a given spot on a given day may be zero with positive probability p0 [it did not rain!] or an arbitrary number between 0 and 100 [capacity of measurement container] or 100 with positive probability p100 [container full]
  • 15. mixed distributions and their likelihood Special case when a random variable X may take speci
  • 16. c values a1, . . . , ak and a continum of values A Example: Tobit model where y N(XT
  • 17. , 2) but y = y Ify 0g observed
  • 18. mixed distributions and their likelihood Special case when a random variable X may take speci
  • 19. c values a1, . . . , ak and a continum of values A Density of X against composition of two measures, counting and Lebesgue: fX(a) = P(X = a) if a 2 fa1, . . . , akg f(aj) otherwise Results in likelihood L(jx1, . . . , xn) = Yk j=1 P(X = ai)nj Y xi =2fa1,...,akg f(xij) where nj # observations equal to aj
  • 20. Enters Fisher, Ronald Fisher! Fisher's intuition in the 20's: the likelihood function contains the relevant information about the parameter the higher the likelihood the more likely the parameter the curvature of the likelihood determines the precision of the estimation
  • 21. Concentration of likelihood mode around true parameter Likelihood functions for x1, . . . , xn P(3) as n increases n = 40, ..., 240
  • 22. Concentration of likelihood mode around true parameter Likelihood functions for x1, . . . , xn P(3) as n increases n = 38, ..., 240
  • 23. Concentration of likelihood mode around true parameter Likelihood functions for x1, . . . , xn N(0, 1) as n increases
  • 24. Concentration of likelihood mode around true parameter Likelihood functions for x1, . . . , xn N(0, 1) as sample varies
  • 25. Concentration of likelihood mode around true parameter Likelihood functions for x1, . . . , xn N(0, 1) as sample varies
  • 26. why concentration takes place Consider x1, . . . , xn iid F Then log Yn i=1 f(xij) = Xn i=1 log f(xij) and by LLN 1=n Xn i=1 log f(xij) L ! Z X log f(xj) dF(x) Lemma Maximising the likelihood is asymptotically equivalent to minimising the Kullback-Leibler divergence Z X log f(x)=f(xj) dF(x) c Member of the family closest to true distribution
  • 27. Score function Score function de
  • 28. ned by rlog L(jx) = @=@1L(jx), . . . , @=@pL(jx) L(jx) Gradient (slope) of likelihood function at point lemma When X F, E[rlog L(jX)] = 0
  • 29. Score function Score function de
  • 30. ned by rlog L(jx) = @=@1L(jx), . . . , @=@pL(jx) L(jx) Gradient (slope) of likelihood function at point lemma When X F, E[rlog L(jX)] = 0
  • 31. Score function Score function de
  • 32. ned by rlog L(jx) = @=@1L(jx), . . . , @=@pL(jx) L(jx) Gradient (slope) of likelihood function at point lemma When X F, E[rlog L(jX)] = 0 Reason: Z X rlog L(jx) dF(x) = Z X rL(jx) dx = r Z X dF(x)
  • 33. Score function Score function de
  • 34. ned by rlog L(jx) = @=@1L(jx), . . . , @=@pL(jx) L(jx) Gradient (slope) of likelihood function at point lemma When X F, E[rlog L(jX)] = 0 Connected with concentration theorem: gradient null on average for true value of parameter
  • 35. Score function Score function de
  • 36. ned by rlog L(jx) = @=@1L(jx), . . . , @=@pL(jx) L(jx) Gradient (slope) of likelihood function at point lemma When X F, E[rlog L(jX)] = 0 Warning: Not de
  • 37. ned for non-dierentiable likelihoods, e.g. when support depends on
  • 38. Fisher's information matrix Another notion attributed to Fisher [more likely due to Edgeworth] Information: covariance matrix of the score vector I() = E h rlog f(Xj) frlog f(Xj)gT i Often called Fisher information Measures curvature of the likelihood surface, which translates as information brought by the data Sometimes denoted IX to stress dependence on distribution of X
  • 39. Fisher's information matrix Second derivative of the log-likelihood as well lemma If L(jx) is twice dierentiable [as a function of ] I() = -E rTrlog f(Xj) Hence Iij() = -E @2 @i@j log f(Xj)
  • 40. Illustrations Binomial B(n, p) distribution f(xjp) = n x px(1 - p)n-x @=@p log f(xjp) = x=p - n-x=1-p @2=@p2 log f(xjp) = -x=p2 - n-x=(1-p)2 Hence I(p) = np=p2 + n-np=(1-p)2 = n=p(1-p)
  • 41. Illustrations Multinomial M(n; p1, . . . , pk) distribution f(xjp) = n x1 xk px1 1 pxk k @=@pi log f(xjp) = xi=pi - xk=pk @2=@pi@pj log f(xjp) = -xk=p2 k @2=@p2i log f(xjp) = -xi=p2i - xk=p2 k Hence I(p) = n 0 1=p1 + 1=pk 1=pk BBB@ 1=pk 1=pk . . . 1=pk 1=pk-1 + 1=pk 1 CCCA
  • 42. Illustrations Multinomial M(n; p1, . . . , pk) distribution f(xjp) = n x1 xk px1 1 pxk k @=@pi log f(xjp) = xi=pi - xk=pk @2=@pi@pj log f(xjp) = -xk=p2 k @2=@p2i log f(xjp) = -xi=p2i - xk=p2 k and I(p)-1 = 1=n 0 p1(1 - p1) -p1p2 -p1pk-1 -p1p2 p2(1 - p2) -p2pk-1 BBB@ . . . . . . -p1pk-1 -p2pk-1 pk-1(1 - pk-1) 1 CCCA
  • 43. Illustrations Normal N(, 2) distribution f(xj) = 1 p 2 1 exp -(x-)2=22 @=@ log f(xj) = x-=2 @=@ log f(xj) = -1= + (x-)2=3 @2=@2 log f(xj) = -1=2 @2=@@ log f(xj) = -2 x-=3 @2=@2 log f(xj) = 1=2 - 3 (x-)2=4 Hence I() = 1=2 1 0 0 2
  • 44. Properties Additive features translating as accumulation of information: if X and Y are independent, IX() + IY() = I(X,Y)() IX1,...,Xn() = nIX1() if X = T(Y) and Y = S(X), IX() = IY() if X = T(Y), IX() 6 IY() If = () is a bijective transform, change of parameterisation: I() =
  • 45. @ @ T I()
  • 46. @ @ In information geometry, this is seen as a change of coordinates on a Riemannian manifold, and the intrinsic properties of curvature are unchanged under dierent parametrization. In general, the Fisher information matrix provides a Riemannian metric (more precisely, the Fisher-Rao metric). [Wikipedia]
  • 47. Properties Additive features translating as accumulation of information: if X and Y are independent, IX() + IY() = I(X,Y)() IX1,...,Xn() = nIX1() if X = T(Y) and Y = S(X), IX() = IY() if X = T(Y), IX() 6 IY() If = () is a bijective transform, change of parameterisation: I() =
  • 48. @ @ T I()
  • 49. @ @ In information geometry, this is seen as a change of coordinates on a Riemannian manifold, and the intrinsic properties of curvature are unchanged under dierent parametrization. In general, the Fisher information matrix provides a Riemannian metric (more precisely, the Fisher-Rao metric). [Wikipedia]
  • 50. Approximations Back to the Kullback{Leibler divergence D(0, ) = Z X f(xj0) log f(xj0)=f(xj) dx Using a second degree Taylor expansion log f(xj) = log f(xj0) + ( - 0)Trlog f(xj0) + 1 2 ( - 0)TrrT log f(xj0)( - 0) + o(jj - 0 jj2) approximation of divergence: D(0, ) 1 2 ( - 0)TI(0)( - 0) [Exercise: show this is exact in the normal case]
  • 51. Approximations Central limit law of the score vector p nrlog L(jX1, . . . ,Xn) N(0, IX1()) 1= Notation I1() stands for IX1() and indicates information associated with a single observation
  • 52. Suciency What if a transform of the sample S(X1, . . . ,Xn) contains all the information, i.e. I(X1,...,Xn)() = IS(X1,...,Xn)() uniformly in ? In this case S() is called a sucient statistic [because it is sucient to know the value of S(x1, . . . , xn) to get complete information] A statistic is an arbitrary transform of the data X1, . . . ,Xn
  • 54. nition: If (X1, . . . ,Xn) f(x1, . . . , xnj) and if T = S(X1, . . . ,Xn) is such that the distribution of (X1, . . . ,Xn) conditional on T does not depend on , then S() is a sucient statistic Factorisation theorem S() is a sucient statistic if and only if f(x1, . . . , xnj) = g(S(x1, . . . , xnj)) h(x1, . . . , xn) another notion due to Fisher
  • 55. Illustrations Uniform U(0, ) distribution L(jx1, . . . , xn) = -n Yn i=1 I(0,)(xi) = -nI max i xi Hence S(X1, . . . ,Xn) = max i Xi = X(n) is sucient
  • 56. Illustrations Bernoulli B(p) distribution L(pjx1, . . . , xn) = Yn i=1 pxi(1 - p)n-xi = fp=1-pg P i xi (1 - p)n Hence S(X1, . . . ,Xn) = Xn is sucient
  • 57. Illustrations Normal N(, 2) distribution L(, jx1, . . . , xn) = Yn i=1 1 p 2 expf-(xi-)2=22g = 1 f22gn=2 exp - 1 22 Xn i=1 (xi - xn + xn - )2 = 1 f22gn=2 exp - 1 22 Xn i=1 (xi - xn)2 - 1 22 Xn i=1 (xn - )2 Hence S(X1, . . . ,Xn) = Xn, Xn i=1 (Xi - Xn)2 ! is sucient
  • 58. Suciency and exponential families Both previous examples belong to exponential families f(xj) = h(x) exp T()TS(x) - () Generic property of exponential families: f(x1, . . . , xnj) = Yn i=1 h(xi) exp T()T Xn i=1 S(xi) - n() lemma For an exponential family with summary statistic S(), the statistic S(X1, . . . ,Xn) = Xn i=1 S(Xi) is sucient
  • 59. Sufficiency as a rare feature Nice property reducing the data to a low-dimensional transform but... How frequent is it within the collection of probability distributions? Very rare, as essentially restricted to exponential families [Pitman–Koopman–Darmois theorem], with the exception of parameter-dependent families like U(0, θ)
  • 60. Pitman–Koopman–Darmois characterisation If $X_1,\ldots,X_n$ are iid random variables from a density $f(\cdot\mid\theta)$ whose support does not depend on $\theta$ and verifying the property that there exists an integer $n_0$ such that, for $n \ge n_0$, there is a sufficient statistic $S(X_1,\ldots,X_n)$ with fixed [in n] dimension, then $f(\cdot\mid\theta)$ belongs to an exponential family [Factorisation theorem]
    Note: Darmois published this result in 1935 [in French] and Koopman and Pitman in 1936 [in English], but Darmois is generally omitted from the theorem... Fisher proved it for one-D sufficient statistics in 1934
  • 62. Minimal sufficiency Multiplicity of sufficient statistics, e.g., $S'(x) = (S(x), U(x))$ remains sufficient when $S(\cdot)$ is sufficient
    Search of a most concentrated summary:
    Minimal sufficiency: A sufficient statistic $S(\cdot)$ is minimal sufficient if it is a function of any other sufficient statistic
    Lemma: For a minimal exponential family representation $f(x\mid\theta) = h(x)\exp\{T(\theta)^{\mathsf T}S(x) - \psi(\theta)\}$, $S(X_1) + \ldots + S(X_n)$ is minimal sufficient
  • 63. Ancillarity Opposite of sufficiency:
    Ancillarity: When $X_1,\ldots,X_n$ are iid random variables from a density $f(\cdot\mid\theta)$, a statistic $A(\cdot)$ is ancillary if $A(X_1,\ldots,X_n)$ has a distribution that does not depend on $\theta$
    Useless?! Not necessarily, as conditioning upon $A(X_1,\ldots,X_n)$ leads to more precision and efficiency: use of $F(x_1,\ldots,x_n\mid A(x_1,\ldots,x_n))$ instead of $F(x_1,\ldots,x_n)$
    Notion of maximal ancillary statistic
  • 64. Illustrations
    1. If $X_1,\ldots,X_n$ iid U(0, θ), $A(X_1,\ldots,X_n) = (X_1,\ldots,X_n)/X_{(n)}$ is ancillary
    2. If $X_1,\ldots,X_n$ iid N(μ, σ²), $A(X_1,\ldots,X_n) = \dfrac{(X_1-\bar X_n,\ldots,X_n-\bar X_n)}{\sqrt{\sum_{i=1}^n (X_i-\bar X_n)^2}}$ is ancillary
    3. If $X_1,\ldots,X_n$ iid $f(x\mid\theta)$, rank$(X_1,\ldots,X_n)$ is ancillary
    x=rnorm(10)
    rank(x)
    [1] 7 4 1 5 2 6 8 9 10 3
    [see, e.g., rank tests]
  • 65. Point estimation, estimators and estimates When given a parametric family $f(\cdot\mid\theta)$ and a sample supposedly drawn from this family, $(X_1,\ldots,X_n)$ iid $f(x\mid\theta)$,
    an estimator of $\theta$ is a statistic $T(X_1,\ldots,X_n)$ or $\hat\theta_n$ providing a [reasonable] substitute for the unknown value $\theta$;
    an estimate of $\theta$ is the value of the estimator for a given [realised] sample, $T(x_1,\ldots,x_n)$
    Example: For a Normal N(μ, σ²) sample $X_1,\ldots,X_n$, $T(X_1,\ldots,X_n) = \hat\mu_n = \frac{1}{n}\sum_{i=1}^n X_i$ is an estimator of $\mu$ and $\hat\mu_n = 2.014$ is an estimate
  • 66. Maximum likelihood principle Given the concentration property of the likelihood function, reasonable choice of estimator as mode:
    MLE: A maximum likelihood estimator (MLE) $\hat\theta_n$ satisfies $L(\hat\theta_n\mid X_1,\ldots,X_n) \ge L(\theta\mid X_1,\ldots,X_n)$ for all $\theta\in\Theta$
    Under regularity of $L(\cdot\mid X_1,\ldots,X_n)$, the MLE is also a solution of the likelihood equations $\nabla\log L(\theta\mid X_1,\ldots,X_n) = 0$
    Warning: $\hat\theta_n$ is not the most likely value of $\theta$ but makes the observation $(x_1,\ldots,x_n)$ most likely...
  • 68. Maximum likelihood invariance Principle independent of parameterisation:
    If $\xi = h(\theta)$ is a one-to-one transform of $\theta$, then $\hat\xi_n^{\text{MLE}} = h(\hat\theta_n^{\text{MLE}})$ [estimator of transform = transform of estimator]
    By extension, if $\xi = h(\theta)$ is any transform of $\theta$, then $\hat\xi_n^{\text{MLE}} = h(\hat\theta_n^{\text{MLE}})$
  • 69. Unicity of maximum likelihood estimate Depending on regularity of $L(\cdot\mid x_1,\ldots,x_n)$, there may be
    1. an a.s. unique MLE $\hat\theta_n^{\text{MLE}}$ [case of $x_1,\ldots,x_n \sim N(\theta, 1)$]
    2. several or an infinity of MLEs [or of solutions to the likelihood equations] [case of $x_1,\ldots,x_n \sim N(\theta_1+\theta_2, 1)$, and mixtures of normals]
    3. no MLE at all [case of $x_1,\ldots,x_n \sim N(\mu_i, \sigma^2)$]
  • 73. Unicity of maximum likelihood estimate Consequence of standard differential calculus results on $\ell(\theta) = L(\theta\mid x_1,\ldots,x_n)$:
    Lemma: If $\Theta$ is connected and open, and if $\ell(\theta)$ is twice-differentiable with $\lim_{\theta\to\partial\Theta}\ell(\theta) < +\infty$, and if $H(\theta) = -\nabla\nabla^{\mathsf T}\ell(\theta)$ is positive definite at all solutions of the likelihood equations, then $\ell(\theta)$ has a unique global maximum
    Limited appeal because excluding local maxima
  • 75. Unicity of MLE for exponential families
    Lemma: If $f(\cdot\mid\theta)$ is a minimal exponential family $f(x\mid\theta) = h(x)\exp\{T(\theta)^{\mathsf T}S(x) - \psi(\theta)\}$ with $T(\cdot)$ one-to-one and twice differentiable over $\Theta$, if $\Theta$ is open, and if there is at least one solution to the likelihood equations, then it is the unique MLE
    Likelihood equation is equivalent to $S(x) = \mathbb E_\theta[S(X)]$
  • 78. Illustrations Uniform U(0, θ) likelihood $L(\theta\mid x_1,\ldots,x_n) = \theta^{-n}\,\mathbb I_{\theta\ge\max_i x_i}$, not differentiable at $X_{(n)}$ but $\hat\theta_n^{\text{MLE}} = X_{(n)}$
  • 79. Illustrations Bernoulli B(p) likelihood $L(p\mid x_1,\ldots,x_n) = \{p/(1-p)\}^{\sum_i x_i}(1-p)^n$, differentiable over (0, 1) and $\hat p_n^{\text{MLE}} = \bar X_n$
  • 80. Illustrations Normal N(μ, σ²) likelihood
    $L(\mu, \sigma\mid x_1,\ldots,x_n) \propto \sigma^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\bar x_n)^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n(\bar x_n-\mu)^2\right\}$
    differentiable with $(\hat\mu_n^{\text{MLE}}, \hat\sigma_n^{2\,\text{MLE}}) = \left(\bar X_n,\ \frac{1}{n}\sum_{i=1}^n (X_i-\bar X_n)^2\right)$
  • 81. The fundamental theorem of Statistics
    Fundamental theorem: Under appropriate conditions, if $(X_1,\ldots,X_n)$ iid $f(x\mid\theta)$ and if $\hat\theta_n$ is a solution of $\nabla\log f(X_1,\ldots,X_n\mid\theta) = 0$, then
    $\sqrt n\,(\hat\theta_n - \theta) \overset{\mathcal L}{\longrightarrow} N_p(0, I(\theta)^{-1})$
    Equivalent of the CLT for estimation purposes
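  [Added illustration, not from the slides] A simulation sketch in R: for a Poisson(λ) sample the MLE is the sample mean and $I(\lambda) = 1/\lambda$, so $\sqrt n(\hat\lambda_n - \lambda)$ should have variance close to $\lambda$; the values λ = 3, n = 200 and M = 5000 replications are arbitrary choices.
    ## empirical variance of sqrt(n)(mle - lambda) vs I(lambda)^{-1} = lambda
    set.seed(101)
    lambda <- 3; n <- 200; M <- 5000
    mle <- replicate(M, mean(rpois(n, lambda)))
    z <- sqrt(n) * (mle - lambda)
    c(var(z), lambda)   # the two values should be close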
  • 83. Assumptions [of the fundamental theorem]:
    $\theta$ identifiable
    support of $f(\cdot\mid\theta)$ constant in $\theta$
    $\ell(\theta)$ thrice differentiable
    [the killer] there exists $g(x)$ integrable against $f(\cdot\mid\theta)$ in a neighbourhood of the true parameter such that $\left|\frac{\partial^3}{\partial\theta_i\partial\theta_j\partial\theta_k}\log f(x\mid\theta)\right| \le g(x)$
    the identity $I(\theta) = \mathbb E_\theta\left[\nabla\log f(X\mid\theta)\{\nabla\log f(X\mid\theta)\}^{\mathsf T}\right] = -\mathbb E_\theta\left[\nabla\nabla^{\mathsf T}\log f(X\mid\theta)\right]$ stands [mostly superfluous]
    $\hat\theta_n$ converges in probability to $\theta$ [similarly superfluous]
    [Boos & Stefanski, 2014, p. 286; Lehmann & Casella, 1998]
  • 92. Inefficient MLEs Example of MLE of $\eta = \|\theta\|^2$ when $x \sim N_p(\theta, I_p)$: $\hat\eta^{\text{MLE}} = \|x\|^2$
    Then $\mathbb E[\|x\|^2] = \eta + p$ diverges away from $\eta$ with $p$
    Note: Consistent and efficient behaviour when considering the MLE of $\eta$ based on $Z = \|X\|^2 \sim \chi^2_p(\eta)$ [Robert, 2001]
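  [Added illustration, not from the slides] A short simulation of this bias in R, with the arbitrary choice $\theta = (1,\ldots,1)$ in dimension p = 50, so that $\eta = 50$ while $\mathbb E[\|x\|^2] = 100$.
    ## bias of the plug-in MLE of eta = ||theta||^2
    set.seed(7)
    p <- 50; theta <- rep(1, p); eta <- sum(theta^2)     # eta = 50
    M <- 1e4
    etahat <- replicate(M, sum(rnorm(p, theta)^2))       # ||x||^2 for each replication
    c(mean(etahat), eta + p, eta)                        # mean close to 100, far from 50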
  • 93. Inconsistent MLEs Take $X_1,\ldots,X_n$ iid $f_\theta(x)$ with
    $f_\theta(x) = (1-\theta)\,\frac{1}{\delta(\theta)}\,f_0\!\left(\frac{x-\theta}{\delta(\theta)}\right) + \theta\, f_1(x)$
    for $\theta\in[0,1]$, $f_1(x) = \frac{1}{2}\,\mathbb I_{[-1,1]}(x)$, $f_0(x) = (1-|x|)\,\mathbb I_{[-1,1]}(x)$ and $\delta(\theta) = (1-\theta)\exp\{-(1-\theta)^{-4}+1\}$
    Then, whatever the true value of $\theta$, $\hat\theta_n^{\text{MLE}} \overset{\text{a.s.}}{\longrightarrow} 1$
    [Ferguson, 1982; John Wellner's slides, ca. 2005]
  • 95. Inconsistent MLEs Consider $X_{ij}$, $i = 1,\ldots,n$, $j = 1, 2$, with $X_{ij} \sim N(\mu_i, \sigma^2)$. Then
    $\hat\mu_i^{\text{MLE}} = \frac{X_{i1}+X_{i2}}{2}$, $\quad \hat\sigma^2_{\text{MLE}} = \frac{1}{4n}\sum_{i=1}^n (X_{i1}-X_{i2})^2$
    Therefore $\hat\sigma^2_{\text{MLE}} \overset{\text{a.s.}}{\longrightarrow} \sigma^2/2$ [Neyman & Scott, 1948]
    Note: Working solely with $X_{i1}-X_{i2} \sim N(0, 2\sigma^2)$ produces a consistent MLE
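  [Added illustration, not from the slides] A quick simulation check in R of the Neyman–Scott inconsistency, with arbitrary values σ = 2 and n = 10⁴ pairs; $\hat\sigma^2_{\text{MLE}}$ settles near $\sigma^2/2 = 2$ rather than $\sigma^2 = 4$.
    ## Neyman-Scott: many nuisance means, two observations per mean
    set.seed(11)
    n <- 1e4; sigma <- 2
    mu <- rnorm(n, 0, 5)                        # arbitrary nuisance means
    x1 <- rnorm(n, mu, sigma); x2 <- rnorm(n, mu, sigma)
    sigma2hat <- sum((x1 - x2)^2) / (4 * n)     # MLE of sigma^2
    c(sigma2hat, sigma^2 / 2, sigma^2)          # close to 2, not 4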
  • 96. Likelihood optimisation Practical optimisation of the likelihood function
    $\theta^\star = \arg\max_\theta L(\theta\mid x) = \prod_{i=1}^n g(x_i\mid\theta)$
    assuming $X = (X_1,\ldots,X_n)$ iid $g(x\mid\theta)$
    analytical resolution feasible for exponential families: $\nabla T(\theta)^{\mathsf T}\sum_{i=1}^n S(x_i) = n\,\nabla\psi(\theta)$
    use of standard numerical techniques like Newton–Raphson
    $\theta^{(t+1)} = \theta^{(t)} + I_{\text{obs}}(X, \theta^{(t)})^{-1}\,\nabla\ell(\theta^{(t)})$
    with $\ell(\cdot)$ the log-likelihood and $I_{\text{obs}}$ the observed information matrix
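  [Added illustration] To make the Newton–Raphson recursion concrete, here is a minimal R sketch for an example not taken from the slides, the Cauchy location family, where the gradient and observed information are available in closed form; in general the recursion may need safeguarding (step halving, careful starting values), which is omitted here.
    ## Newton-Raphson for the MLE of a Cauchy location parameter (illustrative sketch)
    set.seed(3)
    x <- rcauchy(100, location = 1)
    grad <- function(th) sum(2 * (x - th) / (1 + (x - th)^2))           # dl/dtheta
    hess <- function(th) sum(2 * ((x - th)^2 - 1) / (1 + (x - th)^2)^2) # d2l/dtheta2
    theta <- median(x)                          # sensible starting value
    for (t in 1:25) {
      step  <- -grad(theta) / hess(theta)       # = I_obs^{-1} grad since I_obs = -hess
      theta <- theta + step
      if (abs(step) < 1e-9) break
    }
    theta                                       # MLE of the location parameter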
  • 97. EM algorithm Cases where $g$ is too complex for the above to work
    Special case when $g$ is a marginal
    $g(x\mid\theta) = \int_{\mathcal Z} f(x, z\mid\theta)\,\mathrm dz$
    $Z$ called latent or missing variable
  • 98. Illustrations
    censored data: $X = \min(X^\star, a)$, $X^\star \sim N(\theta, 1)$
    mixture model: $X \sim .3\,N_1(\mu_0, 1) + .7\,N_1(\mu_1, 1)$
    disequilibrium model: $X = \min(X^\star, Y^\star)$, $X^\star \sim f_1(x\mid\theta)$, $Y^\star \sim f_2(x\mid\theta)$
  • 99. Completion EM algorithm based on completing the data x with z, such as $(X, Z) \sim f(x, z\mid\theta)$
    $Z$ missing data vector and pair $(X, Z)$ complete data vector
    Conditional density of $Z$ given $x$: $k(z\mid\theta, x) = \dfrac{f(x, z\mid\theta)}{g(x\mid\theta)}$
  • 100. Likelihood decomposition Likelihood associated with complete data (x, z): $L^c(\theta\mid x, z) = f(x, z\mid\theta)$
    and likelihood for observed data $L(\theta\mid x)$ such that
    $\log L(\theta\mid x) = \mathbb E[\log L^c(\theta\mid x, Z)\mid\theta_0, x] - \mathbb E[\log k(Z\mid\theta, x)\mid\theta_0, x]$  (1)
    for any $\theta_0$, with integration operated against the conditional distribution of $Z$ given the observables (and parameters), $k(z\mid\theta_0, x)$
  • 101. [A tale of] two θ's There are two θ's! In (1), $\theta_0$ is a fixed (and arbitrary) value driving the integration, while $\theta$ remains free (and variable)
    Maximising the observed likelihood $L(\theta\mid x)$ is equivalent to maximising the r.h.s. term in (1),
    $\mathbb E[\log L^c(\theta\mid x, Z)\mid\theta_0, x] - \mathbb E[\log k(Z\mid\theta, x)\mid\theta_0, x]$
  • 103. Intuition for EM Instead of maximising the whole r.h.s. term in (1) wrt $\theta$, maximise only $\mathbb E[\log L^c(\theta\mid x, Z)\mid\theta_0, x]$
    Maximisation of the complete log-likelihood is impossible since z is unknown, hence it is replaced by maximisation of the expected complete log-likelihood, with expectation depending on the term $\theta_0$
  • 104. Expectation–Maximisation Expectation of the complete log-likelihood denoted
    $Q(\theta\mid\theta_0, x) = \mathbb E[\log L^c(\theta\mid x, Z)\mid\theta_0, x]$
    to stress dependence on $\theta_0$ and sample x
    Principle: EM derives a sequence of estimators $\hat\theta^{(j)}$, $j = 1, 2, \ldots$, through iteration of Expectation and Maximisation steps:
    $Q(\hat\theta^{(j)}\mid\hat\theta^{(j-1)}, x) = \max_\theta Q(\theta\mid\hat\theta^{(j-1)}, x)$
  • 105. EM Algorithm Iterate (in m)
    1. (step E) Compute $Q(\theta\mid\hat\theta^{(m)}, x) = \mathbb E[\log L^c(\theta\mid x, Z)\mid\hat\theta^{(m)}, x]$,
    2. (step M) Maximise $Q(\theta\mid\hat\theta^{(m)}, x)$ in $\theta$ and set $\hat\theta^{(m+1)} = \arg\max_\theta Q(\theta\mid\hat\theta^{(m)}, x)$
    until a fixed point [of Q] is found [Dempster, Laird & Rubin, 1977]
  • 107. Justification Observed likelihood $L(\theta\mid x)$ increases at every EM step: $L(\hat\theta^{(m+1)}\mid x) \ge L(\hat\theta^{(m)}\mid x)$ [Exercise: use Jensen and (1)]
  • 109. Censored data Normal N(θ, 1) sample right-censored at a:
    $L(\theta\mid x) = \frac{1}{(2\pi)^{m/2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^m (x_i-\theta)^2\right\}\,[1-\Phi(a-\theta)]^{n-m}$
    Associated complete log-likelihood:
    $\log L^c(\theta\mid x, z) \propto -\frac{1}{2}\sum_{i=1}^m (x_i-\theta)^2 - \frac{1}{2}\sum_{i=m+1}^n (z_i-\theta)^2$
    where the $z_i$'s are the censored observations, with density
    $k(z\mid\theta, x) = \dfrac{\exp\{-\frac{1}{2}(z-\theta)^2\}}{\sqrt{2\pi}\,[1-\Phi(a-\theta)]} = \dfrac{\varphi(z-\theta)}{1-\Phi(a-\theta)}, \quad a \le z$
  • 110. Censored data (2) At the j-th EM iteration
    $Q(\theta\mid\hat\theta^{(j)}, x) \propto -\frac{1}{2}\sum_{i=1}^m (x_i-\theta)^2 - \frac{1}{2}\,\mathbb E\left[\sum_{i=m+1}^n (Z_i-\theta)^2 \,\Big|\, \hat\theta^{(j)}, x\right]$
    $\propto -\frac{1}{2}\sum_{i=1}^m (x_i-\theta)^2 - \frac{1}{2}\sum_{i=m+1}^n \int_a^{\infty} (z_i-\theta)^2\, k(z_i\mid\hat\theta^{(j)}, x)\,\mathrm dz_i$
  • 116. Censored data (3) Differentiating in θ,
    $n\,\hat\theta^{(j+1)} = m\bar x + (n-m)\,\mathbb E[Z\mid\hat\theta^{(j)}]$
    with
    $\mathbb E[Z\mid\hat\theta^{(j)}] = \int_a^{\infty} z\, k(z\mid\hat\theta^{(j)}, x)\,\mathrm dz = \hat\theta^{(j)} + \dfrac{\varphi(a-\hat\theta^{(j)})}{1-\Phi(a-\hat\theta^{(j)})}$
    Hence, the EM sequence is provided by
    $\hat\theta^{(j+1)} = \frac{m}{n}\,\bar x + \frac{n-m}{n}\left[\hat\theta^{(j)} + \dfrac{\varphi(a-\hat\theta^{(j)})}{1-\Phi(a-\hat\theta^{(j)})}\right]$
    which converges to the likelihood maximum $\hat\theta$
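  [Added illustration] The EM recursion above translates directly into a few lines of R; the simulated data set (true θ = 0.5, censoring point a = 1, n = 100) is an arbitrary choice for illustration, not taken from the slides.
    ## EM for a right-censored N(theta, 1) sample, using the update of slide 116
    set.seed(5)
    n <- 100; a <- 1; theta_true <- 0.5
    y <- rnorm(n, theta_true)
    x <- y[y < a]; m <- length(x)               # observed values; n - m are censored at a
    theta <- mean(x)                            # starting value
    for (j in 1:200) {
      ez <- theta + dnorm(a - theta) / (1 - pnorm(a - theta))   # E step: E[Z | Z > a, theta]
      theta_new <- (m * mean(x) + (n - m) * ez) / n             # M step
      if (abs(theta_new - theta) < 1e-10) break
      theta <- theta_new
    }
    theta                                       # EM estimate of theta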
  • 117. Mixtures Mixture of two normal distributions with unknown means
    $.3\,N_1(\mu_0, 1) + .7\,N_1(\mu_1, 1)$
    sample $X_1,\ldots,X_n$ and parameter $\theta = (\mu_0, \mu_1)$
    Missing data: $Z_i\in\{0, 1\}$, indicator of the component associated with $X_i$: $X_i\mid z_i \sim N(\mu_{z_i}, 1)$, $Z_i \sim \mathcal B(.7)$
    Complete likelihood
    $\log L^c(\theta\mid x, z) \propto -\frac{1}{2}\sum_{i=1}^n z_i(x_i-\mu_1)^2 - \frac{1}{2}\sum_{i=1}^n (1-z_i)(x_i-\mu_0)^2 = -\frac{1}{2}\,n_1(\hat\mu_1-\mu_1)^2 - \frac{1}{2}(n-n_1)(\hat\mu_0-\mu_0)^2$
    with $n_1 = \sum_{i=1}^n z_i$, $\ n_1\hat\mu_1 = \sum_{i=1}^n z_i x_i$, $\ (n-n_1)\hat\mu_0 = \sum_{i=1}^n (1-z_i)x_i$
  • 118. Mixtures (2) At the j-th EM iteration
    $Q(\theta\mid\hat\theta^{(j)}, x) = -\frac{1}{2}\,\mathbb E\left[ n_1(\hat\mu_1-\mu_1)^2 + (n-n_1)(\hat\mu_0-\mu_0)^2 \,\big|\, \hat\theta^{(j)}, x\right]$
    Differentiating in θ,
    $\hat\theta^{(j+1)} = \begin{pmatrix} \mathbb E\left[n_1\hat\mu_1\mid\hat\theta^{(j)}, x\right] \big/ \mathbb E\left[n_1\mid\hat\theta^{(j)}, x\right] \\[4pt] \mathbb E\left[(n-n_1)\hat\mu_0\mid\hat\theta^{(j)}, x\right] \big/ \mathbb E\left[(n-n_1)\mid\hat\theta^{(j)}, x\right] \end{pmatrix}$
  • 123. Mixtures (3) Hence $\hat\theta^{(j+1)}$ given by
    $\hat\theta^{(j+1)} = \begin{pmatrix} \sum_{i=1}^n \mathbb E\left[Z_i\mid\hat\theta^{(j)}, x_i\right] x_i \,\Big/\, \sum_{i=1}^n \mathbb E\left[Z_i\mid\hat\theta^{(j)}, x_i\right] \\[4pt] \sum_{i=1}^n \mathbb E\left[(1-Z_i)\mid\hat\theta^{(j)}, x_i\right] x_i \,\Big/\, \sum_{i=1}^n \mathbb E\left[(1-Z_i)\mid\hat\theta^{(j)}, x_i\right] \end{pmatrix}$
    Conclusion: Step (E) in EM replaces the missing data $Z_i$ with their conditional expectations, given x (expectations that depend on $\hat\theta^{(j)}$).
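  [Added illustration] Putting the E and M steps together for this mixture gives the following R sketch; the simulated sample (with true means 0 and 2.5) and the starting values are arbitrary illustrations, not taken from the slides.
    ## EM for the .3 N(mu0, 1) + .7 N(mu1, 1) mixture with known weights
    set.seed(17)
    n <- 500
    z <- rbinom(n, 1, .7)
    x <- rnorm(n, ifelse(z == 1, 2.5, 0))       # simulated data (mu0 = 0, mu1 = 2.5)
    mu0 <- -1; mu1 <- 1                         # starting values
    for (j in 1:500) {
      tz <- .7 * dnorm(x, mu1) / (.3 * dnorm(x, mu0) + .7 * dnorm(x, mu1))  # E step: E[Z_i | theta, x_i]
      mu1_new <- sum(tz * x) / sum(tz)                                      # M step
      mu0_new <- sum((1 - tz) * x) / sum(1 - tz)
      if (max(abs(c(mu0_new - mu0, mu1_new - mu1))) < 1e-10) break
      mu0 <- mu0_new; mu1 <- mu1_new
    }
    c(mu0, mu1)                                 # EM estimates of (mu0, mu1)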
  • 128. Mixtures (3) [Figure: EM iterations for several starting values, plotted in the (m1, m2) plane]
  • 129. Properties The EM algorithm is such that
    it converges to a local maximum or a saddle-point of the likelihood
    it depends on the initial condition $\theta^{(0)}$
    it requires several initial values when the likelihood is multimodal