Hessian Matrices in Statistics
Ferris Jumah, David Schlueter, Matt Vance
MTH 327 Final Project
December 7, 2011

Topic Introduction

Today we are going to talk about . . .
Introduce the Hessian matrix
Brief description of relevant statistics
Maximum Likelihood Estimation (MLE)
Fisher Information and Applications

The Hessian Matrix

Recall the Hessian matrix

H(f) =
\begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}    (1)

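Illustrative sketch (not part of the original slides, assuming SymPy is available): building and evaluating the Hessian of a two-variable function symbolically; the choice of f and the evaluation point are arbitrary.

# Sketch: symbolic Hessian of an example function f(x1, x2) using SymPy.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * x2 + sp.exp(x1 * x2)              # arbitrary example function

H = sp.hessian(f, (x1, x2))                   # matrix of all second partial derivatives
print(H)                                      # symbolic Hessian, as in Equation (1)
print(H.subs({x1: 1.0, x2: 2.0}))             # Hessian evaluated at a sample point
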
Statistics: Some things to recall

Now, let's talk a bit about Inferential Statistics
Parameters
Random Variables
Definition: A random variable X is a function X : Ω → R
Each r.v. follows a distribution with an associated probability function f(x|θ)
E.g.

f(x|\mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)    (2)

What is a Random Sample? X1, . . . , Xn i.i.d.
The outputs of these r.v.s are our sample data

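A small numerical sketch of these ideas (an illustration, not from the slides; assumes NumPy and SciPy): draw an i.i.d. sample from N(µ, σ²) and evaluate the density in Equation (2); the parameter values are placeholders.

# Sketch: an i.i.d. random sample X_1, ..., X_n from N(mu, sigma^2) and its density values.
import numpy as np
from scipy.stats import norm

mu, sigma = 5.0, 2.0                          # placeholder parameter values
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=100)           # outputs of the r.v.s = our sample data

pdf_vals = norm.pdf(x, loc=mu, scale=sigma)   # f(x | mu, sigma^2) from Equation (2)
print(x[:3], pdf_vals[:3])
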
Stats cont.

Estimators (θ̂) of Population Parameters
Definition: An estimator is a rule, often a formula, for calculating an estimate of a parameter θ from sample data
There are many estimators, but which is the best?

Maximum Likelihood Estimation (MLE)

Key Concept: Maximum Likelihood Estimation
GOAL: to determine the best estimate of a parameter θ from a sample
Likelihood Function
We obtain a data vector x = (x1, . . . , xn)
Since the random sample is i.i.d., we express the probability of our observed data given θ as

f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1|\theta) \cdot f(x_2|\theta) \cdots f(x_n|\theta)    (3)

f_n(x|\theta) = \prod_{i=1}^{n} f(x_i|\theta)    (4)

Implication of maximizing the likelihood function: the MLE θ̂ is the value of θ under which the observed data are most probable

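To make the maximization concrete, a hedged sketch (assuming NumPy and SciPy) that maximizes the log of the likelihood in Equation (4) for a normal sample by numerical optimization; the data and starting values are illustrative only.

# Sketch: numerical MLE for (mu, sigma) of a normal sample by maximizing log f_n(x|theta).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(3.0, 1.5, size=200)            # illustrative sample

def neg_log_likelihood(theta):
    mu, sigma = theta
    if sigma <= 0:
        return np.inf                          # keep sigma in the valid region
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print(result.x)                                # close to the sample mean and std. deviation
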
Example of MLE

Example: Gaussian (Normal) Linear Regression
Recall Least Squares Regression
We wish to determine the weight vector w
The likelihood function is given by

P(y|x, w) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left(-\frac{\sum_i (y_i - w^T x_i)^2}{2\sigma^2}\right)    (5)

Maximizing (5) is equivalent to minimizing the exponent, so we need to minimize

\sum_{i=1}^{n} (y_i - w^T x_i)^2 = (y - Aw)^T (y - Aw)    (6)

where A is the design matrix of our data.

Example of MLE cont.

Following the standard optimization procedure, we compute the gradient of the sum of squares, which (up to a constant factor of 2) is

\nabla S = -A^T y + A^T A w    (7)

Notice the linear combination of the weights and the columns of A^T A
Setting the gradient to zero, our resulting critical point is

\hat{w} = (A^T A)^{-1} A^T y,    (8)

which we recognize as the solution of the normal equations!

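A minimal sketch of the critical point in Equation (8) (assuming NumPy), solving the normal equations on synthetic data; in practice np.linalg.lstsq is the numerically safer route than forming (A^T A)^{-1} explicitly.

# Sketch: the normal equations w_hat = (A^T A)^{-1} A^T y on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
A = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])   # design matrix with intercept
w_true = np.array([1.0, -2.0, 0.5])
y = A @ w_true + rng.normal(scale=0.1, size=50)

w_hat = np.linalg.solve(A.T @ A, A.T @ y)     # solves (A^T A) w = A^T y
print(w_hat)                                   # approximately w_true
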
Computing the Hessian Matrix

We compute the Hessian in order to show that this critical point is a minimum

\frac{\partial S}{\partial w_k} = \frac{\partial}{\partial w_k}\left[ w_1 \begin{pmatrix} x_{1,1} \\ \vdots \\ x_{n,1} \end{pmatrix} + \cdots + w_k \begin{pmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{pmatrix} + \cdots + w_n \begin{pmatrix} x_{1,n} \\ \vdots \\ x_{n,n} \end{pmatrix} \right] = \begin{pmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{pmatrix}

Therefore,

H = A^T A    (9)

which is positive semi-definite, so the critical point minimizes the sum of squared errors. Therefore, our estimate ŵ maximizes our likelihood function

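To illustrate why H = A^T A certifies a minimum (and hence a maximum of the likelihood), a short sketch (assuming NumPy) checking that its eigenvalues are non-negative; the matrix A is an arbitrary example, not data from the slides.

# Sketch: A^T A is positive semi-definite, so the critical point minimizes the squared error.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(50, 3))                  # arbitrary example design matrix

H = A.T @ A
eigenvalues = np.linalg.eigvalsh(H)           # eigvalsh: eigenvalues of a symmetric matrix
print(eigenvalues)                             # all >= 0 (up to floating-point error)
print(np.all(eigenvalues >= -1e-10))
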
MLE cont.

Advantages and Disadvantages
Larger samples, as n → ∞, give better estimates: θ̂ₙ → θ
Other Advantages
Disadvantages: Uniqueness, existence, reliance upon distribution fit
This raises the question: How much information about a parameter can be gathered from sample data?

Fisher Information

Key Concept: Fisher Information
We determine the amount of information about a parameter contained in a sample using the Fisher information, defined by

I(\theta) = -E\left[\frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2}\right].    (10)

Intuitive appeal: More data provides more information about the population parameter

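One hedged way to see Equation (10) numerically (assuming NumPy): for N(µ, σ²) with σ known, the information for µ is 1/σ². The sketch below estimates it through the equivalent score-variance identity I(µ) = E[(∂ ln f/∂µ)²] rather than the second-derivative form; the parameter values are illustrative.

# Sketch: Monte Carlo estimate of the Fisher information for mu in N(mu, sigma^2), sigma known.
# Uses the identity I(mu) = E[(d/dmu ln f(X|mu))^2] = -E[d^2/dmu^2 ln f(X|mu)].
import numpy as np

mu, sigma = 0.0, 2.0                          # illustrative parameter values
rng = np.random.default_rng(4)
x = rng.normal(mu, sigma, size=200_000)

score = (x - mu) / sigma**2                   # d/dmu ln f(x|mu) for the normal density
I_mc = np.mean(score**2)                      # Monte Carlo estimate of the Fisher information
print(I_mc, 1.0 / sigma**2)                   # both approximately 1/sigma^2 = 0.25
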
Fisher information example

Example: Finding the Fisher information for the normal distribution N(µ, σ²)
The log-likelihood function is

\ln[f(x|\theta)] = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}    (11)

where the parameter vector is θ = (µ, σ²).
The gradient of the log-likelihood is

\left(\frac{\partial \ln[f(x|\theta)]}{\partial \mu}, \frac{\partial \ln[f(x|\theta)]}{\partial \sigma^2}\right) = \left(\frac{x - \mu}{\sigma^2},\ \frac{(x - \mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right)    (12)

Fisher information example continued

We now compute the Hessian matrix that will lead us to our Fisher information matrix

\frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2} =
\begin{pmatrix}
\frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu^2} & \frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu \, \partial \sigma^2} \\
\frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu \, \partial \sigma^2} & \frac{\partial^2 \ln[f(x|\theta)]}{\partial (\sigma^2)^2}
\end{pmatrix}
=
\begin{pmatrix}
-\frac{1}{\sigma^2} & -\frac{x - \mu}{\sigma^4} \\
-\frac{x - \mu}{\sigma^4} & \frac{1}{2\sigma^4} - \frac{(x - \mu)^2}{\sigma^6}
\end{pmatrix}    (13)

We now compute our Fisher information matrix. We see that

I(\theta) = -E\left[\frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2}\right]    (14)

= \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}    (15)

where the expectation uses E[x − µ] = 0 and E[(x − µ)²] = σ².

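A quick numerical check of Equation (15) (a sketch assuming NumPy, with illustrative parameter values): average the negative of the Hessian entries from Equation (13) over simulated data and compare with diag(1/σ², 1/(2σ⁴)).

# Sketch: Monte Carlo check that -E[Hessian of ln f] = diag(1/sigma^2, 1/(2*sigma^4)).
import numpy as np

mu, sigma2 = 1.0, 4.0                         # illustrative values; sigma^2 = 4
rng = np.random.default_rng(5)
x = rng.normal(mu, np.sqrt(sigma2), size=500_000)

h11 = -1.0 / sigma2 * np.ones_like(x)                    # d^2 ln f / d mu^2
h12 = -(x - mu) / sigma2**2                              # d^2 ln f / d mu d sigma^2
h22 = 1.0 / (2 * sigma2**2) - (x - mu)**2 / sigma2**3    # d^2 ln f / d(sigma^2)^2

I_hat = -np.array([[h11.mean(), h12.mean()],
                   [h12.mean(), h22.mean()]])
print(I_hat)
print(np.array([[1 / sigma2, 0], [0, 1 / (2 * sigma2**2)]]))   # theoretical I(theta)
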
Applications of Fisher information

Fisher information is used in the calculation of . . .
Lower bound of Var(θ̂) (the Cramér–Rao lower bound), given by

\mathrm{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}    (16)

for an estimator θ̂
Wald Test: Comparing a proposed value θ0 of θ against the MLE
Test statistic given by

W = \frac{\hat{\theta} - \theta_0}{\mathrm{s.e.}(\hat{\theta})}    (17)

where

\mathrm{s.e.}(\hat{\theta}) = \sqrt{\frac{1}{I(\hat{\theta})}}    (18)

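A minimal sketch of Equations (16)–(18) for a normal mean with known σ (an assumed setup, not taken from the slides; requires NumPy and SciPy): the MLE is the sample mean, the total information for the sample is I_n(µ) = n/σ², and W is compared against the standard normal.

# Sketch: Wald test for H0: mu = mu0 in N(mu, sigma^2) with sigma known.
import numpy as np
from scipy.stats import norm

sigma, mu0 = 2.0, 0.0                         # known sigma and hypothesized mean (placeholders)
rng = np.random.default_rng(6)
x = rng.normal(0.3, sigma, size=100)          # illustrative sample

mu_hat = x.mean()                              # MLE of mu
fisher_info = len(x) / sigma**2                # total Fisher information I_n(mu) = n / sigma^2
se = np.sqrt(1.0 / fisher_info)                # s.e.(mu_hat) = sqrt(1 / I_n(mu_hat)), Equation (18)
W = (mu_hat - mu0) / se                        # Wald statistic, approximately N(0, 1) under H0
p_value = 2 * (1 - norm.cdf(abs(W)))
print(W, p_value)
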
