Hessian Matrices in Statistics
Ferris Jumah, David Schlueter, Matt Vance
MTH 327
Final Project
December 7, 2011
Topic Introduction
Today we are going to talk about . . .
Introduce the Hessian matrix
Brief description of relevant statistics
Maximum Likelihood Estimation (MLE)
Fisher Information and Applications
Statistics: Some things to recall
Now, let's talk a bit about inferential statistics
Parameters
Random Variables
Definition: A random variable X is a function X : Ω → R
Each r.v. follows a distribution that has an associated probability function f(x|θ)
E.g. the normal density

f(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (2)

What is a Random Sample? X1, . . . , Xn i.i.d.
Outputs of these r.v.s are our sample data
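As a quick numerical sanity check (an illustrative sketch, not part of the original slides), the density in equation (2) can be coded directly and compared against Python's built-in `statistics.NormalDist`:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Density f(x | mu, sigma^2) from equation (2)."""
    sigma = sqrt(sigma2)
    return (1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu) ** 2 / (2 * sigma2))

# Compare against the standard library at a few points (sigma^2 = 4, so sigma = 2).
for x in (-1.0, 0.0, 2.5):
    assert abs(normal_pdf(x, 1.0, 4.0) - NormalDist(1.0, 2.0).pdf(x)) < 1e-12
```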
Stats cont.
Estimators (θ̂) of Population Parameters
Definition: An estimator is a rule or formula for computing an estimate of a parameter θ from sample data
Many estimators exist, but which is the best?
Maximum Likelihood Estimation (MLE)
Key Concept: Maximum Likelihood Estimation
GOAL: to determine the best estimate of a parameter θ from a sample
Likelihood Function
We obtain a data vector x = (x1, . . . , xn)
Since the random sample is i.i.d., we express the probability of our observed data given θ as

f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \cdot f(x_2 \mid \theta) \cdots f(x_n \mid \theta) \qquad (3)

f_n(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \qquad (4)

Implication of maximizing the likelihood function: the MLE θ̂ is the value of θ under which the observed data are most probable
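To make the idea concrete (a sketch with made-up data, not from the slides): for an i.i.d. normal sample with known σ² = 1, maximizing the log of equation (4) over µ recovers the sample mean.

```python
from math import log, pi
from statistics import mean

def log_likelihood(mu, data, sigma2=1.0):
    """log f_n(x | mu) = sum_i log f(x_i | mu), the log of equation (4) for N(mu, sigma2)."""
    return sum(-0.5 * log(2 * pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)
               for x in data)

data = [2.1, 1.9, 2.4, 1.6, 2.0]
# Crude grid search over candidate values of mu in [0, 4].
mu_hat = max((mu / 100 for mu in range(0, 401)),
             key=lambda m: log_likelihood(m, data))
assert abs(mu_hat - mean(data)) < 0.01  # the MLE of mu is the sample mean
```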
Example of MLE
Example: Gaussian (Normal) Linear Regression
Recall Least Squares Regression
Wish to determine the weight vector w
Likelihood function given by

P(\mathbf{y} \mid \mathbf{x}, \mathbf{w}) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\!\left(-\frac{\sum_i (y_i - \mathbf{w}^T \mathbf{x}_i)^2}{2\sigma^2}\right) \qquad (5)

Maximizing this likelihood is equivalent to minimizing

S = \sum_{i=1}^{n} (y_i - \mathbf{w}^T \mathbf{x}_i)^2 = (\mathbf{y} - A\mathbf{w})^T (\mathbf{y} - A\mathbf{w}) \qquad (6)

where A is the design matrix of our data.
Example of MLE cont.
Following the standard optimization procedure, we compute the gradient of S:

\nabla S = 2(A^T A \mathbf{w} - A^T \mathbf{y}) \qquad (7)

Notice this is a linear combination of the weights and the columns of A^T A
Setting the gradient to zero, our resulting critical point is

\hat{\mathbf{w}} = (A^T A)^{-1} A^T \mathbf{y}, \qquad (8)

which we recognize to be the normal equations!
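A minimal numerical illustration of equation (8) (a sketch with a tiny made-up dataset): fitting a line with an intercept column, where y lies exactly on y = 1 + 2x.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def solve2(M, b):
    """Solve a 2x2 system M v = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]

# Design matrix A with an intercept column; data generated by y = 1 + 2x.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

AtA = matmul(transpose(A), A)
Aty = [sum(a * yi for a, yi in zip(col, y)) for col in transpose(A)]
w_hat = solve2(AtA, Aty)  # equation (8): (A^T A)^{-1} A^T y
assert abs(w_hat[0] - 1.0) < 1e-9 and abs(w_hat[1] - 2.0) < 1e-9
```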
Computing the Hessian Matrix
We compute the Hessian in order to show that this critical point is a minimum. Differentiating \nabla S once more, the (j, k) entry of the Hessian is \partial^2 S / \partial w_j \partial w_k = 2(A^T A)_{jk}, so up to the constant factor

H = A^T A \qquad (9)

which is positive semi-definite, since \mathbf{w}^T A^T A \mathbf{w} = \|A\mathbf{w}\|^2 \ge 0 for every \mathbf{w}. Therefore S is minimized at \hat{\mathbf{w}}, and our estimate for w maximizes the likelihood function.
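The positive semi-definiteness of H = AᵀA in equation (9) can be spot-checked numerically (an illustrative sketch with an arbitrary matrix): the quadratic form wᵀAᵀAw equals ‖Aw‖², which is never negative.

```python
import random

random.seed(0)

def quad_form(A, w):
    """Compute w^T (A^T A) w, evaluated as ||A w||^2."""
    Aw = [sum(a * wi for a, wi in zip(row, w)) for row in A]
    return sum(v * v for v in Aw)

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for _ in range(1000):
    w = [random.uniform(-10, 10), random.uniform(-10, 10)]
    assert quad_form(A, w) >= 0.0
```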
MLE cont.
Advantages and Disadvantages
Larger samples, as n → ∞, give better estimates: θ̂_n → θ (consistency)
Other Advantages
Disadvantages: uniqueness, existence, reliance upon distribution fit
This raises the question: how much information about a parameter can be gathered from sample data?
Fisher Information
Key Concept: Fisher Information
We determine the amount of information about a parameter contained in a sample using the Fisher information, defined by

I(\theta) = -E\!\left[\frac{\partial^2 \ln f(x \mid \theta)}{\partial \theta^2}\right]. \qquad (10)

Intuitive appeal: more data provides more information about the population parameter
Fisher information example
Example: Finding the Fisher information for the normal distribution N(µ, σ²)
The log-likelihood function is

\ln f(x \mid \theta) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2} \qquad (11)

where the parameter vector is θ = (µ, σ²).
The gradient of the log-likelihood is

\left(\frac{\partial \ln f(x \mid \theta)}{\partial \mu}, \frac{\partial \ln f(x \mid \theta)}{\partial \sigma^2}\right) = \left(\frac{x-\mu}{\sigma^2}, \frac{(x-\mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right) \qquad (12)
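The gradient in equation (12) can be verified against central finite differences of the log-density in equation (11) (a numerical sketch; the test point and step size are arbitrary choices):

```python
from math import log, pi

def log_f(x, mu, s2):
    """Equation (11): log-density of one observation, with s2 = sigma^2."""
    return -0.5 * log(2 * pi * s2) - (x - mu) ** 2 / (2 * s2)

def analytic_grad(x, mu, s2):
    """Equation (12): (d/dmu, d/dsigma^2) of the log-density."""
    return ((x - mu) / s2,
            (x - mu) ** 2 / (2 * s2 ** 2) - 1 / (2 * s2))

x, mu, s2, h = 1.3, 0.5, 2.0, 1e-6
num_dmu = (log_f(x, mu + h, s2) - log_f(x, mu - h, s2)) / (2 * h)
num_ds2 = (log_f(x, mu, s2 + h) - log_f(x, mu, s2 - h)) / (2 * h)
g = analytic_grad(x, mu, s2)
assert abs(num_dmu - g[0]) < 1e-6 and abs(num_ds2 - g[1]) < 1e-6
```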
Fisher information example continued
We now compute the Hessian matrix that will lead us to our Fisher information matrix:

\frac{\partial^2 \ln f(x \mid \theta)}{\partial \theta^2} =
\begin{pmatrix}
\dfrac{\partial^2 \ln f(x \mid \theta)}{\partial \mu^2} & \dfrac{\partial^2 \ln f(x \mid \theta)}{\partial \mu \, \partial \sigma^2} \\[1ex]
\dfrac{\partial^2 \ln f(x \mid \theta)}{\partial \mu \, \partial \sigma^2} & \dfrac{\partial^2 \ln f(x \mid \theta)}{\partial (\sigma^2)^2}
\end{pmatrix}
=
\begin{pmatrix}
-\dfrac{1}{\sigma^2} & -\dfrac{x-\mu}{\sigma^4} \\[1ex]
-\dfrac{x-\mu}{\sigma^4} & \dfrac{1}{2\sigma^4} - \dfrac{(x-\mu)^2}{\sigma^6}
\end{pmatrix} \qquad (13)

We now compute our Fisher information matrix. Since E[x - µ] = 0 and E[(x - µ)²] = σ², we see that

I(\theta) = -E\!\left[\frac{\partial^2 \ln f(x \mid \theta)}{\partial \theta^2}\right] \qquad (14)
=
\begin{pmatrix}
\dfrac{1}{\sigma^2} & 0 \\[1ex]
0 & \dfrac{1}{2\sigma^4}
\end{pmatrix} \qquad (15)
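A Monte Carlo sanity check of equation (15) (an illustrative sketch; the parameter values, sample size, and tolerance are arbitrary choices): averaging the negated diagonal entries of the Hessian in equation (13) over simulated draws should approximate 1/σ² and 1/(2σ⁴).

```python
import random
from statistics import fmean

random.seed(42)
mu, sigma = 0.0, 2.0
s2 = sigma ** 2
draws = [random.gauss(mu, sigma) for _ in range(200_000)]

# Negated diagonal entries of equation (13), averaged over the draws.
I_mu = fmean(1 / s2 for _ in draws)  # constant, so exactly 1/sigma^2
I_s2 = fmean((x - mu) ** 2 / s2 ** 3 - 1 / (2 * s2 ** 2) for x in draws)

assert abs(I_mu - 1 / s2) < 1e-12            # 1/sigma^2 = 0.25
assert abs(I_s2 - 1 / (2 * s2 ** 2)) < 0.005  # 1/(2 sigma^4) = 0.03125
```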
Applications of Fisher information
Fisher information is used in the calculation of . . .
Lower bound of Var(θ̂) (the Cramér-Rao bound), given by

\mathrm{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)} \qquad (16)

for an unbiased estimator θ̂
Wald Test: comparing a proposed value θ₀ of θ against the MLE
Test statistic given by

W = \frac{\hat{\theta} - \theta_0}{\mathrm{s.e.}(\hat{\theta})} \qquad (17)

where

\mathrm{s.e.}(\hat{\theta}) = \sqrt{\frac{1}{I(\hat{\theta})}} \qquad (18)
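Putting equations (16)-(18) together (a sketch with made-up numbers; 1.96 is the usual 5% two-sided normal critical value): for the mean of N(µ, σ²) with σ² known, the per-observation information is 1/σ², so with n observations s.e.(µ̂) = σ/√n.

```python
from math import sqrt
from statistics import mean

def wald_statistic(data, theta0, sigma2):
    """W = (theta_hat - theta0) / s.e.(theta_hat), equations (17)-(18),
    for the mean of N(mu, sigma2) with sigma2 known."""
    n = len(data)
    theta_hat = mean(data)    # the MLE of mu
    fisher_info = n / sigma2  # n times the per-observation information 1/sigma^2
    se = sqrt(1 / fisher_info)  # equation (18)
    return (theta_hat - theta0) / se

data = [2.9, 3.1, 3.4, 2.8, 3.2, 3.0, 3.3, 2.7]
W = wald_statistic(data, theta0=3.0, sigma2=0.25)
# |W| > 1.96 would reject H0: mu = 3.0 at the 5% level; here it does not.
assert abs(W) < 1.96
```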