Maximum Likelihood Estimation
INFO-2301: Quantitative Reasoning 2
Michael Paul and Jordan Boyd-Graber
MARCH 7, 2017
Why MLE?
• Before: Distribution + Parameter → x
• Now: x + Distribution → Parameter
• (Much more realistic)
• But: Says nothing about how good a fit a distribution is
Likelihood
• Likelihood is p(x;θ)
• We want the estimate of θ that best explains the data we have seen
• I.e., Maximum Likelihood Estimate (MLE)
Likelihood
• The likelihood function refers to the PMF (discrete) or PDF (continuous).
• For discrete distributions, the likelihood of x is P(X = x).
• For continuous distributions, the likelihood of x is the density f(x).
• We will often refer to likelihood rather than probability/mass/density so that the term applies to either scenario.
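To make the two senses concrete, here is a minimal sketch (not from the slides), assuming scipy.stats is available; the distributions and parameters are illustrative choices:

```python
# Likelihood in both senses, via scipy.stats (assumed available).
from scipy.stats import binom, norm

# Discrete: the likelihood of x is P(X = x), read off the PMF.
print(binom.pmf(3, n=10, p=0.5))   # P(X = 3) under Binomial(10, 0.5)

# Continuous: the likelihood of x is the density f(x), read off the PDF.
print(norm.pdf(0.5, loc=0, scale=1))
```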
Optimizing Unconstrained Functions
Suppose we wanted to optimize
$f(x) = -x^2 - 2x + 2$ (1)
$\frac{\partial f}{\partial x} = -2x - 2$ (2)
[Figure: plot of $f(x)$ for $x \in [-5, 5]$, $y \in [-3, 3]$]
Optimizing Unconstrained Functions
$\frac{\partial f}{\partial x} = 0$ (3)
$-2x - 2 = 0$ (4)
$x = -1$ (5)
(Should also check that second derivative is negative)
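A quick numeric sanity check of this algebra (a minimal plain-Python sketch; the probe values are illustrative):

```python
# Check the critical point of f(x) = -x^2 - 2x + 2 found on the slide.
def f(x):
    return -x**2 - 2*x + 2

def df(x):                 # derivative from Eq. (2): f'(x) = -2x - 2
    return -2*x - 2

x_star = -1.0              # solution of f'(x) = 0
assert df(x_star) == 0
# Second derivative is -2 < 0, so x = -1 is a maximum.
print(f(x_star))           # 3.0
```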
Optimizing Constrained Functions
Theorem: Lagrange Multiplier Method
Given functions $f(x_1, \ldots, x_n)$ and $g(x_1, \ldots, x_n)$, the critical points of $f$ restricted to the set $g = 0$ are solutions to the equations:
$\frac{\partial f}{\partial x_i}(x_1, \ldots, x_n) = \lambda \frac{\partial g}{\partial x_i}(x_1, \ldots, x_n) \quad \forall i$
$g(x_1, \ldots, x_n) = 0$
This is $n + 1$ equations in the $n + 1$ variables $x_1, \ldots, x_n, \lambda$.
Lagrange Example
Maximize $f(x,y) = \sqrt{xy}$ subject to the constraint $20x + 10y = 200$.
• Compute derivatives
$\frac{\partial f}{\partial x} = \frac{1}{2}\sqrt{\frac{y}{x}} \qquad \frac{\partial g}{\partial x} = 20$
$\frac{\partial f}{\partial y} = \frac{1}{2}\sqrt{\frac{x}{y}} \qquad \frac{\partial g}{\partial y} = 10$
• Create the new system of equations
$\frac{1}{2}\sqrt{\frac{y}{x}} = 20\lambda$
$\frac{1}{2}\sqrt{\frac{x}{y}} = 10\lambda$
$20x + 10y = 200$
Lagrange Example
• Dividing the first equation by the second gives us
$\frac{y}{x} = 2$ (6)
• which means $y = 2x$; plugging this into the constraint equation gives:
$20x + 10(2x) = 200$
$x = 5 \Rightarrow y = 10$
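The same system can be handed to a computer algebra system. A sketch assuming sympy is available (the variable names are illustrative):

```python
# Solve the slide's Lagrange system symbolically with sympy (assumed available).
import sympy as sp

x, y, lam = sp.symbols("x y lambda", positive=True)
f = sp.sqrt(x * y)                  # objective from the slide
g = 20 * x + 10 * y - 200           # constraint, written as g = 0

eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 0)]
print(sp.solve(eqs, [x, y, lam], dict=True))   # x = 5, y = 10
```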
Continuous Distribution: Gaussian
• Recall the density function
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$ (1)
• Taking the log makes the math easier, doesn’t change the answer (monotonic)
• If we observe $x_1, \ldots, x_N$, then the log likelihood is
$\ell(\mu,\sigma) \equiv -N\log\sigma - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$ (2)
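Equation (2) transcribes directly into code. A sketch assuming numpy; the data points are made up for illustration:

```python
# Gaussian log likelihood, Eq. (2) (numpy assumed).
import numpy as np

def gaussian_loglik(x, mu, sigma):
    N = len(x)
    return (-N * np.log(sigma)
            - N / 2 * np.log(2 * np.pi)
            - ((x - mu) ** 2).sum() / (2 * sigma ** 2))

x = np.array([1.2, 0.7, 2.3, 1.9])
print(gaussian_loglik(x, mu=x.mean(), sigma=x.std()))  # std() is the MLE sigma
```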
MLE of Gaussian µ
$\ell(\mu,\sigma) = -N\log\sigma - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$ (3)
$\frac{\partial \ell}{\partial \mu} = 0 + \frac{1}{\sigma^2}\sum_i (x_i - \mu)$ (4)
Solve for µ:
$0 = \frac{1}{\sigma^2}\sum_i (x_i - \mu)$ (5)
$0 = \sum_i x_i - N\mu$ (6)
$\mu = \frac{\sum_i x_i}{N}$ (7)
Consistent with what we said before
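A quick check, with illustrative data, that the sample mean maximizes the µ-dependent part of Eq. (3):

```python
# Only the last term of Eq. (3) depends on mu, so compare just that term.
import numpy as np

x = np.array([1.2, 0.7, 2.3, 1.9])
mu_hat = x.mean()

def loglik_mu(mu, sigma=1.0):
    return -((x - mu) ** 2).sum() / (2 * sigma ** 2)

for mu in (mu_hat - 0.1, mu_hat + 0.1):
    assert loglik_mu(mu) < loglik_mu(mu_hat)
print(mu_hat)
```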
MLE of Gaussian σ
$\ell(\mu,\sigma) = -N\log\sigma - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$ (8)
$\frac{\partial \ell}{\partial \sigma} = -\frac{N}{\sigma} + 0 + \frac{1}{\sigma^3}\sum_i (x_i - \mu)^2$ (9)
Solve for σ:
$0 = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_i (x_i - \mu)^2$ (10)
$\frac{N}{\sigma} = \frac{1}{\sigma^3}\sum_i (x_i - \mu)^2$ (11)
$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$ (12)
Consistent with what we said before
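Putting the two closed forms together: a sketch comparing them with a generic numerical optimizer, assuming numpy and scipy (the simulated data and starting point are illustrative):

```python
# Closed-form Gaussian MLE vs. numerical maximization of the log likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=1_000)

def neg_loglik(params):               # Eq. (8) negated; constant term dropped
    mu, sigma = params
    return len(x) * np.log(sigma) + ((x - mu) ** 2).sum() / (2 * sigma ** 2)

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
print(res.x)                                            # numerical optimum
print(x.mean(), np.sqrt(((x - x.mean()) ** 2).mean()))  # Eqs. (7) and (12)
```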
Discrete Distribution: Multinomial
• Recall the mass function ($N$ is the total number of observations, $x_i$ is the count for each cell, $\theta_i$ the probability of a cell)
$p(x \mid \theta) = \frac{N!}{\prod_i x_i!} \prod_i \theta_i^{x_i}$ (2)
• Taking the log makes the math easier, doesn’t change the answer (monotonic)
• If we observe counts $x_1, \ldots, x_K$, then the log likelihood is
$\ell(\theta) \equiv \log(N!) - \sum_i \log(x_i!) + \sum_i x_i \log\theta_i$ (3)
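Equation (3) in code, using math.lgamma for the factorial terms ($\log n! = $ lgamma(n + 1)); the counts and probabilities below are illustrative:

```python
# Multinomial log likelihood, Eq. (3) (standard library only).
from math import lgamma, log

def multinomial_loglik(counts, theta):
    N = sum(counts)
    ll = lgamma(N + 1)                          # log(N!)
    ll -= sum(lgamma(x + 1) for x in counts)    # - sum_i log(x_i!)
    ll += sum(x * log(t) for x, t in zip(counts, theta))
    return ll

print(multinomial_loglik([3, 7, 10], [0.15, 0.35, 0.5]))
```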
MLE of Multinomial θ
$\ell(\theta) = \log(N!) - \sum_i \log(x_i!) + \sum_i x_i \log\theta_i + \lambda\left(1 - \sum_i \theta_i\right)$ (4)
Where did this come from? The constraint that θ must be a distribution.
• $\frac{\partial \ell}{\partial \theta_i} = \frac{x_i}{\theta_i} - \lambda$
• $\frac{\partial \ell}{\partial \lambda} = 1 - \sum_i \theta_i$
MLE of Multinomial θ
• We have a system of equations:
$\theta_1 = \frac{x_1}{\lambda}$ (6)
$\vdots$ (7)
$\theta_K = \frac{x_K}{\lambda}$ (8)
$\sum_i \theta_i = 1$ (9)
• So let’s substitute the first $K$ equations into the last:
$\sum_i \frac{x_i}{\lambda} = 1$ (10)
• So $\lambda = \sum_i x_i = N$, and $\theta_i = \frac{x_i}{N}$
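A quick empirical check, with made-up counts, that $\theta_i = x_i / N$ scores at least as well as any other valid distribution (numpy assumed):

```python
# The closed-form multinomial MLE beats randomly drawn alternatives.
import numpy as np

x = np.array([3.0, 7.0, 10.0])       # illustrative cell counts
theta_mle = x / x.sum()              # the slide's closed form

def loglik(theta):                   # theta-dependent part of the log likelihood
    return (x * np.log(theta)).sum()

rng = np.random.default_rng(0)
for _ in range(5):
    theta = rng.dirichlet(np.ones(len(x)))   # a random valid distribution
    assert loglik(theta) <= loglik(theta_mle)
print(theta_mle)                             # [0.15 0.35 0.5]
```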
Big Pictures
• Ran through several common examples
• For existing distributions you can (and should) look up the MLE
• For new models, you can’t (foreshadowing of later in class)
◦ Classification models
◦ Unsupervised models (Expectation-Maximization)
• Not always so easy
Classification
• Classification can be viewed as $p(y \mid x, \theta)$
• Have $x, y$; need $\theta$
• Discovering $\theta$ is also a problem of MLE
Clustering
• Clustering can be viewed as $p(x \mid z, \theta)$
• Have $x$; need $z, \theta$
• $z$ is guessed at iteratively (Expectation)
• $\theta$ estimated to maximize likelihood (Maximization)
Not always so easy: Bias
• An estimator is biased if $\mathbb{E}[\hat\theta] \neq \theta$
• We won’t prove it, but the MLE estimate of the variance is biased
• The bias comes from estimating $\mu$: using $\hat\mu$ understates the spread, so the corrected estimate divides by $N - 1$
$\hat\sigma^2 = \frac{1}{N - 1}\sum_i (x_i - \hat\mu)^2$ (1)
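A small simulation, with illustrative settings, showing the bias and the $N - 1$ correction (numpy assumed):

```python
# Estimate variance many times; the MLE is biased low by a factor (n - 1) / n.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 100_000
samples = rng.normal(0.0, 1.0, size=(trials, n))   # true variance is 1
mu_hat = samples.mean(axis=1, keepdims=True)
sq = ((samples - mu_hat) ** 2).sum(axis=1)

print((sq / n).mean())        # MLE: about 0.9 = (n - 1) / n
print((sq / (n - 1)).mean())  # corrected: about 1.0
```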
Not always so easy: Intractable Likelihoods
• Not always possible to “solve for” optimal estimator
• Use gradient optimization (we’ll see this for logistic regression)
• Use other approximations (e.g., Monte Carlo sampling)
• Whole subfield of statistics / information science