Maximum Likelihood Estimation
02-680 Essential Mathematics and Statistics for Scientists
DeGroot Ch 7.5
Wasserman Ch 9
Overview
• Our recipe for learning
– Step 0: form a scientific hypothesis
– Step 1: collect data (a table of samples × features)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Coin Flip Example
• Our recipe for learning
– Step 0: I got a new coin. What is the probability of heads?
– Step 1: collect data (samples × one feature, the flip outcome)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Weight Example
• Our recipe for learning
– Step 0: What is the distribution of weights of people in Pittsburgh?
– Step 1: collect data (samples × one feature, the weight)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Before estimation in Step 3, we need to set things up
Statistical Inference, Learning: a Formal Set Up
• Given a sample X1,...,Xn ∼ F, how do we learn our probability model?
– Statistical model F: a set of distributions
– Parametric model: an F that can be parameterized by a finite number of parameters,
F = { p(x | 𝜃) : 𝜃 ∈ Θ },
where 𝜃 = (𝜃1, ..., 𝜃k) ∈ Θ is the vector of unknown parameters
Parametric Models: Coin Flip
Ex) Probability of heads p of a coin
Bernoulli distribution
Parameter: p ∈ [0, 1]
Parametric model: F = { P(X = x | p) = p^x (1 − p)^(1−x), x ∈ {0, 1} : p ∈ [0, 1] }
Parametric Models: Gene Expression
Ex) Learn the probability distribution of the expression levels of gene A
Normal distribution
Parameters: µ, σ²
Parametric model: F = { N(µ, σ²) : µ ∈ ℝ, σ² > 0 }
Which Parametric Model?
• Discrete or continuous?
• Univariate or multivariate features?
• Prior knowledge
ex) Count data in intervals: Poisson distribution
ex) real-valued, bell-shaped: normal distribution
• Evidence from exploratory analysis of data
ex) relationship between mean and variance: Poisson vs. negative binomial distribution (see the sketch below)
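One way to gather that evidence, sketched below in Python (assuming numpy is available; the count data here are hypothetical): under a Poisson model the mean and variance should be roughly equal, while a variance well above the mean (overdispersion) points toward a negative binomial.

```python
import numpy as np

# Hypothetical count data, e.g., events observed per interval
counts = np.array([3, 7, 2, 11, 5, 0, 9, 4, 6, 13])

mean, var = counts.mean(), counts.var()
print(f"mean = {mean:.2f}, variance = {var:.2f}")

# Rough heuristic: Poisson implies mean ~ variance;
# variance >> mean suggests a negative binomial instead.
if var > 1.5 * mean:
    print("overdispersed: consider a negative binomial model")
else:
    print("mean ~ variance: Poisson looks plausible")
```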
Overview
• Our recipe for learning
– Step 0: form a scientific hypothesis
– Step 1: collect data (a table of samples × features)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Maximum Likelihood Estimation
• The most commonly used method for parametric estimation
• MLE for probability distributions we have seen so far
• MLE for complex state-of-the-art probability models
Parametric Models Meet Data
• Ingredient 1: our choice of parametric model F for a random variable X
• Ingredient 2: data D with n samples; model each sample as a random variable, X1, ..., Xn
• Recipe: maximum likelihood estimation (MLE)
– Pick the member of the set F that maximizes the “likelihood of the data D”
Illustration for MLE
Formally, Parametric Estimation, Point Estimation
• Select the f ∈ F that best describes the data
• Equivalently, select the parameter 𝜃 that best describes the data
• Inference: estimate 𝜃 given the data
• Maximum likelihood estimate: the parameter 𝜃 estimated from the data using MLE
MLE as an Optimization Problem
• How can we select the parameter 𝜃 that best describes the data?
• Maximum likelihood estimation as an optimization
– Likelihood: a score function for scoring how well a candidate parameter 𝜃 describes the data
– Maximize the likelihood: find the parameter 𝜃 that maximizes the score function
“Likelihood of Data”
• Assume X1, ..., Xn are i.i.d. random variables, representing samples from a distribution P(X | 𝜃) in F. Then the “likelihood of the data” is defined as the probability of the data D:
P(D | 𝜃) = P(X1, ..., Xn | 𝜃) = ∏_{i=1}^{n} P(Xi | 𝜃), by independence
“Likelihood of Data”
• Assume X1,...,Xn are i.i.d. random variables, representing samples from a distribution
P(X|𝜃) in F.
• The likelihood function, also called the likelihood of data, is given as
𝐿"(𝜃) = ∏#$%
"
P(Xi |𝜃)
• Function of parameter 𝜃 given data X1,...,Xn
“Log Likelihood of Data”
• The log likelihood function is
𝑙"(𝜃) = log 𝐿"(𝜃)
=
• Why log likelihood instead of likelihood?
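A concrete demonstration, sketched below (assuming numpy): the likelihood is a product of many numbers in [0, 1], which underflows to 0.0 in double precision, while the log likelihood is a well-behaved sum.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10_000) < 0.3          # 10,000 i.i.d. Bernoulli(0.3) samples
p = 0.3

per_sample = np.where(x, p, 1 - p)    # P(X_i | p) for each sample
likelihood = np.prod(per_sample)      # product of 10,000 small numbers
log_lik = np.sum(np.log(per_sample))  # sum of their logs

print(likelihood)  # 0.0 -- underflows
print(log_lik)     # a finite negative number (around -6,100)
```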
MLE as an Optimization Problem
• How can we select the parameter 𝜃 that best describes the data?
• Maximum likelihood estimation as an optimization
– Likelihood: a score function for scoring how well a candidate parameter 𝜃 describes the data
– Maximize the likelihood: find the parameter 𝜃 that maximizes the score function
Maximum Likelihood Estimation
• Maximum likelihood estimator:
𝜃̂_n = argmax_𝜃 L_n(𝜃) = argmax_𝜃 l_n(𝜃)
• log is a monotonically increasing (and concave) function, so L_n(𝜃) and l_n(𝜃) have the same maximizer
MLE as an Optimization Problem
• How can we select the parameter 𝜃 that best describes the data?
• Maximum likelihood estimation as an optimization
– Likelihood: a score function for scoring how well a candidate parameter 𝜃 describes the data
– Maximize the likelihood: find the parameter 𝜃 that maximizes the score function
Illustration for MLE
MLE, Examples
• Let’s perform MLE
– Bernoulli distribution
• You have a coin with unknown probability of heads, p
• You flip the coin 10 times and get H, H, T, T, H, T, T, H, T, H
• What is your estimate of the probability of heads p?
– Normal distribution
• You have a gene, gene A, whose mean expression level and variance are unknown
• You collect the expression measurements of gene A for 7 individuals
• How would you estimate the mean and variance?
MLE: Bernoulli
• Let X1, ..., Xn ∼ Bernoulli(p). Find a maximum likelihood estimate of p.
• Step 1: Write down the log likelihood of data
𝐿"(𝑝) =
𝑙"(𝑝) =
MLE: Bernoulli Distribution
• Step 2: Maximize the log likelihood
argmax_p l_n(p)
Setting dl_n/dp = (∑_i Xi)/p − (n − ∑_i Xi)/(1 − p) = 0 and solving gives
p̂ = (1/n) ∑_{i=1}^{n} Xi, the observed proportion of heads
MLE: Bernoulli Distribution with Observed Data
• Let X ∼ Bernoulli(p).
• Flipped the coin 10 times and got H, H, T, T, T, H, H, T, T, H (5 heads, 5 tails)
• Step 1: Write down the log likelihood of the data
L(p; D) = p^5 (1 − p)^5
l(p; D) = 5 log p + 5 log(1 − p)
MLE: Bernoulli Distribution
• Step 2: Maximize the log likelihood
argmax_p l(p; D)
Setting dl/dp = 5/p − 5/(1 − p) = 0 gives p̂ = 5/10 = 0.5 (see the sketch below)
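A minimal sketch (assuming numpy) that checks the closed-form answer against a direct grid search over the log likelihood, using the flip sequence from this slide:

```python
import numpy as np

# H, H, T, T, T, H, H, T, T, H encoded as 1 = heads, 0 = tails
flips = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

# Closed-form MLE: the proportion of heads
print(flips.mean())  # 0.5

# Sanity check: maximize the Bernoulli log likelihood over a grid
grid = np.linspace(0.001, 0.999, 999)
heads = flips.sum()
log_lik = heads * np.log(grid) + (len(flips) - heads) * np.log(1 - grid)
print(grid[np.argmax(log_lik)])  # 0.5
```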
MLE: Normal Distribution
• Let X1, ..., Xn ∼ N(µ, σ²). Find a maximum likelihood estimate of µ and σ².
• Step 1: Write down the log likelihood of the data
L_n(µ, σ²) = ∏_{i=1}^{n} (1/√(2πσ²)) exp(−(Xi − µ)²/(2σ²))
l_n(µ, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{n} (Xi − µ)²
MLE: Normal Distribution
• Step 2: Maximize the log likelihood
argmax_{µ, σ²} l_n(µ, σ²)
MLE for µ: setting ∂l_n/∂µ = (1/σ²) ∑_i (Xi − µ) = 0 gives µ̂ = (1/n) ∑_{i=1}^{n} Xi
MLE: Normal Distribution
• Step 2: Maximize the log likelihood
argmax_{µ, σ²} l_n(µ, σ²)
MLE for σ²: setting ∂l_n/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑_i (Xi − µ)² = 0 gives σ̂² = (1/n) ∑_{i=1}^{n} (Xi − µ̂)²
MLE: Normal Distribution with Observations
• Let X ∼ N(µ, σ²). Find a maximum likelihood estimate of µ and σ².
• Data with 10 samples
10, 12, 9, 14, 8, 11, 7, 6, 10.5, 12.5
• Step 1: Write down the log likelihood of the data (n = 10)
L(µ, σ²; D) = ∏_{i=1}^{10} (1/√(2πσ²)) exp(−(xi − µ)²/(2σ²))
l(µ, σ²; D) = −5 log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{10} (xi − µ)²
• Plugging the data into the general solution: µ̂ = (1/10) ∑ xi = 10.0 and σ̂² = (1/10) ∑ (xi − 10)² = 5.75 (checked in the sketch below)
MLE: Poisson Distribution
• Let X1, ..., Xn ∼ Poisson(𝜆). Find a maximum likelihood estimate of 𝜆.
• Step 1: Write down the log likelihood of data
𝐿"(𝜆) =
𝑙"(𝜆) =
MLE: Poisson Distribution
• Step 2: Maximize the log likelihood
argmax_𝜆 l_n(𝜆)
Setting dl_n/d𝜆 = −n + (∑_i Xi)/𝜆 = 0 gives 𝜆̂ = (1/n) ∑_{i=1}^{n} Xi, the sample mean (see the sketch below)
Markov Model
• Joint distribution of all binary random variables X1, ..., XT: P(X1, ..., XT)
• A Markov model is defined by
– P(X1): the initial distribution
– P(Xk | Xk−1): the transition probabilities, assumed identical for k = 2, …, T
• The joint distribution then factorizes as P(X1, ..., XT) = P(X1) ∏_{k=2}^{T} P(Xk | Xk−1)
MLE: Markov Chain
• Let X1, ..., XT follow a Markov chain. Find a maximum likelihood estimate of the Markov chain parameters 𝜃.
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(X1)? The three sequences start with 1, 1, 0, so P̂(X1 = 1) = 2/3 and P̂(X1 = 0) = 1/3.
MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(Xk | Xk−1 = 0)? Of the 6 transitions out of state 0, 4 go to state 1, so P̂(Xk = 1 | Xk−1 = 0) = 2/3.
MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(Xk | Xk−1 = 1)? Of the 6 transitions out of state 1, 2 go to state 1, so P̂(Xk = 1 | Xk−1 = 1) = 1/3.
MLE: Markov Chain
• Let X1, ..., XT follow a Markov chain. Find a maximum likelihood estimate of the Markov chain parameters 𝜃.
• Assume three sequences of observations D:
X1^(1), ..., XT^(1)
X1^(2), ..., XT^(2)
X1^(3), ..., XT^(3)
MLE: Markov Chain
• Step 1: Write down the log likelihood of data
𝐿"(𝜃) =
𝑙"(𝜃) =
MLE: Markov Chain
• Step 2: Maximize the log likelihood (initial probabilities)
argmax_𝜃 l_n(𝜃)
MLE for the initial distribution: P̂(X1 = x) = (# sequences starting in state x) / (# sequences)
MLE: Markov Chain
• Step 2: Maximize the log likelihood (transition probabilities)
argmax_𝜃 l_n(𝜃)
MLE for the transition probabilities: P̂(Xk = x′ | Xk−1 = x) = (# transitions x → x′) / (# transitions out of x), which the sketch below verifies on the three sequences
MLE in Practice
• Follow the recipe and work out the MLE on paper, given a probability model
• Write a program to compute a maximum likelihood estimate of the parameters from data (a minimal sketch follows this list)
– Load the data into memory
– For a normal distribution, compute µ̂ = sample mean and σ̂² = sample variance with 1/n normalization (the MLE), not the unbiased 1/(n − 1) version
– For a Bernoulli distribution, compute p̂ = proportion of successes
• Simple but important learning principle!
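A minimal program following that recipe (the input file name is hypothetical; assumes one measurement per line and numpy):

```python
import numpy as np

# Hypothetical input file: one expression measurement per line
data = np.loadtxt("gene_A_expression.txt")

# Normal model: the MLE is the sample mean and the 1/n sample variance
mu_hat = data.mean()
sigma2_hat = data.var(ddof=0)
print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")

# Bernoulli model (for 0/1 data): the MLE is the proportion of successes
# p_hat = data.mean()
```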
Maximum Likelihood Estimation as an Optimization Problem
• To find an MLE of the parameter 𝜃, we solve the following optimization problem
𝜃̂_n = argmax_𝜃 l_n(𝜃)
• In our examples (the Bernoulli, normal, and Poisson distributions)
– l_n(𝜃) is a concave function: a single global maximum
– a closed-form solution exists
Maximum Likelihood Estimation as an Optimization Problem
• For some models, MLE is easy
• In general, for more complex probability models, performing MLE is not so easy
– l_n(𝜃) is a non-concave function with multiple local maxima
– a closed-form solution does not exist; we need to rely on iterative optimization methods (see the sketch below)
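For example, the gamma distribution has no closed-form MLE for its shape parameter. A sketch (assuming scipy and numpy are available) that minimizes the negative log likelihood with a generic iterative optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=2_000)  # synthetic data

def neg_log_lik(params):
    shape, scale = params
    if shape <= 0 or scale <= 0:  # keep the optimizer in bounds
        return np.inf
    return -gamma.logpdf(x, a=shape, scale=scale).sum()

# Iterative optimization from a rough starting point
result = minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x)  # close to (3.0, 2.0)
```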
Other Probability Models and MLE: Neural Networks
• Deep neural nets for modeling P(Y|X)
• Optimization criterion for learning the model: MLE!
• No closed-form solution for the parameter estimates:
– Rely on iterative optimization methods, e.g., stochastic gradient descent (a miniature sketch follows)
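The same principle in miniature, sketched with plain numpy: logistic regression models P(Y | X), and training by gradient ascent on the log likelihood (equivalently, gradient descent on the cross-entropy) is exactly iterative MLE.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 2))
true_w = np.array([1.5, -2.0])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

w = np.zeros(2)
lr = 0.5
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))  # model's P(Y = 1 | X)
    w += lr * X.T @ (y - p) / n   # gradient ascent on the mean log likelihood

print(w)  # roughly recovers (1.5, -2.0)
```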
Summary
• Maximum likelihood estimation is the most commonly used technique for learning a parametric probability model from data
• MLE, recipe
– Write down the log likelihood
– Differentiate the log likelihood with respect to the parameters
– Set the above to zero and solve for the parameters
• MLE for
– Bernoulli distribution
– Normal distribution
– Poisson distribution
– Markov chains