Maximum Likelihood Estimation
02-680 Essential Mathematics and Statistics for Scientists
DeGroot Ch 7.5
Wasserman Ch 9
Overview
• Our recipe for learning
– Step 0: form a scientific hypothesis
– Step 1: collect data (samples × features)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Coin Flip Example
• Our recipe for learning
– Step 0: I got a new coin. What is the probability of heads?
– Step 1: collect data (samples × feature)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Weight Example
• Our recipe for learning
– Step 0: What is the distribution of weights of people in Pittsburgh?
– Step 1: collect data (samples × feature)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Before estimation in Step 3, we need to set things up
Statistical Inference, Learning: a Formal Set Up
• Given a sample X1,...,Xn ∼ F, how do we learn our probability model?
– Statistical model F: a set of distributions
– Parametric model: F that can be parameterized by a finite number of parameters
F = { p(x | 𝜃) : 𝜃 ∈ Θ },
where 𝜃 = (𝜃1, ..., 𝜃k) ∈ Θ is the vector of unknown parameters.
Parametric Models: Coin Flip
Ex) Probability of heads p of a coin
Bernoulli distribution
Parameter: p ∈ [0, 1]
Parametric model: F = { P(x | p) = p^x (1 − p)^(1 − x), x ∈ {0, 1} : p ∈ [0, 1] }
Parametric Models: Gene Expression
Ex) Learn the probability distribution of the expression levels of gene A
Normal distribution
Parameters: µ ∈ ℝ, σ² > 0
Parametric model: F = { N(x | µ, σ²) : µ ∈ ℝ, σ² > 0 }
Which Parametric Model?
• Discrete or continuous?
• Univariate or multivariate features?
• Prior knowledge
ex) count data in intervals: Poisson distribution
ex) real-valued, bell-shaped data: normal distribution
• Evidence from exploratory analysis of data
ex) relationship between mean and variance: Poisson vs. negative binomial distribution
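The mean–variance check in the last bullet can be sketched in a few lines of Python. The counts below are hypothetical illustrative data, and the "twice the mean" threshold is only a crude rule of thumb, not a formal test:

```python
# Exploratory check: under a Poisson model the mean and variance are equal,
# so variance much larger than the mean (overdispersion) is evidence for
# a negative binomial model instead.

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    # MLE-style variance: divide by n
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

counts = [3, 0, 1, 7, 0, 12, 2, 0, 9, 1]  # hypothetical count data

m, v = sample_mean(counts), sample_var(counts)   # 3.5, 16.65
overdispersed = v > 2 * m                        # crude rule of thumb
```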
Overview
• Our recipe for learning
– Step 0: form a scientific hypothesis
– Step 1: collect data (samples × features)
– Step 2: pick the appropriate probability distribution, with parameter 𝜃
– Step 3: estimate the parameter 𝜃
Maximum Likelihood Estimation
• The most commonly used method for parametric estimation
• MLE for probability distributions we have seen so far
• MLE for complex state-of-the-art probability models
Parametric Models Meet Data
• Ingredient 1: our choice of parametric model F for a random variable X
• Ingredient 2: data D of n samples, each modeled as a random variable: X1,...,Xn
• Recipe: maximum likelihood estimation (MLE)
– Pick the member of F that maximizes the “likelihood of data D”
Illustration for MLE
Formally, Parametric Estimation, Point Estimation
• Select f ∈ F that best describes data
• Select parameter 𝜃 that best describes data
• Inference: estimate 𝜃 given data
• Maximum likelihood estimate: the parameter 𝜃 estimated from data using MLE
MLE as an Optimization Problem
• How can we select parameter 𝜃 that best describes data?
• Maximum likelihood estimation as an optimization
– Likelihood: a score function for how well a candidate parameter 𝜃 describes the data
– Maximize the likelihood: find the parameter 𝜃 that maximizes that score
“Likelihood of Data”
• Assume X1,...,Xn are i.i.d. random variables, representing samples from a distribution
P(X|𝜃) in F. Then, the “likelihood of data” is defined as the probability of data D
P(D | 𝜃) = P(X1, ..., Xn | 𝜃) = ∏_{i=1}^{n} P(Xi | 𝜃) (by independence)
“Likelihood of Data”
• Assume X1,...,Xn are i.i.d. random variables, representing samples from a distribution
P(X|𝜃) in F.
• The likelihood function, also called the likelihood of data, is given as
L_n(𝜃) = ∏_{i=1}^{n} P(Xi | 𝜃)
• Function of parameter 𝜃 given data X1,...,Xn
“Log Likelihood of Data”
• The log likelihood function is
l_n(𝜃) = log L_n(𝜃) = ∑_{i=1}^{n} log P(Xi | 𝜃)
• Why log likelihood instead of likelihood?
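One answer to the question above: with many samples, a product of small probabilities underflows in floating point, while the sum of logs stays perfectly representable. A minimal Python illustration (p = 0.01 and n = 200 are arbitrary choices):

```python
import math

# A product of n i.i.d. probabilities p underflows for modest n,
# but the log likelihood n * log(p) is a well-behaved finite number.

p, n = 0.01, 200

likelihood = p ** n               # 10^-400 underflows to exactly 0.0
log_likelihood = n * math.log(p)  # ~ -921.03, no numerical trouble
```

Logs also turn products into sums, which makes differentiation in Step 2 much easier.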
MLE as an Optimization Problem
• How can we select parameter 𝜃 that best describes data?
• Maximum likelihood estimation as an optimization
– Likelihood: a score function for how well a candidate parameter 𝜃 describes the data
– Maximize the likelihood: find the parameter 𝜃 that maximizes that score
Maximum Likelihood Estimation
• Maximum likelihood estimator
𝜃̂_n = argmax_𝜃 L_n(𝜃) = argmax_𝜃 l_n(𝜃)
The two maximizers coincide because log is a monotonically increasing (and concave) function.
MLE as an Optimization Problem
• How can we select parameter 𝜃 that best describes data?
• Maximum likelihood estimation as an optimization
– Likelihood: a score function for how well a candidate parameter 𝜃 describes the data
– Maximize the likelihood: find the parameter 𝜃 that maximizes that score
Illustration for MLE
MLE, Examples
• Let’s perform MLE
– Bernoulli distribution
• You have a coin with unknown probability of heads, p
• You flip the coin 10 times and get H, H, T, T, H, T, T, H, T, H
• What is your estimate of the probability of heads p?
– Normal distribution
• You have a gene, gene A, whose mean expression level and variance are unknown
• You collect expression measurements of gene A for 7 individuals
• How would you estimate the mean and variance?
MLE: Bernoulli
• Let X1,...,Xn ∼ Bernoulli(p). Find a maximum likelihood estimate of p.
• Step 1: Write down the log likelihood of data
L_n(p) = ∏_{i=1}^{n} p^{Xi} (1 − p)^{1 − Xi} = p^{∑ Xi} (1 − p)^{n − ∑ Xi}
l_n(p) = (∑_{i=1}^{n} Xi) log p + (n − ∑_{i=1}^{n} Xi) log(1 − p)
MLE: Bernoulli Distribution
• Step 2: Maximize the log likelihood
argmax_p l_n(p)
Setting dl_n/dp = (∑ Xi)/p − (n − ∑ Xi)/(1 − p) = 0 gives p̂ = (1/n) ∑_{i=1}^{n} Xi, the fraction of heads.
MLE: Bernoulli Distribution with Observed Data
• Let X ∼ Bernoulli(p).
• Flipped the coin 10 times and got
H, H, T, T, T, H, H, T, T, H (5 heads, 5 tails)
• Step 1: Write down the log likelihood of data
L(p; D) = p^5 (1 − p)^5
l(p; D) = 5 log p + 5 log(1 − p)
MLE: Bernoulli Distribution
• Step 2: Maximize the log likelihood
argmax_p l(p; D)
Setting dl/dp = 5/p − 5/(1 − p) = 0 gives p̂ = 5/10 = 0.5.
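The closed-form Bernoulli MLE worked out above can be checked with a short Python sketch, using the slide's flips with H coded as 1 and T as 0:

```python
# Bernoulli MLE: p_hat = (# heads) / n, obtained by setting dl/dp = 0.

flips = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]  # H, H, T, T, T, H, H, T, T, H

def bernoulli_mle(xs):
    # Sample proportion of 1s (successes)
    return sum(xs) / len(xs)

p_hat = bernoulli_mle(flips)  # 0.5
```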
MLE: Normal Distribution
• Let X1,...,Xn ∼ N(µ, σ²). Find a maximum likelihood estimate of µ, σ².
• Step 1: Write down the log likelihood of data
L_n(µ, σ²) = ∏_{i=1}^{n} (2πσ²)^{−1/2} exp(−(Xi − µ)²/(2σ²))
l_n(µ, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{n} (Xi − µ)²
MLE: Normal Distribution
• Step 2: Maximize the log likelihood
argmax_{µ, σ²} l_n(µ, σ²)
MLE for µ: setting ∂l_n/∂µ = (1/σ²) ∑ (Xi − µ) = 0 gives µ̂ = (1/n) ∑_{i=1}^{n} Xi, the sample mean.
MLE: Normal Distribution
• Step 2: Maximize the log likelihood
argmax_{µ, σ²} l_n(µ, σ²)
MLE for σ²: setting ∂l_n/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑ (Xi − µ)² = 0 gives σ̂² = (1/n) ∑_{i=1}^{n} (Xi − µ̂)².
MLE: Normal Distribution with Observations
• Let X ∼ N(µ, σ²). Find a maximum likelihood estimate of µ, σ².
• Data with 10 samples
10, 12, 9, 14, 8, 11, 7, 6, 10.5, 12.5
• Step 1: Write down the log likelihood of data
L(µ, σ²; D) = ∏_{i=1}^{10} (2πσ²)^{−1/2} exp(−(xi − µ)²/(2σ²))
l(µ, σ²; D) = −5 log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{10} (xi − µ)²
Maximizing gives µ̂ = 10.0 and σ̂² = 5.75.
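A minimal Python sketch of the closed-form normal MLE applied to the 10 samples above. Note the MLE variance divides by n, not n − 1 (it is the biased estimator):

```python
# Normal MLE: mu_hat = sample mean, sigma2_hat = (1/n) * sum (x - mu_hat)^2.

data = [10, 12, 9, 14, 8, 11, 7, 6, 10.5, 12.5]

def normal_mle(xs):
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n  # divide by n, not n - 1
    return mu, sigma2

mu_hat, sigma2_hat = normal_mle(data)  # (10.0, 5.75)
```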
MLE: Poisson Distribution
• Let X1,...,Xn ∼ Poisson(𝜆). Find a maximum likelihood estimate of 𝜆.
• Step 1: Write down the log likelihood of data
L_n(𝜆) = ∏_{i=1}^{n} e^{−𝜆} 𝜆^{Xi} / Xi!
l_n(𝜆) = −n𝜆 + (∑_{i=1}^{n} Xi) log 𝜆 − ∑_{i=1}^{n} log(Xi!)
MLE: Poisson Distribution
• Step 2: Maximize the log likelihood
argmax_𝜆 l_n(𝜆)
Setting dl_n/d𝜆 = −n + (∑ Xi)/𝜆 = 0 gives 𝜆̂ = (1/n) ∑_{i=1}^{n} Xi, the sample mean.
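A short Python sketch of the Poisson MLE. The count data here are hypothetical; the log likelihood is also evaluated directly so the sample mean can be sanity-checked as the maximizer:

```python
import math

# Poisson MLE: lambda_hat = sample mean (the closed form derived above).

counts = [2, 4, 3, 5, 1, 3, 2, 4]  # hypothetical count data

def poisson_mle(xs):
    return sum(xs) / len(xs)

def poisson_log_lik(lam, xs):
    # l(lambda) = sum_i [x_i log(lambda) - lambda - log(x_i!)]
    return sum(x * math.log(lam) - lam - math.log(math.factorial(x)) for x in xs)

lam_hat = poisson_mle(counts)  # 3.0
```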
Markov Model
• Joint distribution of all binary random variables X1, . . . , XT
P(X1, . . . , XT )
• A Markov model is defined by
– P(X1): initial distribution
– P(Xk | Xk-1): transition probabilities, assumed identical for k = 2, …, T
MLE: Markov Chain
• Let X1,...,XT follow a Markov chain. Find a maximum likelihood estimate of the
Markov chain parameters 𝜃.
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(X1)? (Two of the three sequences start with 1, so P̂(X1 = 1) = 2/3.)
MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(Xk | Xk-1 = 0)? (Of the 6 transitions out of state 0, 4 go to 1, so P̂(Xk = 1 | Xk-1 = 0) = 2/3.)
MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(Xk | Xk-1 = 1)? (Of the 6 transitions out of state 1, 2 go to 1, so P̂(Xk = 1 | Xk-1 = 1) = 1/3.)
MLE: Markov Chain
• Let X1,...,XT follow a Markov chain. Find a maximum likelihood estimate of the
Markov chain parameters 𝜃.
• Assume three sequences of observations D: X1^(j), ..., XT^(j) for j = 1, 2, 3
MLE: Markov Chain
• Step 1: Write down the log likelihood of data
L(𝜃) = ∏_{j=1}^{3} [ P(X1^(j)) ∏_{k=2}^{T} P(Xk^(j) | Xk-1^(j)) ]
l(𝜃) = ∑_{j=1}^{3} [ log P(X1^(j)) + ∑_{k=2}^{T} log P(Xk^(j) | Xk-1^(j)) ]
MLE: Markov Chain
• Step 2: Maximize the log likelihood (initial probabilities)
argmax_𝜃 l(𝜃)
The maximizing initial probabilities are the empirical fractions: P̂(X1 = x) = (# sequences starting in x) / (# sequences).
MLE: Markov Chain
• Step 2: Maximize the log likelihood (transition probabilities)
argmax_𝜃 l(𝜃)
The maximizing transition probabilities are the empirical transition fractions: P̂(Xk = x′ | Xk-1 = x) = (# transitions x → x′) / (# transitions out of x).
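The counting recipe above can be sketched in Python using the three observed sequences from the earlier slides:

```python
from collections import Counter

# Markov chain MLE reduces to counting:
#   initial distribution  = fraction of sequences starting in each state
#   P(x' | x)             = (# transitions x -> x') / (# transitions out of x)

sequences = [
    [1, 0, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]

def markov_mle(seqs):
    init = Counter(s[0] for s in seqs)
    trans = Counter((a, b) for s in seqs for a, b in zip(s, s[1:]))
    p_init = {x: c / len(seqs) for x, c in init.items()}
    out = Counter()                      # transitions out of each state
    for (a, _), c in trans.items():
        out[a] += c
    p_trans = {(a, b): c / out[a] for (a, b), c in trans.items()}
    return p_init, p_trans

p_init, p_trans = markov_mle(sequences)
# p_init[1] = 2/3, p_trans[(0, 1)] = 2/3, p_trans[(1, 1)] = 1/3
```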
MLE in Practice
• Follow the recipe and work out the MLE on paper, given a probability model
• Write a program to compute a maximum likelihood estimate of the parameters from
data
– Load data into memory
– For a normal distribution, compute µ̂ = sample mean, σ̂² = sample variance (dividing by n)
– For a Bernoulli distribution, compute p̂ = proportion of successes
• Simple but important learning principle!
Maximum Likelihood Estimation as an Optimization Problem
• To find an MLE of the parameter 𝜃, we solve the following optimization problem
𝜃̂_n = argmax_𝜃 l_n(𝜃)
• In our examples for Bernoulli, univariate/multivariate normal distributions
– l_n(𝜃) is a concave function: a single global maximum
– a closed-form solution exists
Maximum Likelihood Estimation as an Optimization Problem
• For some models, MLE is easy
• In general, in more complex probability models, performing MLE is not always easy
– l_n(𝜃) is a non-concave function, with multiple local maxima
– a closed-form solution does not exist; we need to rely on iterative optimization methods
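As a toy illustration of iterative optimization, the sketch below runs gradient ascent on the Bernoulli log likelihood; this case does have a closed form (p̂ = fraction of heads), which lets us verify convergence. The starting point, learning rate, and step count are arbitrary choices:

```python
# Iterative MLE sketch: gradient ascent on the Bernoulli log likelihood
# l_n(p) = h log p + (n - h) log(1 - p), where h = number of heads.

def grad_log_lik(p, xs):
    # dl_n/dp = h/p - (n - h)/(1 - p)
    h, n = sum(xs), len(xs)
    return h / p - (n - h) / (1 - p)

def mle_gradient_ascent(xs, p0=0.3, lr=0.01, steps=2000):
    p = p0
    for _ in range(steps):
        p += lr * grad_log_lik(p, xs)
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep p inside (0, 1)
    return p

flips = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]
p_hat = mle_gradient_ascent(flips)  # converges toward sum(flips)/len(flips) = 0.5
```

Real non-convex models (mixtures, neural networks) use the same idea, but different starting points can reach different local maxima.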
Other Probability Models and MLE: Neural Networks
• Deep neural nets for modeling P(Y|X)
• Optimization criterion for learning the model: MLE!
• No closed-form solution for the parameter estimates:
– Rely on iterative optimization methods
Summary
• Maximum likelihood estimation is the most commonly used technique for learning a
parametric probability model from data
• MLE, recipe
– Write down the log likelihood
– Differentiate the log likelihood with respect to the parameters
– Set the above to zero and solve for the parameters
• MLE for
– Bernoulli distribution
– Univariate/multivariate normal distributions
