CS229 Lecture notes
Andrew Ng
Mixtures of Gaussians and the EM algorithm
In this set of notes, we discuss the EM (Expectation-Maximization) algorithm
for density estimation.
Suppose that we are given a training set $\{x^{(1)}, \ldots, x^{(n)}\}$ as usual. Since we are in the unsupervised learning setting, these points do not come with any labels.

We wish to model the data by specifying a joint distribution $p(x^{(i)}, z^{(i)}) = p(x^{(i)} \mid z^{(i)})\,p(z^{(i)})$. Here, $z^{(i)} \sim \mathrm{Multinomial}(\phi)$ (where $\phi_j \geq 0$, $\sum_{j=1}^{k} \phi_j = 1$, and the parameter $\phi_j$ gives $p(z^{(i)} = j)$), and $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. We let $k$ denote the number of values that the $z^{(i)}$'s can take on. Thus, our model posits that each $x^{(i)}$ was generated by randomly choosing $z^{(i)}$ from $\{1, \ldots, k\}$, and then $x^{(i)}$ was drawn from one of $k$ Gaussians depending on $z^{(i)}$. This is called the mixture of Gaussians model. Also, note that the $z^{(i)}$'s are latent random variables, meaning that they're hidden/unobserved. This is what will make our estimation problem difficult.
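The generative process just described is easy to simulate. Here is a minimal NumPy sketch; the particular values of $\phi$, $\mu$ and $\Sigma$ below are made-up for illustration, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Example mixture parameters (illustrative values; k = 2 components, 2-D data).
phi = np.array([0.3, 0.7])                      # mixing proportions, sum to 1
mu = np.array([[0.0, 0.0], [4.0, 4.0]])         # one mean per Gaussian
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)])  # one covariance per Gaussian

def sample_mixture(n):
    """Draw n points: first z ~ Multinomial(phi), then x | z = j ~ N(mu_j, Sigma_j)."""
    z = rng.choice(len(phi), size=n, p=phi)
    x = np.array([rng.multivariate_normal(mu[j], Sigma[j]) for j in z])
    return x, z

X, Z = sample_mixture(500)
```

In density estimation we would observe only `X`; the labels `Z` are exactly the latent variables the notes discuss.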
The parameters of our model are thus $\phi$, $\mu$ and $\Sigma$. To estimate them, we can write down the likelihood of our data:
\[
\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{n} \log p(x^{(i)}; \phi, \mu, \Sigma)
= \sum_{i=1}^{n} \log \sum_{z^{(i)}=1}^{k} p(x^{(i)} \mid z^{(i)}; \mu, \Sigma)\, p(z^{(i)}; \phi).
\]
However, if we set to zero the derivatives of this formula with respect to
the parameters and try to solve, we’ll find that it is not possible to find the
maximum likelihood estimates of the parameters in closed form. (Try this
yourself at home.)
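Although the maximizer has no closed form, the log-likelihood itself is straightforward to evaluate for any candidate parameters, which is useful for monitoring convergence later. A small sketch of this computation (the helper `gaussian_pdf` is our own, not part of the notes):

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    # Quadratic form diff^T cov^{-1} diff for every row at once.
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return norm * np.exp(-0.5 * quad)

def log_likelihood(X, phi, mu, Sigma):
    """ell(phi, mu, Sigma) = sum_i log sum_j phi_j N(x_i; mu_j, Sigma_j)."""
    per_comp = np.column_stack(
        [phi[j] * gaussian_pdf(X, mu[j], Sigma[j]) for j in range(len(phi))]
    )
    return np.sum(np.log(per_comp.sum(axis=1)))
```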
The random variables $z^{(i)}$ indicate which of the $k$ Gaussians each $x^{(i)}$ had come from. Note that if we knew what the $z^{(i)}$'s were, the maximum likelihood problem would have been easy. Specifically, we could then write down the likelihood as
\[
\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{n} \left[ \log p(x^{(i)} \mid z^{(i)}; \mu, \Sigma) + \log p(z^{(i)}; \phi) \right].
\]
Maximizing this with respect to $\phi$, $\mu$ and $\Sigma$ gives the parameters:
\[
\phi_j = \frac{1}{n} \sum_{i=1}^{n} 1\{z^{(i)} = j\},
\]
\[
\mu_j = \frac{\sum_{i=1}^{n} 1\{z^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{n} 1\{z^{(i)} = j\}},
\]
\[
\Sigma_j = \frac{\sum_{i=1}^{n} 1\{z^{(i)} = j\}\,(x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{n} 1\{z^{(i)} = j\}}.
\]
Indeed, we see that if the $z^{(i)}$'s were known, then maximum likelihood estimation becomes nearly identical to what we had when estimating the parameters of the Gaussian discriminant analysis model, except that here the $z^{(i)}$'s play the role of the class labels.$^{1}$
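The three closed-form estimates above translate directly into code. A sketch under the assumption that the labels are observed (function and variable names are our own choices):

```python
import numpy as np

def mle_known_z(X, Z, k):
    """Closed-form maximum-likelihood estimates when the labels z^(i) are observed.

    X: (n, d) data matrix; Z: (n,) integer labels in {0, ..., k-1}.
    """
    # phi_j = fraction of points with label j.
    phi = np.array([np.mean(Z == j) for j in range(k)])
    # mu_j = mean of the points with label j.
    mu = np.array([X[Z == j].mean(axis=0) for j in range(k)])
    # Sigma_j = empirical covariance of the points with label j.
    Sigma = []
    for j in range(k):
        diff = X[Z == j] - mu[j]
        Sigma.append(diff.T @ diff / np.sum(Z == j))
    return phi, mu, np.array(Sigma)
```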
However, in our density estimation problem, the $z^{(i)}$'s are not known. What can we do?

The EM algorithm is an iterative algorithm that has two main steps. Applied to our problem, in the E-step, it tries to “guess” the values of the $z^{(i)}$'s. In the M-step, it updates the parameters of our model based on our guesses. Since in the M-step we are pretending that the guesses in the first part were correct, the maximization becomes easy. Here's the algorithm:
Repeat until convergence: {

(E-step) For each $i, j$, set
\[
w_j^{(i)} := p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma)
\]

$^{1}$There are other minor differences in the formulas here from what we'd obtained in PS1 with Gaussian discriminant analysis, first because we've generalized the $z^{(i)}$'s to be multinomial rather than Bernoulli, and second because here we are using a different $\Sigma_j$ for each Gaussian.
(M-step) Update the parameters:
\[
\phi_j := \frac{1}{n} \sum_{i=1}^{n} w_j^{(i)},
\]
\[
\mu_j := \frac{\sum_{i=1}^{n} w_j^{(i)}\, x^{(i)}}{\sum_{i=1}^{n} w_j^{(i)}},
\]
\[
\Sigma_j := \frac{\sum_{i=1}^{n} w_j^{(i)}\,(x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{n} w_j^{(i)}}
\]

}
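The full loop can be sketched in NumPy. This is an illustrative implementation, not the notes' prescription: the initialization, the fixed iteration count, and the helper `gaussian_pdf` are simplistic choices of ours.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return norm * np.exp(-0.5 * quad)

def em_gmm(X, k, iters=100, mu_init=None, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Naive initialization: uniform weights, random data points as means,
    # identity covariances (or caller-supplied means).
    phi = np.full(k, 1.0 / k)
    if mu_init is not None:
        mu = np.array(mu_init, dtype=float)
    else:
        mu = X[rng.choice(n, size=k, replace=False)].copy()
    Sigma = np.array([np.eye(d) for _ in range(k)])
    for _ in range(iters):
        # E-step: w[i, j] = p(z_i = j | x_i; phi, mu, Sigma) via Bayes rule.
        w = np.column_stack(
            [phi[j] * gaussian_pdf(X, mu[j], Sigma[j]) for j in range(k)]
        )
        w /= w.sum(axis=1, keepdims=True)
        # M-step: the known-z formulas with indicators replaced by weights.
        nj = w.sum(axis=0)
        phi = nj / n
        mu = (w.T @ X) / nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (w[:, j, None] * diff).T @ diff / nj[j]
    return phi, mu, Sigma, w
```

In practice one would stop when the log-likelihood change falls below a tolerance rather than after a fixed number of iterations, and use log-space computations to avoid underflow; both are omitted here for brevity.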
In the E-step, we calculate the posterior probability of the $z^{(i)}$'s, given the $x^{(i)}$'s and the current setting of our parameters. I.e., using Bayes rule, we obtain:
\[
p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma) =
\frac{p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)}
{\sum_{l=1}^{k} p(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma)\, p(z^{(i)} = l; \phi)}.
\]
Here, $p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)$ is given by evaluating the density of a Gaussian with mean $\mu_j$ and covariance $\Sigma_j$ at $x^{(i)}$; $p(z^{(i)} = j; \phi)$ is given by $\phi_j$; and so on. The values $w_j^{(i)}$ calculated in the E-step represent our “soft” guesses$^{2}$ for the values of $z^{(i)}$.
Also, you should contrast the updates in the M-step with the formulas we had when the $z^{(i)}$'s were known exactly. They are identical, except that instead of the indicator functions “$1\{z^{(i)} = j\}$” indicating from which Gaussian each datapoint had come, we now instead have the $w_j^{(i)}$'s.
The EM algorithm is also reminiscent of the K-means clustering algorithm, except that instead of the “hard” cluster assignments $c^{(i)}$, we instead have the “soft” assignments $w_j^{(i)}$. Similar to K-means, it is also susceptible to local optima, so restarting from several different initial parameter settings may be a good idea.
It's clear that the EM algorithm has a very natural interpretation of repeatedly trying to guess the unknown $z^{(i)}$'s; but how did it come about, and can we make any guarantees about it, such as regarding its convergence? In the next set of notes, we will describe a more general view of EM, one that will allow us to easily apply it to other estimation problems in which there are also latent variables, and which will allow us to give a convergence guarantee.

$^{2}$The term “soft” refers to our guesses being probabilities and taking values in $[0, 1]$; in contrast, a “hard” guess is one that represents a single best guess (such as taking values in $\{0, 1\}$ or $\{1, \ldots, k\}$).