Machine learning interviews day4

Machine Learning Interviews – Day 4
Arpit Agarwal

Meta-Idea
• Model → Data: probability
• Data → Model: inference (likelihood)
A model of the data-generating process gives rise to data.
Estimating the model from data is most commonly done through likelihood estimation.

Likelihood Function
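$$P(\text{Model} \mid \text{Data}) = \frac{P(\text{Data} \mid \text{Model})\, P(\text{Model})}{P(\text{Data})}$$
The factor $P(\text{Data} \mid \text{Model})$ is the likelihood function.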
Find the “best” model which has generated the data. In a likelihood function
the data is considered fixed and one searches for the best model over the
different choices available.

Maximum Likelihood Estimation
• We want to select a model which will
maximize the probability that the data was
generated from the model
$$\max_{\text{Model}} \log P(\text{Data} \mid \text{Model})$$
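The logarithm is convenient because it turns a product into a sum without moving the maximizer. As a short worked step, assuming the n observations $x^{(1)}, \dots, x^{(n)}$ are independent and identically distributed (an assumption the slides do not state explicitly):

```latex
\log P(\text{Data} \mid \text{Model})
  = \log \prod_{i=1}^{n} P\big(x^{(i)} \mid \text{Model}\big)
  = \sum_{i=1}^{n} \log P\big(x^{(i)} \mid \text{Model}\big)
```

Since $\log$ is strictly increasing, the model that maximizes this sum is also the one that maximizes the likelihood itself.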

Examples
• Suppose we have the following data
– 0,1,1,0,0,1,1,0
• In this case it is sensible to choose the
Bernoulli distribution (B(p)) as the model
space.
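• Now we want to choose the best p, i.e.,
$$p^{*} = \arg\max_{p} \log P(\text{Data} \mid p) = \arg\max_{p} \sum_{i=1}^{n} \left[ x^{(i)} \log p + (1 - x^{(i)}) \log(1 - p) \right]$$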
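For the Bernoulli example above, the maximizer has a well-known closed form: the fraction of ones in the data. The sketch below computes it and cross-checks it against a coarse grid search. The function name and the grid resolution are illustrative choices, not part of the slides.

```python
import math

data = [0, 1, 1, 0, 0, 1, 1, 0]  # the observations from the slide

def bernoulli_log_likelihood(p, xs):
    """Log-likelihood of i.i.d. Bernoulli(p) observations, for 0 < p < 1."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

# Closed-form maximum likelihood estimate: the fraction of ones.
p_hat = sum(data) / len(data)

# Cross-check on a grid of candidate values 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

print(p_hat, p_grid)  # 0.5 0.5
```

Four of the eight observations are 1, so both approaches pick p = 0.5.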

Examples
Suppose the following are marks in a course
55.5, 67, 87, 48, 63
Marks typically follow a Normal distribution
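whose density function is
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Now, we want to find the best $\mu$, $\sigma$ such that
$$(\mu^{*}, \sigma^{*}) = \arg\max_{\mu, \sigma} \sum_{i=1}^{n} \log f(x^{(i)} \mid \mu, \sigma)$$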
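For the Normal model, the maximum likelihood estimates also have a standard closed form: the sample mean, and the standard deviation computed with a divisor of n (not n - 1). A minimal sketch on the marks above, with function and variable names chosen here for illustration:

```python
import math

marks = [55.5, 67, 87, 48, 63]  # the marks from the slide

def normal_log_likelihood(mu, sigma, xs):
    """Log-likelihood of i.i.d. Normal(mu, sigma^2) observations."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )

n = len(marks)
mu_hat = sum(marks) / n
# The MLE of the variance divides by n, so it is slightly smaller than
# the usual unbiased sample variance.
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in marks) / n)

print(mu_hat, sigma_hat)
print(normal_log_likelihood(mu_hat, sigma_hat, marks))
```

Moving either parameter away from these estimates lowers the log-likelihood, which is a quick way to sanity-check them.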

Examples – Mixture of Gaussians
• Suppose we have data about heights of
people (in cm)
– 185,140,134,150,170
• Heights follow a normal (or log-normal)
distribution, but men are on average taller
than women. This suggests a mixture of two
distributions.

Mixture of Gaussians
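• The density function is given by
$$p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \sigma_j^2)$$
where $\pi_j$ = probability that a random point is
generated from the j-th Gaussian, and $\mu_j$, $\sigma_j^2$ are the mean and variance of that Gaussian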
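One way to read the mixture density is as a two-stage generative process: first pick a Gaussian according to the mixing probabilities, then draw a value from that Gaussian. The sketch below simulates this. All parameter values are hypothetical and chosen only for illustration; they are not estimated from any data in the slides.

```python
import random

# Hypothetical parameters, for illustration only (not from the slides).
weights = [0.5, 0.5]    # probability of picking each Gaussian
means = [162.0, 176.0]  # component means, in cm
sigmas = [7.0, 7.0]     # component standard deviations, in cm

def sample_mixture(n, rng):
    """Draw n (component index, value) pairs from the mixture."""
    draws = []
    for _ in range(n):
        # Stage 1: choose a component with probability given by its weight.
        j = rng.choices(range(len(weights)), weights=weights)[0]
        # Stage 2: draw a value from the chosen Gaussian.
        draws.append((j, rng.gauss(means[j], sigmas[j])))
    return draws

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sample_mixture(2000, rng)
share_first = sum(1 for j, _ in samples if j == 0) / len(samples)
print(share_first)  # should be close to weights[0]
```

In real data only the values are observed; the component index is the hidden variable that makes estimation harder.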

[Figure: two plots of p(x) against x, for x from -5 to 10. The first shows the densities of Component 1 and Component 2; the second shows the resulting Mixture Model.]

Mixture of Gaussians
• Let D = {x(1),x(2),…,x(n)} be a set of n observation
points generated from a mixture of k Gaussians.
• Let H = {z(1),z(2),…,z(n)} be a set of n values of a hidden
variable Z.
– z(i) corresponds to the Gaussian to which x(i) belongs
• Our goal is to find the best $\pi_j$, $\mu_j$ and $\sigma_j$
• For a new data point, find out which Gaussian it belongs
to.
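Once the parameters are known, deciding which Gaussian a new point x belongs to is an application of Bayes' rule. As a sketch, writing $\pi_j$, $\mu_j$, $\sigma_j^2$ for the mixing probability, mean and variance of the j-th Gaussian (symbols assumed here, since the originals were lost from the slide text):

```latex
P(z = j \mid x)
  = \frac{\pi_j \, \mathcal{N}(x \mid \mu_j, \sigma_j^2)}
         {\sum_{l=1}^{k} \pi_l \, \mathcal{N}(x \mid \mu_l, \sigma_l^2)},
\qquad
\hat{z} = \arg\max_{j} P(z = j \mid x)
```

The point is assigned to the Gaussian with the largest posterior probability; the probabilities themselves give a soft assignment.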

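Mixture of Gaussians – ML approach
• Maximize the log-likelihood
$$\log p(D \mid \theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x^{(i)} \mid \mu_j, \sigma_j^2), \qquad \theta = (\pi_j, \mu_j, \sigma_j)_{j=1}^{k}$$
• Difficult to solve in general: the sum inside the logarithm couples the components, so there is no closed-form maximizer.
• Idea: Introduce z(i)’s in the optimization problem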
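Although the mixture log-likelihood has no closed-form maximizer, it is easy to evaluate for any given parameters. The sketch below does so for the heights data, using the log-sum-exp trick for numerical stability. Both parameter settings are hypothetical guesses made up for illustration.

```python
import math

heights = [185, 140, 134, 150, 170]  # the heights from the earlier slide

def log_normal_pdf(x, mu, sigma):
    """Log of the Normal(mu, sigma^2) density at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def mixture_log_likelihood(xs, weights, means, sigmas):
    """Sum over points of log(sum over components of weight * density)."""
    total = 0.0
    for x in xs:
        terms = [
            math.log(w) + log_normal_pdf(x, m, s)
            for w, m, s in zip(weights, means, sigmas)
        ]
        top = max(terms)  # log-sum-exp: factor out the largest term
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

# Hypothetical parameter guesses: one Gaussian versus two.
one = mixture_log_likelihood(heights, [1.0], [155.8], [19.0])
two = mixture_log_likelihood(heights, [0.6, 0.4], [141.0, 178.0], [8.0, 8.0])
print(one, two)
```

Comparing such values across parameter settings is what maximum likelihood does; EM is a systematic way to search for a setting that scores well.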

Solution – EM Algorithm
• We use EM when we want to do maximum likelihood
parameter estimation but we have hidden data in our model.
• The log-likelihood of the observed data is
$$\log p(D \mid \theta) = \log \sum_{H} p(D, H \mid \theta)$$
• As the likelihood might also depend on the values of the
hidden data, not only do we have to estimate $\theta$ but also H

EM Algorithm - Outline
• Start with an initial guess of the parameters
• E step: based on the current parameters and the
observed variables, find the probability
distribution over the hidden variables
• M step: maximize the expected joint log-likelihood,
where the expectation is taken with respect to that
distribution over the hidden variables
• Repeat until convergence
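The two steps in the outline can be written compactly. In this sketch, $\theta^{(t)}$ denotes the parameters after t iterations and $q^{(t)}$ the distribution over hidden variables computed in the E step; both symbols are notation introduced here rather than taken from the slides.

```latex
\text{E step:}\quad q^{(t)}(H) = p\big(H \mid D, \theta^{(t)}\big)
\qquad
\text{M step:}\quad \theta^{(t+1)} = \arg\max_{\theta} \sum_{H} q^{(t)}(H) \, \log p(D, H \mid \theta)
```

Each iteration does not decrease the observed-data log-likelihood $\log p(D \mid \theta)$, which is why the procedure converges, though possibly to a local maximum.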

Back to Mixture of Gaussians
• Let D = {x(1),x(2),…,x(n)} be a set of n observation
points generated from a mixture of k Gaussians.
• Let H = {z(1),z(2),…,z(n)} be a set of n values of a hidden
variable Z.
– z(i) corresponds to the Gaussian to which x(i) belongs
• Our goal is to find the best $\pi_j$, $\mu_j$ and $\sigma_j$
• For a new data point, find out which Gaussian it belongs
to.

EM Algorithm for Mixture of Normals
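The update formulas for this slide did not survive in the text, so what follows is a minimal sketch of the standard EM updates for a one-dimensional mixture of normals, not necessarily the exact form the slides used. The initialization, the stopping tolerance and the floor on the standard deviation (`min_sigma`) are assumptions added to keep the example stable on a tiny dataset.

```python
import math

def log_normal_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def em_gmm_1d(xs, k=2, max_iters=200, tol=1e-8, min_sigma=1.0):
    n = len(xs)
    # Initial guess (arbitrary choice): evenly spaced means, shared spread, equal weights.
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    center = sum(xs) / n
    spread = max(math.sqrt(sum((x - center) ** 2 for x in xs) / n), min_sigma)
    sigmas = [spread] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    resp = []
    for _ in range(max_iters):
        # E step: posterior probability of each Gaussian for each point,
        # computed in log space to avoid underflow.
        resp, ll = [], 0.0
        for x in xs:
            logs = [math.log(w) + log_normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sigmas)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(v - top) for v in logs))
            resp.append([math.exp(v - log_total) for v in logs])
            ll += log_total
        if abs(ll - prev_ll) < tol:  # log-likelihood has stopped changing
            break
        prev_ll = ll
        # M step: responsibility-weighted estimates of weight, mean, spread.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:  # component has no support; leave it unchanged
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return weights, means, sigmas, resp

heights = [185, 140, 134, 150, 170]  # the heights from the earlier slide
weights, means, sigmas, resp = em_gmm_1d(heights)
print(weights, means, sigmas)
```

The returned `resp` rows are the soft assignments of each height to the two Gaussians. With only five points the fitted values are illustrative at best, and a different initialization can lead to a different local maximum.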
