CCS335 Neural Networks and Deep Learning Laboratory: Lab Complete Record
EM Algorithm
1. Probabilistic Models with Latent Variables
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
2. Density Estimation Problem
• Learning from unlabeled data $x_1, x_2, \ldots, x_N$
• Unsupervised learning, density estimation
• Empirical distribution typically has multiple modes
3. Density Estimation Problem
[Figures: example multimodal empirical distributions, from http://courses.ee.sun.ac.za/Pattern_Recognition_813 and http://yulearning.blogspot.co.uk]
4. Density Estimation Problem
• Convex combination of unimodal pdfs yields a multimodal pdf:
  $p(x) = \sum_k w_k\, p_k(x)$, where $\sum_k w_k = 1$
• Physical interpretation
  • Sub-populations
5. Latent Variables
• Introduce a new variable $z_i$ for each $x_i$
• Latent / hidden: not observed in the data
• Probabilistic interpretation
  • Mixing weights: $w_k \equiv p(z_i = k)$
  • Mixture densities: $p_k(x) \equiv p(x \mid z_i = k)$
6. Generative Mixture Model
For $i = 1, \ldots, N$:
  $z_i \sim \mathrm{Discrete}(w)$
  $x_i \sim p(x \mid z_i)$
• $p(x_i, z_i) = p(z_i)\, p(x_i \mid z_i)$
• $p(x_i) = \sum_{z_i} p(x_i, z_i)$ recovers the mixture distribution
[Figure: plate notation: latent $z_i$ generates observed $x_i$; the plate is repeated $N$ times]
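A minimal sketch of this generative process in NumPy, using 1-D Gaussians as the component densities; the weights and component parameters below are illustrative, not from the slides.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture: the weights w_k must sum to 1
w = np.array([0.5, 0.3, 0.2])       # mixing weights p(z_i = k)
mu = np.array([-2.0, 0.0, 3.0])     # component means
sigma = np.array([0.5, 1.0, 0.8])   # component standard deviations

N = 1000
z = rng.choice(len(w), size=N, p=w)   # z_i ~ Discrete(w)   (latent)
x = rng.normal(mu[z], sigma[z])       # x_i ~ p(x | z_i)    (observed)
# Marginalizing z recovers the mixture: p(x) = sum_k w_k N(x | mu_k, sigma_k^2)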
7. Tasks in a Mixture Model
• Inference
  $p(z \mid x) = \prod_i p(z_i \mid x_i)$
• Parameter Estimation
  • Find parameters that, e.g., maximize the likelihood
  • Does not decouple according to classes
  • Non-convex, many local optima
8. Example: Gaussian Mixture Model
• Model
  For $i = 1, \ldots, N$:
    $z_i \sim \mathrm{Discrete}(\pi)$
    $x_i \mid z_i = k \sim \mathcal{N}(x \mid \mu_k, \Sigma)$
• Inference
  $p(z_i = k \mid x_i; \mu, \Sigma)$
  • Takes the form of a soft-max function (sketched below)
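A sketch of this inference step for a shared-covariance GMM, computed in log space for numerical stability; the function name and signature are mine, not the course's.

import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigma):
    """p(z_i = k | x_i; mu, Sigma) for all i, k: a soft-max over components."""
    log_p = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigma)
                      for k in range(len(pi))], axis=1)   # N x K
    log_p -= log_p.max(axis=1, keepdims=True)             # stable soft-max
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)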
9. Example: Gaussian Mixture Model
• Log-likelihood
  • Which training instance comes from which component?
  $\ell(\theta) = \sum_i \log p(x_i) = \sum_i \log \sum_k p(z_i = k)\, p(x_i \mid z_i = k)$
• No closed-form solution for maximizing $\ell(\theta)$ (its evaluation is sketched below)
• Possibility 1: gradient descent, etc.
• Possibility 2: Expectation Maximization
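The log of a sum over components is what blocks a closed form; a sketch of evaluating $\ell(\theta)$ with a stable log-sum-exp, reusing the naming of the earlier sketch.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_loglik(X, pi, mus, Sigma):
    """ell(theta) = sum_i log sum_k pi_k N(x_i | mu_k, Sigma)."""
    log_terms = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigma)
                          for k in range(len(pi))], axis=1)   # N x K
    return logsumexp(log_terms, axis=1).sum()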
10. Expectation Maximization Algorithm
• Observation: if the values of $z_i$ are known $\Rightarrow$ easy to maximize
• Key idea: iterative updates
  • Given parameter estimates, "infer" all $z_i$ variables
  • Given inferred $z_i$ variables, maximize w.r.t. the parameters
• Questions
  • Does this converge?
  • What does this maximize?
11. Expectation Maximization Algorithm
• Complete log-likelihood
  $\ell_c(\theta) = \sum_i \log p(x_i, z_i) = \sum_i \log p(z_i)\, p(x_i \mid z_i)$
• Problem: $z_i$ not known
• Possible solution: replace it with its conditional expectation
• Expected complete log-likelihood
  $Q(\theta, \theta^{old}) = E\left[\sum_i \log p(x_i, z_i)\right]$
  w.r.t. $p(z \mid x, \theta^{old})$, where $\theta^{old}$ are the current parameters
13. Expectation Maximization Algorithm
• Expectation Step
  • Update $\gamma_{ik}$ based on the current parameters:
  $\gamma_{ik} = \dfrac{\pi_k\, p(x_i \mid \theta^{old}_k)}{\sum_{k'} \pi_{k'}\, p(x_i \mid \theta^{old}_{k'})}$
• Maximization Step
  • Maximize $Q(\theta, \theta^{old})$ w.r.t. the parameters
• Overall algorithm (see the sketch below)
  • Initialize all latent variables
  • Iterate until convergence:
    • M Step
    • E Step
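A skeleton of the overall loop, following the slides' ordering (initialize latent variables, then alternate M and E steps); m_step, e_step, and loglik are placeholders for the updates above, not a fixed API. For a GMM, e_step is the responsibilities sketch and loglik is gmm_loglik from earlier.

import numpy as np

def em(X, gamma, n_iters=100, tol=1e-6):
    """Generic EM: start from initialized responsibilities gamma, alternate M and E."""
    prev_ll = -np.inf
    for _ in range(n_iters):
        theta = m_step(X, gamma)    # maximize Q(theta, theta_old) given gamma
        gamma = e_step(X, theta)    # recompute gamma_ik under the new theta
        ll = loglik(X, theta)       # non-decreasing across iterations
        if ll - prev_ll < tol:      # converged (to a local optimum)
            break
        prev_ll = ll
    return theta, gamma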
14. Example: EM for GMM
• E Step remains the same for all mixture models
• M Step
  $\pi_k = \dfrac{\sum_i \gamma_{ik}}{N} = \dfrac{\gamma_k}{N}$
  $\mu_k = \dfrac{\sum_i \gamma_{ik}\, x_i}{\gamma_k}$
  $\Sigma = ?$ (one possible answer is sketched below)
• Compare with a generative classifier
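A sketch of these M-step updates for the shared-covariance model. The Sigma line fills in the slide's $\Sigma = ?$ exercise with the tied-covariance MLE; treat that as my fill-in, not the slides'.

import numpy as np

def m_step(X, gamma):
    """MLE updates for a tied-covariance GMM from responsibilities gamma (N x K)."""
    N, D = X.shape
    gamma_k = gamma.sum(axis=0)               # gamma_k = sum_i gamma_ik
    pi = gamma_k / N                          # pi_k = gamma_k / N
    mus = (gamma.T @ X) / gamma_k[:, None]    # mu_k = sum_i gamma_ik x_i / gamma_k
    Sigma = np.zeros((D, D))                  # my fill-in for Sigma = ? :
    for k in range(len(gamma_k)):             # sum_{k,i} gamma_ik (x_i-mu_k)(x_i-mu_k)^T / N
        diff = X - mus[k]
        Sigma += (gamma[:, k:k+1] * diff).T @ diff
    return pi, mus, Sigma / N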
15. Analysis of EM Algorithm
• The expected complete log-likelihood is a lower bound on the log-likelihood
• EM iteratively maximizes this lower bound
• Converges to a local maximum of the log-likelihood
16. Bayesian / MAP Estimation
• Maximum-likelihood EM can overfit
• Possible to perform MAP instead of MLE in the M-step
• EM is partially Bayesian
  • Posterior distribution over the latent variables
  • Point estimate of the parameters
• The fully Bayesian approach is called Variational Bayes
17. (Lloyd's) K Means Algorithm
• Hard EM for the Gaussian Mixture Model
  • Point estimate of parameters (as usual)
  • Point estimate of latent variables
  • Spherical Gaussian mixture components
  $z_i^* = \arg\max_k p(z_i = k \mid x_i, \theta) = \arg\min_k \lVert x_i - \mu_k \rVert_2^2$
  where $\mu_k = \dfrac{\sum_{i:\, z_i = k} x_i}{N_k}$
• Most popular "hard" clustering algorithm (see the sketch below)
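A minimal sketch of Lloyd's algorithm under these assumptions (hard assignments, means as cluster averages); initializing the means at random data points is one common choice, not the only one.

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Lloyd's algorithm: hard EM for a spherical, equal-covariance GMM."""
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(len(X), size=K, replace=False)]  # init means at data points
    for _ in range(n_iters):
        # hard "E" step: z_i = argmin_k ||x_i - mu_k||^2
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # N x K
        z = d2.argmin(axis=1)
        # "M" step: mu_k = mean of points assigned to k (keep old mean if empty)
        new_mus = np.array([X[z == k].mean(axis=0) if (z == k).any() else mus[k]
                            for k in range(K)])
        if np.allclose(new_mus, mus):   # assignments stable, so converged
            break
        mus = new_mus
    return mus, z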
18. K Means Problem
• Given $\{x_i\}$, find $k$ "means" $(\mu_1^*, \ldots, \mu_k^*)$ and data assignments $(z_1^*, \ldots, z_N^*)$ such that
  $\mu^*, z^* = \arg\min_{\mu, z} \sum_i \lVert x_i - \mu_{z_i} \rVert_2^2$
• Note: $z_i$ is a $k$-dimensional binary (one-hot) vector
19. Model selection: Choosing K for GMM
• Cross-validation (sketched below)
  • Plot the likelihood on the training set and the validation set for increasing values of $k$
  • Likelihood on the training set keeps improving
  • Likelihood on the validation set drops after the "optimal" $k$
• Does not work for k-means! Why?
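A sketch of the recipe, chaining the em and gmm_loglik sketches from earlier; X_train, X_val, and init_responsibilities are assumed helpers of mine, not the course's. The closing comment is one standard answer to the "Why?".

# Choose K by held-out loglikelihood (X_train / X_val assumed pre-split)
for K in range(1, 11):
    theta, _ = em(X_train, init_responsibilities(X_train, K))
    print(K, gmm_loglik(X_train, *theta), gmm_loglik(X_val, *theta))
# Training loglikelihood rises with K; validation loglikelihood peaks near a good K.
# K-means distortion, in contrast, keeps falling with K even on held-out data,
# so it gives no peak to select.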
20. Principal Component Analysis: Motivation
• Dimensionality reduction
  • Reduces the number of parameters to estimate
  • Data often resides in a much lower dimension, e.g., on a line in a 3D space
  • Provides "understanding"
• Mixture models are very restricted
  • Latent variable restricted to a small discrete set
  • Can we "relax" the latent variable?
21. Classical PCA: Motivation
• Revisit K-means:
  $\min_{W,Z} J(W, Z) = \lVert X - W Z^T \rVert_F^2$
• $W$: matrix containing the means
• $Z$: matrix containing the cluster-membership vectors
• How can we relax $Z$ and $W$?
22. Classical PCA: Problem
$\min_{W,Z} J(W, Z) = \lVert X - W Z^T \rVert_F^2$
• $X$: $D \times N$ data matrix
• Arbitrary $Z$ of size $N \times L$
• Orthonormal $W$ of size $D \times L$
23. Classical PCA: Optimal Solution
• Empirical covariance matrix $\Sigma = \frac{1}{N} \sum_i x_i x_i^T$
  • Scaled and centered data
• $W = V_L$, where $V_L$ contains the $L$ eigenvectors for the $L$ largest eigenvalues of $\Sigma$
• $z_i = W^T x_i$
• Alternative solution via Singular Value Decomposition (SVD)
• $W$ contains the "principal components" that capture the largest variance in the data (see the sketch below)
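A sketch matching the slide's conventions ($X$ is $D \times N$, $W$ is $D \times L$, $z_i = W^T x_i$), via the eigendecomposition route; the SVD alternative is noted in a comment.

import numpy as np

def classical_pca(X, L):
    """X: D x N (columns are data points). Returns W (D x L) and Z (N x L)."""
    Xc = X - X.mean(axis=1, keepdims=True)    # center each dimension
    Sigma = (Xc @ Xc.T) / Xc.shape[1]         # empirical covariance (D x D)
    evals, evecs = np.linalg.eigh(Sigma)      # eigenvalues in ascending order
    W = evecs[:, ::-1][:, :L]                 # eigenvectors of the L largest eigenvalues
    Z = (W.T @ Xc).T                          # rows are z_i = W^T x_i
    # Equivalently, W is the first L left singular vectors of Xc (np.linalg.svd).
    return W, Z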
24. Probabilistic PCA
• Generative model:
  $p(z_i) = \mathcal{N}(z_i \mid \mu_0, \Sigma_0)$
  $p(x_i \mid z_i) = \mathcal{N}(x_i \mid W z_i + \mu, \Psi)$
  with $\Psi$ forced to be diagonal
• Latent linear models
  • Factor Analysis
  • Special case: PCA, with $\Psi = \sigma^2 I$
25. Visualization of Generative Process
[Figure: visualization of the generative process, from Bishop, PRML]
26. Relationship with Gaussian Density
• $\mathrm{Cov}[x] = W W^T + \Psi$
• Why does $\Psi$ need to be restricted?
• Intermediate low-rank parameterization of the Gaussian covariance matrix, between full rank and diagonal
• Compare the number of parameters (counted below)
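A quick count behind that comparison, for $D$-dimensional data and $L$ latent dimensions. It also suggests why $\Psi$ must be restricted: an unrestricted $\Psi$ could model $\mathrm{Cov}[x]$ entirely by itself, leaving $W$ unidentifiable.

\[
\underbrace{\frac{D(D+1)}{2}}_{\text{full covariance}}
\;\ge\;
\underbrace{DL + D}_{W W^T + \Psi,\ \Psi \text{ diagonal}}
\;\ge\;
\underbrace{D}_{\text{diagonal covariance}}
\]

For example, with $D = 100$ and $L = 5$: 5050 vs. 600 vs. 100 parameters.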
27. EM for PCA: Rod and Springs
[Figure: rod-and-springs view of EM for PCA, from Bishop, PRML]
28. Advantages of EM
• Simpler than gradient methods with constraints
• Handles missing data
• Easy path to handling more complex models
• Not always the fastest method
29. Summary of Latent Variable Models
• Learning from unlabeled data
• Latent variables
  • Discrete: clustering / mixture models; GMM
  • Continuous: dimensionality reduction; PCA
• Summary / "understanding" of data
• Expectation Maximization algorithm