Probabilistic Models
with Latent Variables
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Density Estimation Problem
• Learning from unlabeled data $x_1, x_2, \ldots, x_N$
• Unsupervised learning, density estimation
• Empirical distribution typically has multiple modes
[Figures: examples of multimodal empirical distributions. From http://courses.ee.sun.ac.za/Pattern_Recognition_813 and http://yulearning.blogspot.co.uk]
• A convex combination of unimodal pdfs gives a multimodal pdf (see the sketch below):
  $f(x) = \sum_k w_k f_k(x)$, where $\sum_k w_k = 1$
• Physical interpretation
  • Subpopulations of the data
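A minimal numeric sketch of this idea (not from the slides; the weights and component parameters are illustrative assumptions): a convex combination of two unimodal Gaussian pdfs is bimodal.

```python
# Hedged sketch: a convex combination of unimodal pdfs can be multimodal.
# The weights w and the component parameters are illustrative assumptions.
import numpy as np
from scipy.stats import norm

w = np.array([0.3, 0.7])                 # mixing weights, sum_k w_k = 1
mus, sigmas = [-2.0, 3.0], [1.0, 1.5]    # component means / std devs

def mixture_pdf(x):
    # f(x) = sum_k w_k f_k(x)
    return sum(wk * norm.pdf(x, mu, sd) for wk, mu, sd in zip(w, mus, sigmas))

xs = np.linspace(-6.0, 8.0, 400)
fx = mixture_pdf(xs)
interior = fx[1:-1]
n_modes = np.sum((interior > fx[:-2]) & (interior > fx[2:]))
print("local modes found:", n_modes)     # expect 2
```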
Latent Variables
• Introduce a new variable $Z_i$ for each $X_i$
• Latent / hidden: not observed in the data
• Probabilistic interpretation
  • Mixing weights: $w_k \equiv p(z_i = k)$
  • Mixture densities: $f_k(x) \equiv p(x \mid z_i = k)$
Generative Mixture Model
For $i = 1, \ldots, N$:
  $Z_i \sim_{\text{iid}} \mathrm{Mult}(\pi)$
  $X_i \sim p(x \mid z_i)$
• $P(x_i, z_i) = p(z_i)\, p(x_i \mid z_i)$
• $P(x_i) = \sum_k p(x_i, z_i = k)$ recovers the mixture distribution (sampling sketch below)
[Plate notation: $Z_i \to X_i$, enclosed in a plate replicated $N$ times]
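A sampling sketch of this generative process, assuming Gaussian components and illustrative values for $\pi$, the means, and the standard deviations:

```python
# Hedged sketch of ancestral sampling: z_i ~ Mult(pi), then x_i ~ p(x | z_i).
# Gaussian components and all parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])                 # mixing weights
mus = np.array([-2.0, 3.0])               # component means
sigmas = np.array([1.0, 1.5])             # component std devs

N = 10_000
z = rng.choice(len(pi), size=N, p=pi)     # Z_i ~iid Mult(pi)
x = rng.normal(mus[z], sigmas[z])         # X_i | Z_i = k ~ N(mu_k, sigma_k^2)

# Empirical class frequencies approach pi; marginalizing z recovers the
# mixture density above.
print(np.bincount(z) / N)
```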
Tasks in a Mixture Model
• Inference
  $P(z \mid x) = \prod_i P(z_i \mid x_i)$
• Parameter Estimation
  • Find parameters that, e.g., maximize the likelihood
  • Does not decouple according to classes
  • Non-convex, with many local optima
Example: Gaussian Mixture Model
• Model
For $i = 1, \ldots, N$:
  $Z_i \sim_{\text{iid}} \mathrm{Mult}(\pi)$
  $X_i \mid Z_i = k \sim N(x \mid \mu_k, \Sigma)$
• Inference
  • $P(z_i = k \mid x_i; \mu, \Sigma)$ takes the form of a soft-max function (sketch below)
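A hedged sketch of this posterior computation: the responsibilities are a soft-max over $\log \pi_k + \log N(x_i \mid \mu_k, \Sigma)$. One-dimensional components with a shared variance; all parameter values are assumptions.

```python
# Hedged sketch: p(z_i = k | x_i) as a soft-max over log pi_k + log-density.
# 1-D components with shared variance; parameter values are assumptions.
import numpy as np
from scipy.stats import norm
from scipy.special import softmax

pi = np.array([0.3, 0.7])
mus, sigma = np.array([-2.0, 3.0]), 1.0

def responsibilities(x):
    # work in the log domain for numerical stability, then normalize over k
    logits = np.log(pi) + norm.logpdf(x[:, None], mus[None, :], sigma)
    return softmax(logits, axis=1)

print(responsibilities(np.array([-2.0, 0.5, 3.0])).round(3))
```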
Example: Gaussian Mixture Model
• Log-likelihood
  • Which training instance comes from which component?
  $l(\theta) = \sum_i \log p(x_i) = \sum_i \log \sum_k p(z_i = k)\, p(x_i \mid z_i = k)$
• No closed-form solution for maximizing $l(\theta)$
• Possibility 1: gradient descent, etc.
• Possibility 2: Expectation Maximization
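A sketch of evaluating $l(\theta)$ stably with the log-sum-exp trick (assumed 1-D components with shared variance; parameter values are illustrative):

```python
# Hedged sketch: l(theta) = sum_i log sum_k pi_k N(x_i | mu_k, Sigma),
# with the inner sum over k done stably via logsumexp.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def gmm_loglik(x, pi, mus, sigma):
    # log p(x_i, z_i = k) = log pi_k + log N(x_i | mu_k, sigma^2);
    # marginalize z_i in the log domain
    log_joint = np.log(pi) + norm.logpdf(x[:, None], mus[None, :], sigma)
    return logsumexp(log_joint, axis=1).sum()

x = np.array([-2.1, -1.8, 2.9, 3.2])
print(gmm_loglik(x, np.array([0.5, 0.5]), np.array([-2.0, 3.0]), 1.0))
```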
Expectation Maximization Algorithm
• Observation: if the values of $Z_i$ were known, maximization would be easy
• Key idea: iterative updates
  • Given parameter estimates, "infer" all $Z_i$ variables
  • Given inferred $Z_i$ variables, maximize w.r.t. the parameters
• Questions
• Does this converge?
• What does this maximize?
Expectation Maximization Algorithm
• Complete log-likelihood
  $l_c(\theta) = \sum_i \log p(x_i, z_i) = \sum_i \log\, p(z_i)\, p(x_i \mid z_i)$
• Problem: $z_i$ not known
• Possible solution: replace it with its conditional expectation
• Expected complete log-likelihood
  $Q(\theta, \theta_{old}) = E\!\left[\sum_i \log p(x_i, z_i)\right]$
  with the expectation taken w.r.t. $p(z \mid x, \theta_{old})$, where $\theta_{old}$ are the current parameters
Expectation Maximization Algorithm
$Q(\theta, \theta_{old}) = E\!\left[\sum_i \log p(x_i, z_i)\right]$
$= \sum_i \sum_k E[I(z_i = k)] \log[\pi_k\, p(x_i \mid \theta_k)]$
$= \sum_i \sum_k p(z_i = k \mid x_i, \theta_{old}) \log[\pi_k\, p(x_i \mid \theta_k)]$
$= \sum_i \sum_k \gamma_{ik} \log \pi_k + \sum_i \sum_k \gamma_{ik} \log p(x_i \mid \theta_k)$
where $\gamma_{ik} = p(z_i = k \mid x_i, \theta_{old})$
• Compare with likelihood for generative classifier
Expectation Maximization Algorithm
• Expectation Step
• Update 𝛾𝑖𝑘 based on current parameters
$\gamma_{ik} = \dfrac{\pi_k\, p(x_i \mid \theta_{old,k})}{\sum_{k'} \pi_{k'}\, p(x_i \mid \theta_{old,k'})}$
• Maximization Step
• Maximize 𝑄 𝜃, 𝜃𝑜𝑙𝑑 wrt parameters
• Overall algorithm
• Initialize all latent variables
• Iterate until convergence
• M Step
• E Step
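The overall loop can be written generically. A sketch, where `e_step`, `m_step`, and `loglik` are hypothetical model-specific callables (names are assumptions, not from the slides); the slides initialize the latent variables and iterate M then E, which is the same loop shifted by half a step:

```python
# Hedged sketch of the overall EM loop. e_step, m_step, and loglik are
# hypothetical model-specific callables supplied by the user.
def em(x, theta, e_step, m_step, loglik, tol=1e-6, max_iter=200):
    prev = -float("inf")
    for _ in range(max_iter):
        gamma = e_step(x, theta)      # E step: responsibilities gamma_ik
        theta = m_step(x, gamma)      # M step: maximize Q(theta, theta_old)
        cur = loglik(x, theta)
        if cur - prev < tol:          # EM never decreases the log-likelihood
            break
        prev = cur
    return theta
```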
Example: EM for GMM
• The E step takes the same form for all mixture models
• M Step
  • $\pi_k = \frac{\sum_i \gamma_{ik}}{N} = \frac{\gamma_k}{N}$
  • $\mu_k = \frac{\sum_i \gamma_{ik} x_i}{\gamma_k}$
  • $\Sigma = ?$ (one answer is sketched below)
• Compare with generative classifier
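A sketch of these updates for the shared-covariance GMM of the earlier slide, including one standard answer to $\Sigma = ?$: the responsibility-weighted scatter around the component means, analogous to the generative classifier's pooled covariance.

```python
# Hedged sketch of the M step for a shared-covariance GMM.
# X: (N, D) data; gamma: (N, K) responsibilities from the E step.
import numpy as np

def m_step(X, gamma):
    N, D = X.shape
    Nk = gamma.sum(axis=0)                 # gamma_k = sum_i gamma_ik
    pi = Nk / N                            # pi_k = gamma_k / N
    mu = (gamma.T @ X) / Nk[:, None]       # mu_k = sum_i gamma_ik x_i / gamma_k
    Sigma = np.zeros((D, D))               # shared Sigma: one plausible answer
    for k in range(gamma.shape[1]):        # weighted scatter around each mu_k
        diff = X - mu[k]
        Sigma += (gamma[:, k, None] * diff).T @ diff
    return pi, mu, Sigma / N
```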
Analysis of EM Algorithm
• The expected complete log-likelihood is a lower bound on the log-likelihood
• EM iteratively maximizes this lower bound
• EM converges to a local maximum of the log-likelihood
Bayesian / MAP Estimation
• EM with maximum likelihood can overfit
• Possible to perform MAP instead of MLE in the M-step (see the sketch below)
• EM is partially Bayesian
• Posterior distribution over latent variables
• Point estimate over parameters
• Fully Bayesian approach is called Variational Bayes
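As one concrete example (an assumption, not from the slides), a symmetric Dirichlet($\alpha$) prior on the mixing weights turns the M-step update for $\pi$ into a pseudo-count-smoothed estimate:

```python
# Hedged sketch: MAP update for pi under a symmetric Dirichlet(alpha) prior.
# Pseudo-counts keep pi_k away from 0; alpha is an assumed hyperparameter.
import numpy as np

def map_pi(gamma, alpha=2.0):
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                 # expected counts per component
    return (Nk + alpha - 1.0) / (N + K * (alpha - 1.0))
```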
(Lloyd’s) K Means Algorithm
• Hard EM for Gaussian Mixture Model
• Point estimate of parameters (as usual)
• Point estimate of latent variables
• Spherical Gaussian mixture components
$z_i^* = \arg\max_k p(z_i = k \mid x_i, \theta) = \arg\min_k \lVert x_i - \mu_k \rVert_2^2$
where $\mu_k = \frac{1}{N_k} \sum_{i: z_i = k} x_i$ and $N_k$ is the number of points assigned to cluster $k$
• Most popular “hard” clustering algorithm
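A sketch of Lloyd's algorithm as hard EM; initializing the means at randomly chosen data points is an assumption (k-means++ is common in practice):

```python
# Hedged sketch of Lloyd's algorithm: hard assignments (E step with a point
# estimate of z_i), then per-cluster means (M step).
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]   # initial means
    for _ in range(iters):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        z = d2.argmin(axis=1)              # z_i* = argmin_k ||x_i - mu_k||^2
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])   # mean of assigned points
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, z
```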
K Means Problem
• Given $\{x_i\}$, find $k$ "means" $(\mu_1^*, \ldots, \mu_k^*)$ and data assignments $(z_1^*, \ldots, z_N^*)$ such that
  $\mu^*, z^* = \arg\min_{\mu, z} \sum_i \lVert x_i - \mu_{z_i} \rVert_2^2$
• Note: $z_i$ is a $k$-dimensional binary (one-hot) vector
Model selection: Choosing K for GMM
• Cross validation
• Plot the likelihood on the training and validation sets for increasing values of k (sketch below)
• Likelihood on training set keeps improving
• Likelihood on validation set drops after “optimal” k
• Does not work for k-means! Why?
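A sketch of this procedure using scikit-learn's `GaussianMixture`; the synthetic data and the train/validation split are assumptions:

```python
# Hedged sketch: choose K where the held-out log-likelihood peaks.
# Synthetic 1-D data with two true components; split sizes are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (200, 1)), rng.normal(3.0, 1.5, (300, 1))])
rng.shuffle(X)                             # shuffle rows before splitting
train, val = X[:400], X[400:]

for K in range(1, 6):
    gm = GaussianMixture(n_components=K, random_state=0).fit(train)
    # .score returns the mean held-out log-likelihood per sample
    print(K, round(gm.score(val), 3))      # expect a peak near K = 2
```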
Principal Component Analysis: Motivation
• Dimensionality reduction
  • Reduces the number of parameters to estimate
  • Data often lies in a much lower-dimensional subspace, e.g., on a line in 3D space
  • Provides "understanding"
• Mixture models are very restricted
  • The latent variable is restricted to a small discrete set
  • Can we "relax" the latent variable?
Classical PCA: Motivation
• Revisit K-means
$\min_{W,Z} J(W, Z) = \lVert X - W Z^T \rVert_F^2$
• W: matrix containing means
• Z: matrix containing cluster membership vectors
• How can we relax Z and W?
Classical PCA: Problem
$\min_{W,Z} J(W, Z) = \lVert X - W Z^T \rVert_F^2$
• $X$: $D \times N$
• Arbitrary $Z$ of size $N \times L$
• Orthonormal $W$ of size $D \times L$
Classical PCA: Optimal Solution
• Empirical covariance matrix $\Sigma = \frac{1}{N} \sum_i x_i x_i^T$ (for scaled and centered data)
• $W = V_L$, where $V_L$ contains the $L$ eigenvectors corresponding to the $L$ largest eigenvalues of $\Sigma$
• $z_i = W^T x_i$
• Alternative solution via Singular Value Decomposition (SVD)
• $W$ contains the "principal components" that capture the largest variance in the data (sketch below)
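A sketch of the eigendecomposition route, with $X$ stored $D \times N$ as in the slides; the synthetic data are an assumption:

```python
# Hedged sketch of classical PCA: W = top-L eigenvectors of the empirical
# covariance of centered data; z_i = W^T x_i. X is D x N as in the slides.
import numpy as np

def pca(X, L):
    Sigma = (X @ X.T) / X.shape[1]         # empirical covariance, D x D
    vals, vecs = np.linalg.eigh(Sigma)     # eigh returns ascending eigenvalues
    W = vecs[:, ::-1][:, :L]               # eigenvectors of the L largest
    return W, W.T @ X                      # W (D x L), latent coords (L x N)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 500))
X[2] = 0.1 * X[0] + 0.05 * rng.standard_normal(500)   # data near a 2-D subspace
X -= X.mean(axis=1, keepdims=True)                    # center
W, Z = pca(X, 2)
print(W.shape, Z.shape)                    # (3, 2) (2, 500)
```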
Probabilistic PCA
• Generative model
$P(z_i) = N(z_i \mid \mu_0, \Sigma_0)$
$P(x_i \mid z_i) = N(x_i \mid W z_i + \mu, \Psi)$, with $\Psi$ forced to be diagonal
• Latent linear models
  • Factor Analysis
  • Special case PCA: $\Psi = \sigma^2 I$
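A sampling sketch of this generative process with the PPCA choice $\Psi = \sigma^2 I$ and the common assumption $\mu_0 = 0$, $\Sigma_0 = I$ (all sizes and values illustrative); the sample covariance approaches $W W^T + \sigma^2 I$, which is exactly the relationship noted below:

```python
# Hedged sketch of the PPCA generative process: z_i ~ N(0, I),
# x_i = W z_i + mu + eps, eps ~ N(0, sigma^2 I). All values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
D, L, N = 3, 2, 50_000
W = rng.standard_normal((D, L))
mu, sigma = np.zeros(D), 0.1

Z = rng.standard_normal((N, L))                          # z_i ~ N(0, I)
X = Z @ W.T + mu + sigma * rng.standard_normal((N, D))   # Psi = sigma^2 I

print(np.round(np.cov(X.T), 2))                  # ~ W W^T + sigma^2 I
print(np.round(W @ W.T + sigma**2 * np.eye(D), 2))
```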
Visualization of Generative Process
[Figure: the latent linear generative process; from Bishop, PRML]
Relationship with Gaussian Density
• $\mathrm{Cov}[x] = W W^T + \Psi$
• Why does $\Psi$ need to be restricted?
• An intermediate, low-rank parameterization of the Gaussian covariance matrix, between full-rank and diagonal
• Compare the number of parameters (worked count below)
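A worked count under assumed sizes $D = 100$ and $L = 5$:

```latex
% Free parameters in the covariance of a D-dimensional Gaussian (D=100, L=5):
\underbrace{\tfrac{D(D+1)}{2} = 5050}_{\text{full }\Sigma}
\qquad
\underbrace{DL + D = 600}_{WW^{T} + \Psi,\ \Psi\text{ diagonal}}
\qquad
\underbrace{D = 100}_{\text{diagonal }\Sigma}
```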
EM for PCA: Rod and Springs
[Figure: the rod-and-springs analogy for EM in PCA; from Bishop, PRML]
Advantages of EM
• Simpler than gradient methods with constraints
• Handles missing data
• Easy path for handling more complex models
• Not always the fastest method
Summary of Latent Variable Models
• Learning from unlabeled data
• Latent variables
• Discrete: clustering / mixture models; GMM
• Continuous: dimensionality reduction; PCA
• Summary / “Understanding” of data
• Expectation Maximization Algorithm